jcrist / venv-pack

Package virtual environments for redistribution
https://jcrist.github.io/venv-pack/
BSD 3-Clause "New" or "Revised" License

Does this package work with Spark local? #24

Open AbdealiLoKo opened 8 months ago

AbdealiLoKo commented 8 months ago

I see documentation about Spark on YARN.

Does this also work with Spark local mode? I sometimes use Spark local for small jobs, and I would rather keep my environments consistent across small and large jobs.

Some documentation would be useful. If I copy the same setup from the YARN documentation, it does not seem to pick up the venv-pack environment.
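A minimal sketch of what local mode would need, assuming the venv-pack archive has already been extracted into a local `./environment` directory (the path is illustrative, not from venv-pack's docs). In local mode there is no YARN to ship an archive, so the driver environment variables have to point at the unpacked interpreter before the session starts:

```python
import os

# Illustrative path: wherever you untarred environment.tar.gz on this machine.
env_python = os.path.join(os.path.abspath("environment"), "bin", "python")

# Both driver and worker processes run on this machine in local mode,
# so both variables point at the packed environment's interpreter.
os.environ["PYSPARK_PYTHON"] = env_python
os.environ["PYSPARK_DRIVER_PYTHON"] = env_python

# With those set, a local session would pick up the packed environment:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("local[*]").getOrCreate()
```

The SparkSession lines are commented out here since they require pyspark installed; the point is only that the variables must be set before the JVM is launched.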

lek18 commented 6 months ago

I am able to get it working locally, but I am not able to make it work following the YARN documentation. I tried:

  1. conda-pack
  2. venv-pack, with and without Poetry

    gcloud dataproc jobs submit pyspark "gs://hello_world.py" \
    --project wmt-bfdms-dvhorizprod \
    --cluster=ipi-cluster-prod \
    --region=us-east4 \
    --archives 'gs://env/environment.tar.gz#environment' \
    --properties="spark.submit.deployMode=cluster,\
    spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python,\
    spark.appMasterEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python"

and I am getting: ./environment/bin/python not found

kbzowski commented 2 months ago

This is because of symlinks. The archive contains a symlink to the local Python executable, and on the Spark cluster the interpreter is probably located somewhere else, so the symlink is invalid. You can change the recorded path with --python-prefix, but in the end it produces a very strange path; I was not able to force it to point to the correct one.
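The failure mode above can be reproduced without Spark at all. This is a sketch with illustrative paths (the build-host interpreter path is made up): a venv's bin/python is a symlink to the interpreter that created it, and once unpacked on a node where that interpreter does not exist, the link dangles and any attempt to execute it reports "not found":

```python
import os
import tempfile

# Stand-in for the directory YARN would unpack the archive into.
unpack_dir = tempfile.mkdtemp()
bin_dir = os.path.join(unpack_dir, "environment", "bin")
os.makedirs(bin_dir)

# Interpreter path from the build machine; absent on the cluster node.
build_time_python = "/nonexistent/build-host/bin/python3"
link = os.path.join(bin_dir, "python")
os.symlink(build_time_python, link)

print(os.path.islink(link))   # True: the symlink entry itself exists
print(os.path.exists(link))   # False: its target is missing on this machine
```

`os.path.exists` follows symlinks, which is why it returns False here even though the directory entry is present; that mismatch is exactly what Spark's "not found" error is reporting.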