While the `rb>` operator already has a solution for installing dependent packages with Bundler, as discussed in https://github.com/treasure-data/digdag/issues/318, it would be nice to have a way to install Python dependencies for the `py>` operator.
Of course, we can install dependencies by running `os.system("pip install pandas")` or by installing them when building the Docker image, but this is still messy: version management tends to be neglected, and it is easy to forget to run `pip install` before `import`.
Here is an example of the syntax to achieve this proposal:
```yaml
+task:
  py>: my_script.smart_func
  docker:
    image: MY_AWESOME_IMAGE
  pre_execute: pip install -r requirements.txt -c constraints.txt  # this can be poetry or pipenv or whatever
```
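To make the intended ordering concrete, here is a minimal Python sketch of what a runner could do with `pre_execute`: run the command in a shell, fail the task if the command fails, then import and call the `py>` target. `run_py_task` is a hypothetical name; this is not how digdag's `py>` operator is actually implemented, and passing task parameters as keyword arguments is only an assumption for illustration.

```python
import importlib
import subprocess
from typing import Any, Optional


def run_py_task(pre_execute: Optional[str], target: str, **params: Any) -> Any:
    """Hypothetical runner: run `pre_execute`, then call `module.func` from `target`."""
    if pre_execute:
        # check=True raises CalledProcessError on a non-zero exit status, so the
        # task stops before importing against a half-installed environment.
        subprocess.run(pre_execute, shell=True, check=True)
        # Packages installed after interpreter start-up may be missed by cached
        # path finders; clearing the caches lets the import below see them.
        importlib.invalidate_caches()
    module_name, _, func_name = target.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)(**params)
```

With `pre_execute: pip install -r requirements.txt -c constraints.txt`, the packages would be installed before `my_script` is imported, which removes the "forgot to run `pip install`" failure mode.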
My primary use case is based on the Docker executor, but if we want to run this in a local environment, creating a temporary venv may be useful.
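For the local case, a temporary venv could be created for each task run. Below is a minimal sketch using only the standard library (`venv`, `tempfile`, `subprocess`); `run_in_temp_venv` and its parameters are hypothetical, and a real operator would also need to consider caching, since rebuilding the environment on every run is slow.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path
from typing import Optional, Sequence


def venv_python(env_dir: Path) -> Path:
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def run_in_temp_venv(requirements: Optional[str], script_args: Sequence[str],
                     with_pip: bool = True) -> str:
    """Hypothetical helper: build a throwaway venv, install deps, run Python in it."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        # Symlink the interpreter on POSIX, matching `python -m venv` defaults.
        venv.create(env_dir, with_pip=with_pip, symlinks=os.name != "nt")
        python = str(venv_python(env_dir))
        if requirements:
            subprocess.run([python, "-m", "pip", "install", "-r", requirements], check=True)
        result = subprocess.run([python, *script_args], check=True,
                                capture_output=True, text=True)
        # The directory (and the venv) is deleted when the `with` block exits.
        return result.stdout
```

For example, `run_in_temp_venv("requirements.txt", ["my_script.py"])` would install the listed packages into an isolated environment, run the script there, and discard the environment afterwards.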
For reference, Metaflow manages Python packages outside of tasks with the `@conda` decorator for reproducibility: https://docs.metaflow.org/metaflow/dependencies