treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

[feature-request] Support Python dependency installation for py operator #1350

Open chezou opened 4 years ago

chezou commented 4 years ago

While the rb> operator has a solution for installing dependent packages with Bundler, as discussed in https://github.com/treasure-data/digdag/issues/318, it would be nice to have Python dependency installation for the py> operator.

Of course, we can install dependencies by running os.system("pip install pandas") or by installing them when building the Docker image, but this is still messy: we tend to lack version management, and it is easy to forget to run pip install before the import.
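To make the current workaround concrete, here is a minimal sketch of a py> task that installs its own dependencies before importing them. The helper name `pip_install_command` and the file names `requirements.txt` and `constraints.txt` are assumptions for illustration; nothing here is provided by digdag itself.

```python
import subprocess
import sys


def pip_install_command(requirements="requirements.txt", constraints=None):
    """Build the pip command line for the interpreter running this task."""
    cmd = [sys.executable, "-m", "pip", "install", "-r", requirements]
    if constraints:
        # -c pins versions without adding new packages to the install set.
        cmd += ["-c", constraints]
    return cmd


def smart_func():
    # The install has to happen inside the task, before any third-party import.
    subprocess.check_call(pip_install_command("requirements.txt", "constraints.txt"))

    import pandas as pd  # only importable once the install above has succeeded

    print(pd.__version__)
```

Every task has to repeat this boilerplate, and the install is easy to forget, which is the messiness this request is about.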

For example, Metaflow manages Python packages outside of tasks with the @conda decorator for reproducibility: https://docs.metaflow.org/metaflow/dependencies

Here is an example of the syntax to achieve this proposal:

+task:
  py>: my_script.smart_func
  docker: MY_AWESOME_IMAGE
  pre_execute: pip install -r requirements.txt -c constraints.txt  # this could be poetry, pipenv, or whatever

My primary use case is based on the Docker executor, but if we want to run this in a local environment, creating a temporary venv may be useful.
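As a rough sketch of the temporary-venv idea for local runs, the following uses only the Python standard library. The function names `venv_python` and `run_in_temp_venv` are hypothetical, and this is not how digdag works today; it only illustrates what such a step could do.

```python
import os
import subprocess
import tempfile
import venv


def venv_python(env_dir):
    """Return the path of the interpreter inside a venv directory."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def run_in_temp_venv(script_args, requirements=None):
    """Create a throwaway venv, install requirements, then run a script in it."""
    with tempfile.TemporaryDirectory() as env_dir:
        venv.create(env_dir, with_pip=True)
        python = venv_python(env_dir)
        if requirements:
            subprocess.check_call([python, "-m", "pip", "install", "-r", requirements])
        # The venv is deleted when the with-block exits, so nothing leaks
        # between task runs.
        return subprocess.call([python, *script_args])
```

For example, `run_in_temp_venv(["my_script.py"], requirements="requirements.txt")` would install into a fresh environment on every run. That is slow but reproducible; caching the environment keyed on the requirements file would be a natural refinement.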

yoyama commented 4 years ago

It looks like a good idea. But pre_execute: is equivalent to sh> and may carry risks.