opendatahub-io / kubeflow

Machine Learning Toolkit for Kubernetes
Apache License 2.0

[Feature Request]: Execute code independently of the IDE #216

Open RHRolun opened 11 months ago

RHRolun commented 11 months ago

Feature description

The current workbenches execute user code inside the same environment in which the IDE is running. This can be undesirable: the dependencies needed to run the IDE may collide with the dependencies the code needs, or the code may produce results that differ from those of a slimmer production environment. The goal of this feature is to separate the IDE environment from the code execution environment, so that the dependencies do not get mixed and the code execution environment can easily be replaced by another one with different dependencies.

Describe alternatives you've considered

Separate the IDE environment from the code execution environment, either through virtual environments (set default kernel in the notebooks) or through remote execution on a different pod.
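
As a rough illustration of the virtual-environment option, the sketch below writes a Jupyter kernelspec (`kernel.json`) whose launch command points at a separate interpreter, so notebooks can select that kernel instead of the IDE's own Python. The function name `write_kernelspec`, the interpreter path `/opt/exec-env/bin/python`, and the kernel name `exec-env` are hypothetical; it also assumes `ipykernel` is installed in the target venv and that the output directory is one Jupyter searches for kernels (for example `~/.local/share/jupyter/kernels` on Linux).

```python
import json
import tempfile
from pathlib import Path


def write_kernelspec(venv_python: str, name: str, display_name: str,
                     kernels_dir: Path) -> Path:
    """Write a kernel.json that starts ipykernel with the given interpreter.

    venv_python is the Python binary of the execution venv (hypothetical path
    in the demo below); that venv must have ipykernel installed.
    """
    spec = {
        # Jupyter replaces {connection_file} when it launches the kernel.
        "argv": [venv_python, "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    kernel_dir = kernels_dir / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    spec_path = kernel_dir / "kernel.json"
    spec_path.write_text(json.dumps(spec, indent=2))
    return spec_path


if __name__ == "__main__":
    # Demo in a temporary directory; a real setup would target a directory
    # that Jupyter actually scans for kernelspecs.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_kernelspec("/opt/exec-env/bin/python", "exec-env",
                                "Python (exec-env)", Path(tmp))
        print(path.read_text())
```

Swapping the execution environment then means pointing a kernelspec at a different venv; the IDE's own dependencies are left untouched. This only covers the in-pod variant, not remote execution on a different pod.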

Anything else?

No response

shalberd commented 11 months ago

Hmm, isn't that the idea behind pipelines and dedicated, separate runtime containers for every step of a pipeline?

@guimou I think you have experience with this, too; I once noticed you talking about Jupyter env dependencies.

RHRolun commented 11 months ago

@shalberd - yes, pipelines let you do this in a nice way, but having to go through a pipeline all the time while prototyping is quite a hassle. This brings up another good point: if you wanted to develop a script for a specific pipeline step with specific dependencies, it would be great to quickly swap out your kernel/execution env and run it in the IDE without having to execute the pipeline.

andrewballantyne commented 11 months ago

cc @harshad16

lucferbux commented 10 months ago

/transfer kubeflow

guimou commented 10 months ago

This has always been the issue with the way our workbench images are built. Several aspects to that:

Some possible paths from there:

shalberd commented 10 months ago

Working in a single fixed venv has advantages though, the first one being immutability/consistency.

That is THE reason we in our corporation always aim to have one container image mean one specific env with clear dependencies. When developers want flexibility, we just build them another image, for which, by the way, @guimou made a great modular folder structure and toolset (interactive-image-builder.sh) that makes the whole thing a breeze: with an IDE, or without an IDE just for runtimes, e.g. in Airflow or Kubeflow pipelines, and so on.

I have worked with Anaconda and other toolchains as well, so I know both perspectives. Our data scientists, who were used to working locally on their laptops, initially gave us exactly the point of view mentioned here. Still, there are clear advantages to doing it the immutable, always-consistent, per-container way.

I'm really close to simply ditching Elyra

Elyra is having issues with its community, I believe, and it is trying to be too many things at once, for all kinds of deployments, container PaaSes, and so on.