[Feature Request]: Execute code independently of the IDE

RHRolun commented 11 months ago

Feature description

The current workbenches execute user code in the same environment the IDE runs in. This can be undesirable: the dependencies the IDE needs may collide with the dependencies the code needs, or the results may differ from those produced in a slimmer production environment. The goal of this feature is to separate the IDE environment from the code execution environment, so that dependencies do not get mixed and the execution environment can easily be replaced by another one with different dependencies.

Describe alternatives you've considered

Separate the IDE environment from the code execution environment, either through virtual environments (set default kernel in the notebooks) or through remote execution on a different pod.

Anything else?

No response

shalberd commented 11 months ago

mmh, isn't that the idea behind pipelines and dedicated / separate runtime containers for every step of a pipeline?

@guimou I think you have experience in this, too, I noticed once how you talked about Jupyter env dependencies.

RHRolun commented 11 months ago

@shalberd - yes, pipelines let you do this in a nice way, but having to go through a pipeline every time while prototyping is quite a hassle. This brings up another good point: if you wanted to develop a script for a specific pipeline step with specific dependencies, it would be great to quickly swap out your kernel/execution env and run it in the IDE without having to execute the pipeline.

andrewballantyne commented 11 months ago

cc @harshad16

lucferbux commented 10 months ago

/transfer kubeflow

guimou commented 10 months ago

This has always been the issue with the way our workbench images are built. Several aspects to that:

UBI images are built with an already existing Python venv (/opt/app-root). Everything Python that happens will be in this venv. The rationale is that it prevents app/user packages from colliding with the ones built into the OS (there are some, for DNF and such). While the intent is good, it prevents creating and using other venvs.
Jupyter is a Python app. So it needs to run from somewhere... In local development mode, you will have as many Jupyter deployments as you have venvs. Meaning you switch to a specific venv (manually, with Anaconda, whatever...), and only then launch Jupyter. There you can manage some consistency and compatibility. This is not doable in our containerized environment, as Jupyter IS the UI.
If you modify currently loaded packages, what happens to Jupyter? It's a chicken-and-egg problem that I have never fully investigated.
Working in a single fixed venv has advantages though, the first one being immutability/consistency. If you let people create multiple ones, you're back at square one in terms of being able to share notebooks and data as different people will have different venvs, more or less properly maintained or in sync.
Now, most if not all of our compatibility issues come not from Jupyter itself but from our extensions (Elyra, KFP, ...), which are either lagging badly in terms of dependencies or have very strict pinned dependencies that prevent installing anything else alongside them. I'm really close to simply ditching Elyra (or even Codeflare, which had the same kinds of issues until recently) from my custom images, as it's a nightmare to make it work with some recent Python libraries...
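To make the single-venv point concrete, here is a minimal sketch that can be run from a notebook cell to see which interpreter the kernel uses and whether the extension packages are importable from it. The function name `describe_environment` and the default module names are illustrative assumptions, not part of any workbench image.

```python
import importlib.util
import sys


def describe_environment(modules=("jupyterlab", "elyra", "kfp")):
    """Report the running interpreter and which top-level packages it can import.

    `modules` is a hypothetical list of top-level package names to probe;
    adjust it to whatever extensions the image is expected to ship.
    """
    return {
        # Path of the Python binary executing this code (the kernel's interpreter).
        "executable": sys.executable,
        # Root of the active environment, e.g. a venv directory.
        "prefix": sys.prefix,
        # True when running inside a venv rather than the base installation.
        "in_venv": sys.prefix != sys.base_prefix,
        # Packages visible to this interpreter; if the IDE's own packages show up
        # here, user code and the IDE share one environment.
        "importable": {
            name: importlib.util.find_spec(name) is not None for name in modules
        },
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `importable` reports the Jupyter extensions as present, user code is resolving packages from the same environment as the IDE, which is the situation described above.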

Some possible paths from there:

Switch to VSCode for specific jobs... As it's not a Python-based UI, you have fewer constraints in terms of compatibility, while still being able to work with notebooks. However, you lose Elyra...
Investigate how to manually create and persist other kernels in a persistent volume. People would be able to create and populate those kernels with what they want, and Jupyter would execute them as "external" stuff, meaning not using the same Python installation as the one Jupyter itself is running on.
Have some kind of custom selector at the beginning of a session, so something before/in front of Jupyter that would allow selecting a specific venv to run on. Somewhat similar to the Anaconda approach. However, it's not that different from having different workbenches.
Update Elyra/KFP/Codeflare (and surely other extensions) to make sure they keep up with the rest of the world and don't introduce incompatibilities.
Don't include Elyra/KFP/Codeflare in all images, and have specific workbenches when you want to use those features. Definitely not ideal...
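The persisted-kernel path above could look roughly like the following sketch: a kernelspec is just a directory containing a `kernel.json` whose `argv` points at a Python interpreter, so it can point at a venv that lives on a persistent volume instead of the one Jupyter runs from. The helper name `write_kernelspec` and all paths in the usage note are assumptions for illustration; the target venv is assumed to already exist and to have `ipykernel` installed.

```python
import json
from pathlib import Path


def write_kernelspec(kernels_dir, name, python_executable, display_name=None):
    """Write a minimal Jupyter kernelspec that launches a given interpreter.

    kernels_dir:       a `kernels` directory that Jupyter searches (assumed to
                       sit on a persistent volume so it survives restarts).
    name:              directory name of the kernelspec.
    python_executable: interpreter of the separate venv (hypothetical path).
    """
    spec_dir = Path(kernels_dir) / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec = {
        # Jupyter substitutes {connection_file} when it starts the kernel.
        "argv": [
            str(python_executable),
            "-m",
            "ipykernel_launcher",
            "-f",
            "{connection_file}",
        ],
        "display_name": display_name or name,
        "language": "python",
    }
    kernel_json = spec_dir / "kernel.json"
    kernel_json.write_text(json.dumps(spec, indent=2))
    return kernel_json
```

For example, `write_kernelspec(Path.home() / ".local/share/jupyter/kernels", "slim-prod", "/opt/app-root/src/venvs/slim-prod/bin/python")` would register a kernel backed by a venv at that (hypothetical) location. Jupyter looks for kernelspecs in the `kernels` subdirectory of its data directories, including any listed in `JUPYTER_PATH`, so the kernelspec directory itself has to be on persistent storage too.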

shalberd commented 10 months ago

Working in a single fixed venv has advantages though, the first one being immutability/consistency.

That is THE reason we in our corporation always aim to have one container image mean one specific env with clear dependencies. When developers want flexibility, we just build them another image, for which, by the way, @guimou made a great modular folder structure and toolset (interactive-image-builder.sh) that makes the whole thing a breeze: with IDE, without IDE just for runtimes (i.e. in Airflow or Kubeflow pipelines), and so on.

I have worked with Anaconda and other toolchains as well, so I know both perspectives. Our data scientists, used to working locally on their laptops, initially gave us exactly the point of view mentioned here, but there are clear advantages to doing it the immutable / always consistent per-container way.

I'm really close to simply ditching Elyra

Elyra is having issues with its community, I believe, and is trying to be too many things at once, for all kinds of deployments, container PaaSes and so on.

opendatahub-io / kubeflow