nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev
BSD 3-Clause "New" or "Revised" License

[ENH] - Ensure that default Dask Gateway environment matches active kernel environment #1294

Open dharhas opened 2 years ago

dharhas commented 2 years ago

Feature description

Currently, the Dask Gateway cluster options default to the first environment available rather than the environment actually being used by the notebook kernel. If that environment doesn't have dask in it, the next stage just hangs. When running through all the cells, it is very easy not to notice that Dask Gateway is using the wrong environment.

I propose that we make the default conda environment for Dask the one actively being used by the Jupyter kernel, since that is the most sensible default.

In the example below, the filesystem/dashboard env is the default even though the notebook is running filesystem/dask:

from dask_gateway import Gateway
gateway = Gateway()

options = gateway.cluster_options()
options
[screenshot: cluster options widget with filesystem/dashboard selected as the default environment]

Value and/or benefit

Makes using Dask-Gateway less error prone and improves usability.

Anything else?

No response

costrouc commented 2 years ago

This will definitely need investigation. Googling briefly, I don't see a straightforward way to get the Jupyter kernel name without JavaScript.
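From inside the kernel process itself (rather than from the frontend), the active conda environment can usually be inferred without JavaScript. A minimal sketch, assuming the kernel was launched from an activated conda env; the fallback to the interpreter prefix is a heuristic, not guaranteed to match a Nebari environment name:

```python
import os
import sys


def active_conda_env():
    """Best-effort guess at the conda env the current process runs in."""
    # Set by `conda activate`, and typically inherited by Jupyter kernels
    # launched from that environment.
    env = os.environ.get("CONDA_DEFAULT_ENV")
    if env:
        return env
    # Fall back to the interpreter prefix, whose last path component is
    # usually the environment directory name.
    return os.path.basename(sys.prefix)
```

This only works when the code runs in the same process as the kernel, which is exactly the situation here: the notebook calling `gateway.cluster_options()` is running in the environment we want to detect.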

dharhas commented 2 years ago

This also ties into reproducibility and dashboarding: knowing the kernel being used and recording it in the notebook metadata can help with reproduction, and also with picking a good default environment for dashboard sharing.
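On the metadata angle: notebooks already record their kernelspec, and since an .ipynb file is plain JSON it can be read with only the standard library. A small sketch (the file path and field values in the test are hypothetical examples):

```python
import json


def kernel_from_notebook(path):
    """Return the (name, display_name) of the kernelspec recorded in a
    notebook's metadata, or (None, None) if it is missing."""
    with open(path) as f:
        nb = json.load(f)
    kernelspec = nb.get("metadata", {}).get("kernelspec", {})
    return kernelspec.get("name"), kernelspec.get("display_name")
```

A tool picking a default environment for dashboard sharing could call this on the notebook being shared and map the kernel name back to a conda environment.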

viniciusdc commented 2 years ago

There seems to be a default config YAML that can be loaded, described here: https://gateway.dask.org/configuration-user.html#default-configuration. It has the cluster options in it, so we might be able to set the env programmatically there, but I don't know how that interacts with gateway.cluster_options().
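For reference, a user-level default along those lines might look something like this in ~/.config/dask/gateway.yaml. The option key conda_environment is an assumption based on Nebari's cluster options widget, not a documented dask-gateway key, so check the names reported by gateway.cluster_options() on the actual deployment:

```yaml
gateway:
  cluster:
    options:
      conda_environment: filesystem/dask
```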

viniciusdc commented 2 years ago

cc @costrouc I think this might help as well: https://docs.dask.org/en/latest/deploying-kubernetes-helm.html?highlight=conda%20environemt#matching-the-user-environment

viniciusdc commented 2 years ago

We can set the filesystem/dask env as the default, which can easily be overridden in the cluster options GUI. The only issue is that we can't automatically detect the active environment this way, unless we do something during deployment (e.g. a bash step reading the active conda env variable) to dynamically update the file .config/dask/gateway.yaml

Edit: it seems to be possible using $CONDA_DEFAULT_ENV
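A sketch of what that dynamic update could look like, run once at session startup. The config_path default matches the file mentioned above; the conda_environment option key is an assumption and should be checked against the deployment's actual cluster options:

```python
import os
from pathlib import Path


def write_gateway_default(config_path="~/.config/dask/gateway.yaml"):
    """Pin the default Dask Gateway cluster environment to the conda env
    this process was launched from, by writing a user-level config file.

    NOTE: the option name `conda_environment` is an assumption; inspect
    gateway.cluster_options() to confirm it for your deployment.
    """
    env = os.environ.get("CONDA_DEFAULT_ENV", "filesystem/dask")
    text = (
        "gateway:\n"
        "  cluster:\n"
        "    options:\n"
        f"      conda_environment: {env}\n"
    )
    path = Path(config_path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return text
```

Since dask reads user config from ~/.config/dask/ by default, a Gateway() created after this runs should pick up the new default without further wiring.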

[screenshot: echo $CONDA_DEFAULT_ENV printing the active environment name]

Since we are using dask_gateway to perform this, together with the Dask permission system from Keycloak, we should be okay with the default env containing dask during this "inspection".

viniciusdc commented 2 years ago

Hi @Chris Ostrouchov, about the Gateway default option for the cluster env, what do you think of using the above approach?

dcmcand commented 7 months ago

cc @viniciusdc for visibility

viniciusdc commented 5 months ago

I forgot about this; I will open a PR, as this is now easier to achieve using conda-store endpoints.