INSTALL: dask-labextension

bsipocz commented 6 months ago

User request:

Would it be possible to have https://github.com/dask/dask-labextension installed, to make working with Dask on the platform easier?

cc @zoghbi-a

zoghbi-a commented 6 months ago

Does this extension need to run in the science_demo conda environment?

I am running into an issue: dask-labextension is installed in the conda environment that runs JupyterLab (base in our case), but if you try to connect to a cluster from science_demo, it complains about a discrepancy in the Python version and other libraries. It seems to work fine, but this does not look like the right way to do it, and it will ultimately cause a problem. I will explore this more, but if someone has looked at this before, I would appreciate suggestions.

@stargaser?

bsipocz commented 6 months ago

ping @nevencaplar

troyraen commented 6 months ago

@zoghbi-a we plan to create a separate conda environment, so hopefully we can work out dependency issues there. I know I have created my own persistent conda env on the console before... Do you happen to know if I can create one to share between users, maybe by putting it in the shared /efs folder?

I haven't used this extension myself, but I'm trying to get one of the others who has used it to test it out as soon as we can. Thanks a ton!

zoghbi-a commented 6 months ago

As far as I can tell, it works when the cluster is started from the notebook or console. If the cluster is started from the UI and you then try to connect to it from a conda environment other than base (what the UI uses by default), you get a warning about a Python version mismatch between the client and the server.

In any case, if creating a separate conda environment, I suggest using python=3.10, which is what we have for the base environment. If using a different version, make sure the cluster is started from the notebook/console and not from the UI. Once launched, it should show up in the UI.
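A minimal sketch of how a notebook could check the Python version advice above before connecting to a cluster. The helper name `python_minor_matches` is hypothetical, and the default of 3.10 is taken from the base environment mentioned in this comment. For a fuller comparison against a live cluster, distributed also offers `Client.get_versions(check=True)`, which is not exercised here.

```python
import sys


def python_minor_matches(expected=(3, 10), actual=None):
    """Return True when the (major, minor) Python version equals `expected`.

    `expected` defaults to 3.10, the version reported for the base
    environment in this thread. `actual` defaults to the running kernel.
    """
    if actual is None:
        actual = sys.version_info[:2]
    return tuple(actual[:2]) == tuple(expected)


if not python_minor_matches():
    running = ".".join(str(part) for part in sys.version_info[:2])
    print(f"Kernel runs Python {running}; a cluster started from the UI may warn about a mismatch.")
```

If the check fails, the suggestion above still applies: start the cluster from the notebook or console rather than from the UI.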

As for sharing conda environments, I am not aware of a way to do it other than through an environment.yml file, as described in the conda docs. The steps are:

Create your environment and install the required packages, then export it as a yml file: conda env export > environment.yml (or create the file by hand following the standard).
Other people create their environment with: conda env create -f environment.yml.
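For the "create the file by hand" option, here is a small sketch that renders a minimal environment.yml. The function name, the environment name `dask-shared`, the conda-forge channel, and the package list are all assumptions chosen for illustration; only python=3.10 comes from this thread.

```python
def render_environment_yml(name, python_version="3.10", packages=()):
    """Build the text of a minimal conda environment.yml file.

    The channel (conda-forge) is an assumption; adjust it to whatever
    the platform actually uses.
    """
    lines = [
        f"name: {name}",
        "channels:",
        "  - conda-forge",
        "dependencies:",
        f"  - python={python_version}",
    ]
    lines += [f"  - {pkg}" for pkg in packages]
    return "\n".join(lines) + "\n"


# Hypothetical environment name and package list.
text = render_environment_yml("dask-shared", packages=["dask", "distributed"])
print(text)
```

The printed text can be saved as environment.yml and used with the conda env create command quoted above.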

Sharing the actual environment folder (e.g. in /efs) may work, but I can think of a few ways in which it can go wrong. Someone can upgrade or remove a package, and that affects everyone. Having it in a read-only location goes against the idea of users having control of their environments. If the above does not work, we can explore other options.

nevencaplar commented 6 months ago

Thank you for your effort. I do see the dashboard in the dev environment and I am able to connect to it after starting the client, but I get the following error when trying to actually inspect my Dask client:

[Screenshot: Opera Snapshot_2024-05-14_080421_daskhub fornaxdev mysmce com]

zoghbi-a commented 6 months ago

Thanks @nevencaplar. I am assuming the page on the right is the result of clicking that dashboard link (http://127.0.0.1:8787/stats)?

The link does not work because it is trying to access a local address (on your machine), which surprises me. I assumed dask-labextension would be clever enough to give back the correct URL.

I don't see much documentation on how to configure it. I can look more, but you can access the correct link by selecting the panel you want from the left (orange) menu in the extension.

nevencaplar commented 6 months ago

I have managed to solve it with the solutions described here, i.e., by typing in proxy/port. I am not sure what you meant by selecting the panel from the left menu. None of those worked for me.

zoghbi-a commented 6 months ago

Yes. That is how ports get handled within jupyter. Usually that is worked out automatically under the hood by a jupyter lab extension, but dask-labextension does not seem to be doing it.
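As a rough sketch of the proxy/port workaround, the helper below builds the proxied path for a dashboard port. It assumes the platform routes ports the way jupyter-server-proxy does (under `proxy/<port>/`) and that the `JUPYTERHUB_SERVICE_PREFIX` environment variable is set by JupyterHub; the function name and the `/user/alice/` prefix are hypothetical. Dask also has a `distributed.dashboard.link` configuration key that accepts a template such as `{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status`, which may remove the need to type the path by hand.

```python
import os


def proxied_dashboard_url(port=8787, page="status", prefix=None):
    """Return the path under which a local port is reachable through the Jupyter proxy.

    `prefix` defaults to JUPYTERHUB_SERVICE_PREFIX (assumed to be set on
    a JupyterHub deployment), falling back to "/" otherwise.
    """
    if prefix is None:
        prefix = os.environ.get("JUPYTERHUB_SERVICE_PREFIX", "/")
    if not prefix.endswith("/"):
        prefix += "/"
    return f"{prefix}proxy/{port}/{page}"


# Hypothetical user prefix, for illustration only.
print(proxied_dashboard_url(8787, prefix="/user/alice/"))
```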

mjuric commented 5 months ago

Hi team!

I've played with this a bit, and (with a few hacks) got it to work in a way that would be intuitive to the user. By "intuitive" what I mean is that the user can keep their notebook code ideally maximally independent of the environment in which it's running -- i.e., in a perfect world, I'd like to be able to take the same notebook from Rubin Science Platform and run it on Fornax w/o a single line of code change.

To achieve this, I think we /don't/ want a typical user to have explicit client = distributed.Client() stanzas in their notebook. Instead, the environment should let the user spin up a client independently (ideally offering a sane default), and inject it into the Python interpreter as a global client. This is what dask-labextension (when configured to do so) can do.

Here's how I got that to work:

Launch Fornax with the dev image.
Before starting /any/ Python kernels or dask, open a terminal window, and run conda deactivate to activate the base environment. This is /very/ important, as the dask that will be installed in the next step must be the same as the dask the user notebook will run.
Then clone https://github.com/lincc-frameworks/IVOA_2024_demo, and run pip install -r requirements.txt with the requirements.txt file in there. This will update dask to the version that LSDB needs.
After you've done that, click on the 'Settings' menu, and check the 'Auto-Start Dask' item. We should configure dask-labextension to do this by default. See screenshot below.
Then click on the 'Dask' tab to the left, and hit the 'New +' button. This will create a cluster for you. Screenshot is below.
Now you can open LSDB-quickstart-fornax.py and make sure you're using the base kernel. Then execute the demo. The notebook should automatically pick up the cluster, and you can open various progress tabs, etc. that dask-labextension offers.

For the end-user, all of this should be pre-installed. So I think we'll have to:

Update the dask in the base environment to the same version that LSDB needs.
Set 'Auto-Start Dask' to default to True.
Make sure the notebooks use the base kernel as the default.
Think whether we want to pre-launch a small local dask cluster for every user.

With this, things should "just work" (tm).
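One possible way to pre-set 'Auto-Start Dask' is a JupyterLab settings override. The sketch below only builds the JSON content. It assumes the setting lives under the `dask-labextension:plugin` settings ID with the key `autoStartClient`; both names should be verified against the extension's settings schema before use. JupyterLab reads such overrides from an overrides.json file in its application settings directory.

```python
import json


def dask_autostart_overrides(enabled=True):
    """Return a JupyterLab overrides mapping that toggles Dask client auto-start.

    The settings ID and key name are assumptions, not confirmed in this
    thread; check them against the dask-labextension settings schema.
    """
    return {"dask-labextension:plugin": {"autoStartClient": enabled}}


# Print the JSON that would go into overrides.json.
print(json.dumps(dask_autostart_overrides(), indent=2))
```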

Bugs: occasionally the magical "injection" of the client doesn't work -- I found that restarting the kernel gets it going. This is something we'll need to debug and fix (looks like an issue in dask-labextension). To check if the cluster got injected correctly, run:

from distributed import default_client
default_client()

in your notebook (screenshot with output attached). Again, there's no need to (you mustn't) instantiate the Client explicitly.
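A slightly more defensive variant of the check above, as a sketch: it reports what it finds instead of raising when no client was injected. The function name `injected_client_summary` is hypothetical; it relies on `default_client()` raising `ValueError` when no client exists, and it never creates a client itself.

```python
def injected_client_summary():
    """Describe the injected Dask client, if any, without creating one."""
    try:
        from distributed import default_client
    except ImportError:
        return "distributed is not installed in this kernel"
    try:
        client = default_client()
    except ValueError:
        # No client was injected; restarting the kernel was the workaround above.
        return "no client found; try restarting the kernel"
    return f"client found, dashboard link: {client.dashboard_link}"


print(injected_client_summary())
```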

zoghbi-a commented 5 months ago

As far as I can see, dask and dask-labextension work together only when running within the same conda environment. Since the lab extension runs in the base environment, the notebook needs to run in that environment too.

Following @mjuric's suggestions, I updated the dev Astro image to install the lsdb notebook requirements in the base environment, and to enable dask autostart in notebooks.
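A quick heuristic for confirming which conda environment a notebook kernel runs in, sketched under the assumption of the usual conda layout (named environments under `<root>/envs/<name>`, base at the root). The function name and the example paths are hypothetical.

```python
import sys
from pathlib import PurePosixPath


def conda_env_name(prefix=None):
    """Guess the conda environment name from an interpreter prefix.

    Heuristic only: any prefix whose parent directory is not named
    "envs" is reported as "base", including non-conda Pythons.
    """
    path = PurePosixPath(prefix if prefix is not None else sys.prefix)
    return path.name if path.parent.name == "envs" else "base"


# Report the environment of the running kernel.
print(conda_env_name())
```

With hypothetical paths, `conda_env_name("/opt/conda/envs/science_demo")` gives `science_demo` and `conda_env_name("/opt/conda")` gives `base`.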

The workflow should be:

Start the cluster from lab extension.
Open the notebook, and restart its kernel if it is already running.
Run the notebook and use the dashboard to track the progress.

vandesai1 commented 5 months ago

Abdu, this sounds like something that should be included in the user manual. Do you agree?

zoghbi-a commented 5 months ago

Yes. Once we get confirmation that it is working as expected, I can add that to the docs.

troyraen commented 5 months ago

@nevencaplar Can one of the LINCC folks check this?

mjuric commented 5 months ago

@troyraen @zoghbi-a Confirming this worked like a charm (screenshot below).

nasa-fornax / fornax-images

INSTALL: dask-labextension #6