Open bsipocz opened 6 months ago
Does this extension need to run in the science_demo conda environment?
I am running into an issue where dask-labextension is installed in the conda environment running JupyterLab (base in our case), but if you try to connect to a cluster from science_demo, it complains about a discrepancy in the Python version and other libraries. It seems to work fine despite the warnings, but this does not look like the right way to do it, and it will ultimately cause problems. I will explore this more, but if someone has looked at this before, I would appreciate suggestions.
@stargaser?
ping @nevencaplar
@zoghbi-a we plan to create a separate conda environment, so hopefully we can work out dependency issues there. I know I have created my own persistent conda env on the console before... Do you happen to know if I can create one to share between users, maybe by putting it in the shared /efs folder?
I haven't used this extension myself before, but I'm trying to get one of the others who has to test it out as soon as we can. Thanks a ton!
As far as I can tell, it seems to be working when the cluster is started from the notebook or console. If the cluster is started from the UI, and you then try to connect to it from a conda environment other than base (which the UI uses by default), you get a warning about a Python version mismatch between the client and the server.
In any case, If creating a separate conda environment, I suggest using python=3.10, which is what we have for the base environment. If using a different version, make sure the cluster is started from the notebook/console and not from the UI. Once launched, it should show up in the UI.
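A quick way to catch the mismatch before connecting is to compare the kernel's Python version against the base environment's. This is a hypothetical helper, not anything shipped with dask: the function name is made up, and the `(3, 10)` default comes from the base environment mentioned in this thread.

```python
import sys

def versions_match(kernel=None, base=(3, 10)):
    """Return True when the kernel's (major, minor) Python version matches
    the base environment's, i.e. when a UI-launched cluster is safe to use."""
    if kernel is None:
        kernel = sys.version_info
    return tuple(kernel[:2]) == tuple(base)

# In a notebook you might warn before connecting:
if not versions_match():
    print("Python mismatch: start the Dask cluster from this notebook, not the UI")
```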
As for sharing conda environments, I am not aware of a way to do it other than through an `environment.yml` file, as described in the conda docs, i.e.:

```
conda env export > environment.yml
```

(or create the file by hand following the standard), and then:

```
conda env create -f environment.yml
```

Sharing the actual environment folder (e.g. in /efs) may work, but I can think of a few ways it can go wrong. Someone can upgrade or remove a package, and that affects everyone. Having it in a read-only location goes against the idea of users having control over their environments. If the above does not work, we can explore other options.
Thank you for your effort. I do see the dashboard in the dev environment and I am able to connect to it after starting the client, but I get the following error when trying to actually inspect my Dask client:
Thanks @nevencaplar. I am assuming the page on the right is the result of clicking that dashboard link (http://127.0.0.1:8787/stats)?
The link does not work because it is trying to access a local address (in your machine), which surprises me. I assumed dask-labextension should be clever enough to give back the correct url.
I don't see lots of documentation on how to configure it. I can look more, but you can access the correct link by selecting the panel you want from the left (orange) menu in the extension.
I have managed to solve it with the solutions described here, i.e., by typing in `proxy/port`.

I am not sure what you meant by selecting the panel from the left menu. None of those worked for me.
Yes. That is how ports get handled within jupyter. Usually that is worked out automatically under the hood by a jupyter lab extension, but dask-labextension does not seem to be doing it.
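For reference, the rewrite that the `proxy/port` trick performs can be sketched as a small helper. This is a hypothetical function (not part of dask-labextension), assuming the jupyter-server-proxy convention of exposing local ports under `/proxy/<port>/`:

```python
from urllib.parse import urlparse

def proxied_dashboard_url(dashboard_link, base_url="/"):
    """Map a localhost dashboard link (e.g. http://127.0.0.1:8787/stats)
    to the jupyter-server-proxy path that works from the user's browser."""
    parsed = urlparse(dashboard_link)
    port = parsed.port or 8787  # Dask's default dashboard port
    path = parsed.path.lstrip("/")
    return f"{base_url.rstrip('/')}/proxy/{port}/{path}"

proxied_dashboard_url("http://127.0.0.1:8787/stats")  # → "/proxy/8787/stats"
```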
Hi team!
I've played with this a bit, and (with a few hacks) got it to work in a way that would be intuitive to the user. By "intuitive" what I mean is that the user can keep their notebook code ideally maximally independent of the environment in which it's running -- i.e., in a perfect world, I'd like to be able to take the same notebook from Rubin Science Platform and run it on Fornax w/o a single line of code change.
To achieve this, I think we /don't/ want a typical user to have explicit `client = distributed.Client()` stanzas in their notebook. Instead, the environment should let the user spin up a client independently (ideally offering a sane default) and inject it into the Python interpreter as a global client. This is what dask-labextension (when configured to do so) can do.
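From memory of the dask-labextension README, the cluster factory and any auto-started clusters are configured through dask's config system (e.g. a `~/.config/dask/labextension.yaml` file). A sketch along these lines, with placeholder values; check the project docs for the exact keys:

```yaml
labextension:
  factory:
    module: dask.distributed
    class: LocalCluster
    args: []
    kwargs: {}
  initial:
    # clusters to start automatically when JupyterLab opens
    - name: default
```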
Here's how I got that to work:
1. Run `conda deactivate` to activate the base environment. This is /very/ important, as the dask that will be installed in the next step must be the same dask the user notebook will run.
2. Run `pip install -r requirements.txt` with the `requirements.txt` file in there. This will update dask to the version that LSDB needs.
3. Open `LSDB-quickstart-fornax.py` and make sure you're using the `base` kernel. Then execute the demo. The notebook should automatically pick up the cluster, and you can open the various progress tabs, etc. that dask-labextension offers.

For the end-user, all of this should be pre-installed. So I think we'll have to:
With this, things should "just work" (tm).
Bugs: occasionally the magical "injection" of the client doesn't work -- I found that restarting the kernel gets it going. This is something we'll need to debug and fix (looks like an issue in dask-labextension). To check if the cluster got injected correctly, run:

```python
from distributed import default_client
default_client()
```

in your notebook (screenshot with output attached). Again, there's no need to (in fact, you mustn't) instantiate the `Client` explicitly.
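To keep a notebook portable between platforms that inject a client and a plain laptop, one option is a small fallback wrapper. This is a hypothetical helper, not part of the setup above; the `client_factory` parameter exists only so the pattern can be exercised without a running cluster:

```python
def get_client(client_factory=None):
    """Return the platform-injected Dask client if one exists, else create one.

    distributed.default_client() raises ValueError when no client has been
    registered in this interpreter, which is the signal to fall back.
    """
    try:
        from distributed import default_client
        return default_client()
    except (ImportError, ValueError):
        if client_factory is None:
            from distributed import Client  # local fallback
            client_factory = Client
        return client_factory()
```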
As far as I can see, the combination of dask/dask-labextension works only when both run within the same conda environment. Since the lab extension runs in the base environment, the notebook needs to run in that environment too.
Following @mjuric suggestions, I updated the dev Astro image to install the lsdb notebook requirements in the base environment, and enable dask autostart in notebooks.
The workflow should be:
Abdu, this sounds like something that should be included in the user manual. Do you agree?
Yes. Once we get confirmation that it is working as expected, I can add that to the docs.
@nevencaplar Can one of the LINCC folks check this?
@troyraen @zoghbi-a Confirming this worked like a charm (screenshot below).
User request:
Would it be possible to have https://github.com/dask/dask-labextension, to enable easier working with Dask on the platform?
cc @zoghbi-a