nv-legate / legate.core

The Foundation for All Legate Libraries
https://docs.nvidia.com/legate/24.06/
Apache License 2.0
186 stars · 62 forks

The use of legate.cunumeric and legate.pandas in Jupyter notebooks #138

Open YarShev opened 2 years ago

YarShev commented 2 years ago

Hi guys, I was wondering if it is possible to use legate.cunumeric and legate.pandas in a Jupyter notebook? If there are any docs on this, could you point me to them?

manopapad commented 2 years ago

Unfortunately, Jupyter notebook support is not yet available. We require some extensions to Legion (the distributed runtime that Legate uses) to support the coordinator/worker model that Jupyter requires, where the coordinator is the local process handling the notebook, which sends work to a remote set of processes that are standing by. Work on this is tracked in https://github.com/StanfordLegion/legion/issues/801.
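To make the coordinator/worker model concrete, here is a minimal, purely illustrative Python sketch (not Legion or Legate code, and using threads in place of remote processes for brevity): a coordinator broadcasts each notebook command to standby workers, which execute it and wait for the next one.

```python
import queue
import threading

def worker(inbox: queue.Queue, outbox: queue.Queue):
    """Standby worker: runs commands from the coordinator until told to stop."""
    while True:
        cmd = inbox.get()
        if cmd is None:          # shutdown signal
            break
        outbox.put(eval(cmd))    # execute the command and report the result back

def run_session(commands, n_workers=2):
    """Coordinator: broadcast each notebook command to all standby workers."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    workers = [threading.Thread(target=worker, args=(q, outbox)) for q in inboxes]
    for t in workers:
        t.start()
    results = []
    for cmd in commands:
        for q in inboxes:
            q.put(cmd)           # same command goes to every worker
        results.append(sorted(outbox.get() for _ in range(n_workers)))
    for q in inboxes:
        q.put(None)              # tell workers to shut down
    for t in workers:
        t.join()
    return results
```

In the real system the workers would be separate processes on remote nodes and the "commands" would be Legate operations, but the shape of the interaction is the same: workers stand by, and the coordinator drives them.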

lightsighter commented 2 years ago

@pmccormick for visibility. Do you think we could get some of the work that LANL has done on building Jupyter for Legion moving towards open source?

YarShev commented 2 years ago

@manopapad, thanks! What is the progress on this? I only see changes to the milestones 😄

@lightsighter, yes, I think that would be great. Of course, only if there aren't any restrictions.

pmccormick commented 2 years ago

@lightsighter -- this is on my todo list for this week, as folks are wrapping up other work, and I hope to get this ready to go. I should have an ETA for you later this week.

YarShev commented 2 years ago

@pmccormick, what is the status here?

pmccormick commented 2 years ago

Hi @YarShev. There is a PR for Jupyter support via Legion that is the first step toward this. @lightsighter will have to provide an update about merging and any other details he would like to see addressed before it lands (I know he has been super busy, so I suspect this is somewhat delayed).

YarShev commented 2 years ago

@pmccormick, thanks for the update! Could you elaborate on how this is supposed to work? Will MPI processes be launched at runtime when Legate is initialized, rather than when Jupyter starts?

lightsighter commented 2 years ago

I still need to review the merge request for Jupyter notebook support. That merge request only covers single-node Jupyter support for Legion Python, and we'll probably keep it restricted to single-node support just to get something working. Multi-node support will likely come later.

Could you elaborate on how this is supposed to work? Will MPI processes be launched at runtime when Legate is initialized, rather than when Jupyter starts?

Ultimately, when we do get around to supporting multi-node Jupyter, we're going to want to start up a multi-node Jupyter cluster (whether you use mpirun or another launcher to create the processes is mostly unimportant). A client Jupyter notebook will connect to one of those nodes as though it were a normal Jupyter server. Internally, our implementation will do a (tree) broadcast of commands and data from the first node out to all the other nodes, so that they can all run the same cuNumeric commands (consistent with control replication in Legion). We'll probably want to do something fancy for sending data back to the Jupyter notebook, especially if the data being sent back is large (TBD). That should all be transparent to Jupyter notebook users, though. To them it will just look like a normal Jupyter interaction.
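To illustrate why a tree broadcast keeps latency manageable at scale, here is a small, hypothetical Python sketch (not Legion/Legate code): each rank forwards the command stream to two children, so the number of communication rounds grows as O(log N) in the number of nodes rather than O(N).

```python
def tree_children(rank: int, world_size: int) -> list[int]:
    """Children of `rank` in a binary broadcast tree rooted at rank 0."""
    return [c for c in (2 * rank + 1, 2 * rank + 2) if c < world_size]

def broadcast_rounds(world_size: int) -> int:
    """Simulate the broadcast; count rounds until every rank has the data."""
    have, frontier, rounds = {0}, [0], 0
    while len(have) < world_size:
        # Each rank on the current frontier forwards to its children.
        frontier = [c for r in frontier for c in tree_children(r, world_size)]
        have.update(frontier)
        rounds += 1
    return rounds
```

With this shape, 1024 nodes are covered in 10 rounds instead of the 1023 sequential sends a naive one-to-all broadcast would need, which is the property the comment above is after.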

YarShev commented 2 years ago

I was wondering whether ipyparallel would somehow be used for this purpose. Presumably, all the work will be done under the hood in Legion or Legate itself, right?

lightsighter commented 2 years ago

Sorry for the very slow response on my part.

I was wondering whether ipyparallel would somehow be used for this purpose. Presumably, all the work will be done under the hood in Legion or Legate itself, right?

I'm not very familiar with ipyparallel, but I doubt it's going to be able to handle the distributed execution requirements that we're going to need for running on supercomputers or in the cloud. Not only would it need to handle broadcasting commands and data, it would need to do so efficiently enough for cases where we potentially have thousands of nodes (e.g., communication latency that is logarithmic in the number of nodes). If I'm mistaken and ipyparallel can help us out with that, please let me know, as I'd be happy to use an existing solution rather than reinvent the wheel.

LouisJenkinsCS commented 2 years ago

I have a question:

Would it be possible to create a new BatchSpawner for JupyterHub, for example, one that spawns a job with the requested resources?

https://github.com/jupyterhub/batchspawner https://jupyterhub.readthedocs.io/en/stable/reference/spawners.html

That way, code could be injected to allocate resources and launch Legate prior to starting the Jupyter notebook. I haven't done enough research to know whether this is possible, but I was wondering if it has been considered.
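For reference, a JupyterHub configuration using batchspawner's SlurmSpawner might look roughly like the sketch below. This is illustrative only: the resource values are placeholders, and whether or where a Legate launch command would fit into the batch script is exactly the open question here, not something this sketch resolves.

```python
# jupyterhub_config.py -- illustrative sketch, not a verified recipe
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Resources requested for the batch job backing each notebook server
# (values are placeholders):
c.SlurmSpawner.req_nprocs = '8'
c.SlurmSpawner.req_runtime = '1:00:00'
c.SlurmSpawner.req_memory = '32G'

# The batch script template is where launch-time setup (e.g. starting a
# Legate-enabled kernel environment) could hypothetically be injected:
c.SlurmSpawner.batch_script = """#!/bin/bash
#SBATCH --ntasks={nprocs}
#SBATCH --time={runtime}
#SBATCH --mem={memory}
srun batchspawner-singleuser jupyterhub-singleuser --ip=0.0.0.0
"""
```

The idea is that the spawner submits this script to the scheduler, so resource allocation happens before the notebook server ever starts.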

lightsighter commented 2 years ago

That's an interesting question! We weren't aware of it and hadn't considered it before. We'll definitely investigate it as a possibility. Do you happen to know if it works for general Jupyter notebooks outside the context of JupyterHub? I think we'll definitely want our solution to work for stand-alone deployments as well.

LouisJenkinsCS commented 2 years ago

I'm not sure of the answer to those questions, unfortunately, as I've only found out about it recently myself. I hope it works out and is a fruitful direction to head in!