rapidsai / dask-cuda

Utilities for Dask and CUDA interactions
https://docs.rapids.ai/api/dask-cuda/stable/
Apache License 2.0

Opinionated subclasses of other cluster managers #942

Open jacobtomlinson opened 2 years ago

jacobtomlinson commented 2 years ago

I wanted to open this issue for discussion after chatting to @mmccarty yesterday about some deployment documentation tasks.

In this package we have LocalCUDACluster which is an opinionated version of distributed.LocalCluster that has different default options and some custom logic to make launching GPU clusters simpler for end-users. Theoretically, you could use LocalCluster and achieve a GPU-capable cluster but with a lot of additional setup, config and logic.
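For illustration, this is roughly the kind of setup LocalCUDACluster takes care of for you (the parameter values here are arbitrary, and the UCX line assumes ucx-py is installed):

```python
from dask_cuda import LocalCUDACluster
from distributed import Client

# One worker per listed GPU, CPU affinity, an RMM memory pool and
# device->host->disk spilling are all configured by the constructor;
# reproducing this with LocalCluster means hand-rolling worker specs,
# environment handling and preload logic.
cluster = LocalCUDACluster(
    CUDA_VISIBLE_DEVICES="0,1",   # which GPUs to use (one worker each)
    rmm_pool_size="24GB",         # pre-allocate an RMM pool on each GPU
    device_memory_limit="20GB",   # spill device memory to host beyond this
    protocol="ucx",               # optional: UCX for NVLink/InfiniBand
)
client = Client(cluster)
```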

Dask also has a helm chart for deploying a single cluster on Kubernetes, and we have a RAPIDS helm chart that is effectively an opinionated subclass for deploying RAPIDS. However, we have opted to archive the RAPIDS helm chart in favour of a documentation page that describes the config you need to run RAPIDS with the Dask helm chart. This lets us maintain one less thing and reduces complexity, especially given that nobody seems to be using the RAPIDS chart.

So the context here is that we have one opinionated RAPIDS subclass of a Dask thing that is valuable and useful, and another that is not and will be deprecated.

We also have many other cluster managers in the Dask ecosystem that are part of packages like dask-jobqueue, dask-cloudprovider, dask-kubernetes, etc. Some of these can be configured to run RAPIDS relatively easily, others cannot (but could be enhanced to do so). We intend to improve the RAPIDS documentation to cover configuring and leveraging these.

There is a temptation to create RAPIDS specific subclasses of these here, for example we could have KubeCUDACluster that sets sensible defaults for Docker, RMM, UCX, etc. The alternative is to cover this in documentation only on pages like this one. The direction we are going with the helm chart suggests we shouldn't do this, but the value of LocalCUDACluster makes me think about it from time to time. The other thing to note is that the docs use dask-cuda-worker which contains much of the value of LocalCUDACluster.
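To make the documentation-only option concrete, such a page would essentially show something like the sketch below (not taken from the linked page; it assumes the dask-kubernetes operator's KubeCluster and a placeholder RAPIDS image tag):

```python
from dask_kubernetes.operator import KubeCluster

# Documentation-only approach: plain KubeCluster, with the RAPIDS-specific
# pieces supplied as configuration rather than baked into a KubeCUDACluster
# subclass.
cluster = KubeCluster(
    name="rapids-dask",
    image="rapidsai/rapidsai:latest",               # placeholder RAPIDS image
    n_workers=2,
    resources={"limits": {"nvidia.com/gpu": "1"}},  # one GPU per worker pod
    worker_command="dask-cuda-worker",              # reuse dask-cuda's worker CLI
)
```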

@pentschev given you know LocalCUDACluster better than anyone, do you have any thoughts or context on this that would be valuable? Do you see benefit in creating more cluster manager subclasses? Or should we steer away from that?

pentschev commented 2 years ago

Before going into the details of whether having more subclasses is valuable or not, let me give you a list of some of the reasons why Dask-CUDA has not been integrated into Distributed:

  1. Dask-CUDA CI requires GPU support, which was not a thing until recently in Distributed;
  2. Distributed still doesn't run GPU tests on all PRs because new contributors need to be approved first, which is often forgotten/ignored;
  3. Dask-CUDA tests several GPU setups, which is not possible in Distributed due to the high traffic of PRs and limited resources;
  4. Dask-CUDA tests a significant amount of RAPIDS, which, when it breaks, would also break Distributed CI.

Personally, I think having code available to users with fewer required dependencies is a better option, but not at the potential cost of quality, which is the sacrifice we would need to make if we wanted to push Dask-CUDA into Distributed today. Hopefully we can resolve that in the future, but we are not there yet.

With the above said, subclassing seems like a reasonable alternative for projects that cannot be fully integrated into Distributed, be it for lack of CI resources or for still being experimental/unstable, for example.

One idea I had in mind several months ago was to have some sort of pre-generated profiles. For example, assuming the user has all dependencies installed, we could have a profile called "Distributed-Default" that would just run a LocalCluster with the current default options, and similarly another "Distributed-NoSpill" that would run LocalCluster but with spilling to memory/disk entirely disabled. We could then have a "CUDA-Default" with sensible defaults for GPUs, and another "CUDA-NoSpill" with the same defaults minus spilling. Such profiles would mean one could always run LocalCluster for Distributed or Dask-CUDA without knowing anything about Dask-CUDA (except that it must be installed and import dask_cuda must succeed), though it would also mean that LocalCluster would need to know of Dask-CUDA's existence and how to dispatch to LocalCUDACluster instead.
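A rough sketch of what I mean by dispatching on profiles (all names and profile contents here are made up for illustration, not an existing API):

```python
from distributed import LocalCluster

# Hypothetical profile registry: each profile maps to a backend and the
# keyword arguments the cluster should be constructed with.
PROFILES = {
    "Distributed-Default": ("distributed", {}),
    "Distributed-NoSpill": ("distributed", {"memory_limit": 0}),
    "CUDA-Default": ("dask_cuda", {}),
    "CUDA-NoSpill": ("dask_cuda", {"device_memory_limit": 0, "memory_limit": 0}),
}


def cluster_from_profile(name, **overrides):
    """Create a local cluster from a named profile (illustrative only)."""
    backend, defaults = PROFILES[name]
    kwargs = {**defaults, **overrides}
    if backend == "dask_cuda":
        # Only import Dask-CUDA when a CUDA profile is requested, so plain
        # Distributed users never need it installed.
        from dask_cuda import LocalCUDACluster

        return LocalCUDACluster(**kwargs)
    return LocalCluster(**kwargs)


cluster = cluster_from_profile("CUDA-NoSpill")
```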

I haven't thought much about implementation details, but I think that could be doable. Similarly, we could have other specializations like KubeCUDACluster that the user wouldn't need to know much about; we would then only provide profiles in Distributed itself. It is also possible that KubeCUDACluster would then not be necessary, and we would instead just add "KubeCUDA-Default"/"KubeCUDA-NoSpill"/... profiles that provide specialized defaults to LocalCUDACluster, and if we need to introduce a few new arguments to extend LocalCUDACluster for Kubernetes, that would probably be ok too.

This could all be extended to the dask-worker/dask-cuda-worker/dask-*-worker implementations too if we could provide something like --profile cuda-nospill.yml, for example.

What do you think of the suggestion above? That would of course require some coordination with Distributed, but seems doable.

I am not sure if I answered everything, so please let me know if you think there are gaps in my reply.

wence- commented 2 years ago

Apologies, this is not really a concrete proposal, but some slightly woolly thoughts

There is a temptation to create RAPIDS specific subclasses of these here, for example we could have KubeCUDACluster that sets sensible defaults for Docker, RMM, UCX, etc. The alternative is to cover this in documentation only on pages like this one. The direction we are going with the helm chart suggests we shouldn't do this, but the value of LocalCUDACluster makes me think about it from time to time. The other thing to note is that the docs use dask-cuda-worker which contains much of the value of LocalCUDACluster.

Looking at the code in these bits, it seems that there is scope to make the CUDA-based configuration more principled in a way that would make the "adapt the documentation" approach simpler.

LocalCUDACluster and CUDAWorker (which is what dask-cuda-worker creates) both encode what are presumed best practices for setting up the environment, but they do so in a monolithic way that does not allow other cluster startup code to reuse them. One sees this as well in the way that these two classes replicate a lot of the setup code in their respective __init__ methods.

LocalCUDACluster looks somewhat more principled to me (CUDAWorker subclasses distributed.core.Server which is not an abstract class, but doesn't call the superclass __init__ method, for example), but is still rather intimate with its parent class: the order of partial initialisation, superclass-init, modification of superclass properties and then finalising the initialisation is rather delicate.

Eventually, these classes basically just build standard dask/distributed objects with some extra plugins and some memory management configured. In the first instance, refactoring this setup and configuration into objects one could create and use on their own might indicate how easy it would be to avoid subclassing.
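Purely as an illustration of that direction (the helper name is made up, and the real dask-cuda plugins are omitted), the per-GPU pieces could be expressed as a plain SpecCluster worker spec that any cluster manager could consume:

```python
from distributed import Nanny, Scheduler, SpecCluster


def cuda_worker_spec(gpus):
    """Build a standard SpecCluster worker spec with one single-threaded
    worker per GPU (illustrative only)."""
    spec = {}
    for i in gpus:
        spec[f"gpu-{i}"] = {
            "cls": Nanny,
            "options": {
                "nthreads": 1,
                # Pin each worker to one device; real code would also attach
                # the CPU-affinity / RMM worker plugins here, omitted because
                # they are dask-cuda internals.
                "env": {"CUDA_VISIBLE_DEVICES": str(i)},
            },
        }
    return spec


cluster = SpecCluster(
    workers=cuda_worker_spec(gpus=[0, 1]),
    scheduler={"cls": Scheduler, "options": {}},
)
```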

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.