jacobtomlinson opened this issue 2 years ago
Before going into the details of whether having more subclasses is valuable or not, let me give you a list of some of the reasons why Dask-CUDA has not been integrated in Distributed:
Personally, I think having code available to users requiring fewer dependencies is a better option, but not at the potential cost of quality, which is the sacrifice we would need to make if we wanted to push Dask-CUDA into Distributed today. Hopefully we can resolve that in the future, but we are not there yet.
With the above said, subclassing seems like a reasonable alternative for projects that cannot be fully integrated into Distributed, be it for lack of CI resources or for still being experimental/unstable, for example.
One idea I had in mind several months ago was to have some sort of pre-generated profiles. For example, assuming the user has all dependencies installed, we could have a profile called "Distributed-Default" that would just run a `LocalCluster` with the current default options, and similarly another "Distributed-NoSpill" that would run `LocalCluster` but with spilling to memory/disk entirely disabled. We could then have a "CUDA-Default" with sensible defaults for GPUs, and another "CUDA-NoSpill" with the same defaults minus spilling. Such profiles would mean one could always run `LocalCluster` for Distributed or Dask-CUDA, without knowing anything about Dask-CUDA (except that it must be installed and `import dask_cuda` must succeed); that would also mean that `LocalCluster` would need to know of Dask-CUDA's existence and how to dispatch to `LocalCUDACluster` instead.
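A minimal sketch of how such a profile registry might look, assuming profiles map a name to a cluster class plus keyword overrides. Everything here is illustrative: neither the registry nor these profile names exist in Distributed, and the keyword arguments stand in for whatever spilling knobs the real classes expose.

```python
import importlib

# Hypothetical profile registry: names map to a dotted class path (so that
# dask_cuda is only imported when a CUDA profile is actually requested)
# plus keyword-argument overrides. Parameter names are illustrative.
PROFILES = {
    "Distributed-Default": ("distributed.LocalCluster", {}),
    "Distributed-NoSpill": ("distributed.LocalCluster", {"memory_limit": 0}),
    "CUDA-Default": ("dask_cuda.LocalCUDACluster", {}),
    "CUDA-NoSpill": (
        "dask_cuda.LocalCUDACluster",
        {"device_memory_limit": 0, "memory_limit": 0},
    ),
}


def resolve_profile(name):
    """Return (cluster_class, kwargs) for a profile name.

    The owning package is imported lazily, so a missing optional
    dependency (e.g. dask_cuda) only fails when its profile is used.
    """
    path, kwargs = PROFILES[name]
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name), dict(kwargs)
```

A user would then write something like `cluster_cls, kwargs = resolve_profile("CUDA-NoSpill"); cluster = cluster_cls(**kwargs)` without needing to know which package provides the class.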
I haven't thought much about implementation details, but I think that could be doable. Similarly, we could have other specializations like `KubeCUDACluster` that the user wouldn't need to know much about; we would only then provide profiles in Distributed itself. It is also possible that `KubeCUDACluster` would then not be necessary, and we would instead just add "KubeCUDA-Default"/"KubeCUDA-NoSpill"/... profiles that provide specialized defaults to `LocalCUDACluster`, and if we need to introduce a few new arguments to extend `LocalCUDACluster` for Kubernetes, that would probably be ok too.
This could all extend to the `dask-worker`/`dask-cuda-worker`/`dask-*-worker` implementations too, if we could provide something like `--profile cuda-nospill.yml`, for example.
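As a sketch of what such a profile file might contain — purely hypothetical, since neither the `--profile` flag nor this schema exists in Distributed or Dask-CUDA today:

```yaml
# cuda-nospill.yml -- hypothetical profile file; all keys are illustrative
cluster-class: dask_cuda.LocalCUDACluster
options:
  # disable device and host spilling entirely
  device-memory-limit: 0
  memory-limit: 0
```

A `dask-*-worker` CLI could then resolve the profile name either to a file like this on disk or to one of the built-in named profiles before constructing the cluster or worker.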
What do you think of the suggestion above? That would of course require some coordination with Distributed, but seems doable.
I am not sure if I answered everything, so please let me know if you think there are gaps in my reply.
Apologies, this is not really a concrete proposal, but some slightly woolly thoughts
> There is a temptation to create RAPIDS-specific subclasses of these here, for example we could have `KubeCUDACluster` that sets sensible defaults for Docker, RMM, UCX, etc. The alternative is to cover this in documentation only, on pages like this one. The direction we are going with the helm chart suggests we shouldn't do this, but the value of `LocalCUDACluster` makes me think about it from time to time. The other thing to note is that the docs use `dask-cuda-worker`, which contains much of the value of `LocalCUDACluster`.
Looking at the code in these bits, it seems that there is scope to make the CUDA-based configuration more principled in a way that would make the "adapt the documentation" approach simpler.
`LocalCUDACluster` and `CUDAWorker` (which is what `dask-cuda-worker` creates) both encode what are presumed best practices for setting up the environment, but do so in a monolithic way that does not allow some other cluster startup to use them. One sees this as well in the way that these two classes replicate a lot of the setup code in their respective `__init__` methods.
`LocalCUDACluster` looks somewhat more principled to me (`CUDAWorker` subclasses `distributed.core.Server`, which is not an abstract class, but doesn't call the superclass `__init__` method, for example), but is still rather intimate with its parent class: the order of partial initialisation, superclass init, modification of superclass properties, and then finalising the initialisation is rather delicate.
Eventually, these classes basically just build standard dask/distributed objects with some extra plugins and some memory management configured. In the first instance, refactoring this setup and configuration into objects one could create and use on their own might indicate how easy it would be to avoid subclassing.
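As a sketch of that refactoring, the CUDA-specific setup could be bundled into a standalone object that any cluster startup composes rather than inherits. The class and method names here are hypothetical, not dask-cuda API, and the handful of settings shown stand in for the full set the real classes configure.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical standalone setup object: the configuration that
# LocalCUDACluster/CUDAWorker currently hard-code in their __init__
# methods, expressed as something any cluster manager could apply.
@dataclass
class CUDASetup:
    device_index: int = 0
    rmm_pool_size: Optional[str] = None  # e.g. "16GB"
    enable_ucx: bool = False

    def worker_env(self):
        """Environment variables to set before the worker process starts."""
        return {"CUDA_VISIBLE_DEVICES": str(self.device_index)}

    def worker_kwargs(self):
        """Keyword arguments to merge into a worker/cluster constructor,
        so no subclass is needed just to carry these defaults."""
        kwargs = {}
        if self.rmm_pool_size is not None:
            kwargs["rmm_pool_size"] = self.rmm_pool_size
        if self.enable_ucx:
            kwargs["protocol"] = "ucx"
        return kwargs
```

With something like this, a Kubernetes or jobqueue cluster manager could apply `CUDASetup(...).worker_kwargs()` to its own worker spec instead of subclassing `LocalCUDACluster`.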
I wanted to open this issue for discussion after chatting to @mmccarty yesterday about some deployment documentation tasks.
In this package we have `LocalCUDACluster`, which is an opinionated version of `distributed.LocalCluster` that has different default options and some custom logic to make launching GPU clusters simpler for end-users. Theoretically, you could use `LocalCluster` and achieve a GPU-capable cluster, but with a lot of additional setup, config and logic.

Dask also has a helm chart for deploying a single cluster on Kubernetes, and we have a RAPIDS helm chart that is also effectively an opinionated subclass for deploying RAPIDS. However, we have opted to archive the RAPIDS helm chart in favour of a documentation page that describes the config you need to run RAPIDS with the Dask helm chart. This allows us to maintain one less thing and reduce complexity, especially given that nobody seems to be using the RAPIDS chart.
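To give a feel for the "lot of additional setup" involved in using plain `LocalCluster` with GPUs, here is a rough sketch of building per-GPU worker specs by hand, roughly the kind of thing `LocalCUDACluster` automates. The helper and spec layout are hypothetical; real dask-cuda additionally configures RMM, UCX, spilling and CPU affinity.

```python
# Hypothetical helper: one worker per GPU, each pinned to its device
# via CUDA_VISIBLE_DEVICES. The device list is rotated so each worker
# sees "its" GPU first, mirroring dask-cuda's device-selection approach.
def gpu_worker_specs(n_gpus):
    """Build a per-worker {env, kwargs} spec for each GPU."""
    specs = {}
    for i in range(n_gpus):
        visible = ",".join(str((i + j) % n_gpus) for j in range(n_gpus))
        specs[f"gpu-{i}"] = {
            "env": {"CUDA_VISIBLE_DEVICES": visible},
            # One thread per worker is the usual choice for GPU workloads
            "nthreads": 1,
        }
    return specs
```

Each spec would then have to be threaded through worker construction manually, which is exactly the kind of boilerplate the opinionated subclass hides.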
So the context here is that we have one opinionated RAPIDS subclass of a Dask thing that is valuable and useful, and another that is not and will be deprecated.
We also have many other cluster managers in the Dask ecosystem that are part of packages like `dask-jobqueue`, `dask-cloudprovider`, `dask-kubernetes`, etc. Some of these can be configured to run RAPIDS relatively easily, others cannot (but could be enhanced to do so). We are intending to improve the RAPIDS documentation to cover configuring and leveraging these.

There is a temptation to create RAPIDS-specific subclasses of these here, for example we could have `KubeCUDACluster` that sets sensible defaults for Docker, RMM, UCX, etc. The alternative is to cover this in documentation only, on pages like this one. The direction we are going with the helm chart suggests we shouldn't do this, but the value of `LocalCUDACluster` makes me think about it from time to time. The other thing to note is that the docs use `dask-cuda-worker`, which contains much of the value of `LocalCUDACluster`.

@pentschev given you know `LocalCUDACluster` better than anyone, do you have any thoughts or context on this that would be valuable? Do you see benefit in creating more cluster manager subclasses? Or should we steer away from that?