rapidsai / dask-cuda

Utilities for Dask and CUDA interactions
https://docs.rapids.ai/api/dask-cuda/stable/
Apache License 2.0

Build proof of concept of multi-node join computation on Kubernetes #7

Open mrocklin opened 5 years ago

mrocklin commented 5 years ago

It would be useful for the RAPIDS effort to have a multi-node join computation deployed from Kubernetes. Until UCX arrives this will likely be slow, but we can probably work on deployment and configuration issues in the meantime.

I suspect that this involves the following steps:

  1. Obtain access to a Kubernetes cluster with GPUs
  2. Use either dask-kubernetes or the Dask helm chart to deploy Dask workers onto that cluster, doing whatever is necessary to specify GPUs in the pod specification
  3. Run a computation similar to https://blog.dask.org/2019/01/29/cudf-joins , but presumably larger in scale
  4. Quantify the computational costs, possibly using the profile and task_stream diagnostic utilities from the client to capture information
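For step 2, a minimal sketch of what "specifying GPUs in the pod specification" could look like, written as a plain Python dict that can be dumped to JSON or YAML. It assumes the cluster runs the NVIDIA device plugin, which exposes GPUs as the `nvidia.com/gpu` resource. The function name `gpu_worker_pod_spec`, the image name, and the scheduler address are placeholders for illustration, not anything defined in this thread.

```python
import json


def gpu_worker_pod_spec(image, gpus_per_worker=1,
                        scheduler_address="tcp://dask-scheduler:8786"):
    """Build a Kubernetes pod manifest (as a dict) for one GPU Dask worker.

    Assumptions: the NVIDIA device plugin is installed on the cluster, and
    the image (a placeholder) ships dask-cuda, so the `dask-cuda-worker`
    command is available inside the container.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"labels": {"app": "dask-cuda-worker"}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "dask-cuda-worker",
                    "image": image,
                    # Start a worker that connects to the given scheduler.
                    "args": ["dask-cuda-worker", scheduler_address],
                    # GPUs are requested through resource limits.
                    "resources": {
                        "limits": {"nvidia.com/gpu": gpus_per_worker}
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name; substitute one that contains RAPIDS.
    print(json.dumps(gpu_worker_pod_spec("example/rapids-image:tag"), indent=2))
```

A manifest like this could serve as the worker template for dask-kubernetes, or be adapted into the worker section of the Helm chart values; the exact wiring depends on the versions in use.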

I suspect that going through this effort manually will expose a number of small issues that we'll then have to fix.

mrocklin commented 5 years ago

@beberg do you have any interest in doing this? You could adapt this notebook, which handles all of the dask-cudf things: https://gist.github.com/mrocklin/ab10c61a17391e8dbc7577f83fc4d25d

You would have to swap out LocalCUDACluster for some other solution, either Helm or Dask-Kubernetes, and then increase the size of the dataframes.
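A rough sketch of that swap, assuming the Kubernetes deployment exposes a scheduler address that the notebook can reach. The helper names (`partition_count`, `make_client`, `timed_merge`) are hypothetical; `Client`, `LocalCUDACluster`, and `get_task_stream` are the existing Dask and dask-cuda entry points. Imports are deferred into the functions so the sizing helper can be used on a machine without GPUs.

```python
import math


def partition_count(total_rows, rows_per_partition):
    """Number of partitions needed so that no partition exceeds the target size."""
    return max(1, math.ceil(total_rows / rows_per_partition))


def make_client(scheduler_address=None):
    """Return a Dask client.

    With no address, fall back to the single-node LocalCUDACluster used in
    the notebook; otherwise connect to an already deployed scheduler
    (for example one started by Helm or dask-kubernetes).
    """
    from dask.distributed import Client

    if scheduler_address is None:
        from dask_cuda import LocalCUDACluster

        return Client(LocalCUDACluster())
    return Client(scheduler_address)


def timed_merge(left, right, on):
    """Join two Dask dataframes and capture the task stream for the run.

    Assumes a client has already been created, so get_task_stream can
    pick up the default client.
    """
    from dask.distributed import get_task_stream

    with get_task_stream() as ts:
        n_rows = len(left.merge(right, on=on))  # len() triggers the computation
    return n_rows, ts.data
```

Only `make_client` changes between the single-node and multi-node runs; the dataframe construction and the merge itself stay the same, which keeps the comparison between the two setups straightforward.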

pentschev commented 3 years ago

@jacobtomlinson I know you've been doing a lot of deployment-related work. I believe this is already covered, at least partially. Could you check if there's something that's still worth covering or already planned?

jacobtomlinson commented 3 years ago

This should all work today. It would still be worthwhile to run through it, though.
