Open mrocklin opened 5 years ago
@beberg do you have any interest in doing this? You could adapt this notebook, which handles all of the dask-cudf things: https://gist.github.com/mrocklin/ab10c61a17391e8dbc7577f83fc4d25d
You would have to swap out `LocalCUDACluster` for some other solution, either Helm or Dask-Kubernetes, and then increase the size of the dataframes.
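For the Dask-Kubernetes route, the swap might look roughly like the sketch below. It is not tested against a real cluster and rests on several assumptions: the image name `rapidsai/rapidsai:latest` is a placeholder, the helper names `gpu_worker_pod_spec` and `connect` are made up for illustration, the classic `KubeCluster.from_dict` entry point is assumed to be available (newer dask-kubernetes releases use an operator-based `KubeCluster` instead), and the worker is assumed to pick up the scheduler address from the environment that dask-kubernetes sets on the pod.

```python
from typing import Any, Dict


def gpu_worker_pod_spec(
    image: str = "rapidsai/rapidsai:latest",  # placeholder image, an assumption
    gpus_per_worker: int = 1,
) -> Dict[str, Any]:
    """Build a Kubernetes pod manifest (plain dict) for one GPU Dask worker."""
    if gpus_per_worker < 1:
        raise ValueError("each worker needs at least one GPU")
    return {
        "kind": "Pod",
        "metadata": {"labels": {"app": "dask-cuda-worker"}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "dask-worker",
                    "image": image,
                    # dask-cuda's worker CLI; assumes the scheduler address
                    # is injected through the pod environment.
                    "args": ["dask-cuda-worker"],
                    # Ask the Kubernetes scheduler for GPUs on this pod.
                    "resources": {"limits": {"nvidia.com/gpu": gpus_per_worker}},
                }
            ],
        },
    }


def connect(n_workers: int = 2):
    """Stand-in for `Client(LocalCUDACluster())` in the notebook.

    Imports are local so the pod spec can be inspected without a cluster
    or dask-kubernetes installed.
    """
    from dask.distributed import Client
    from dask_kubernetes import KubeCluster  # classic API, an assumption

    cluster = KubeCluster.from_dict(gpu_worker_pod_spec())
    cluster.scale(n_workers)
    return Client(cluster)
```

In the notebook, `client = connect(n_workers=4)` would then take the place of the `LocalCUDACluster` setup, and the rest of the dask-cudf code should stay the same.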
@jacobtomlinson I know you've been doing a lot of deployment-related work. I believe this is already covered, at least partially. Could you check if there's something that's still worth covering or already planned?
This should all work today. It would be worthwhile to run through though.
It would be useful for the RAPIDS effort to have a multi-node join computation deployed on Kubernetes. Until UCX arrives, this will likely be slow, but we can probably work on deployment and configuration issues in the meantime.
I suspect that this involves the following steps:
I suspect that going through this effort manually will expose a number of small issues that we'll then have to fix.