Our current docs for multi-node Databricks cover the following process:

1. Create a startup script that installs RAPIDS and `dask-databricks`, then runs `dask-databricks`.
2. Create an MNMG cluster that uses the 14.2 (Scala 2.12, Spark 3.5.0) runtime.
3. Select **Use your own Docker container** and enter the image `databricksruntime/gpu-tensorflow:cuda11.8` or `databricksruntime/gpu-pytorch:cuda11.8`.
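For reference, the init script from step 1 looks roughly like the sketch below. Package names and the `--cuda` flag follow the RAPIDS deployment docs; pin versions to match the runtime's CUDA 11.8 toolkit.

```shell
#!/bin/bash
set -e

# Databricks' Python is not always on PATH inside the custom containers.
export PATH="/databricks/python/bin:$PATH"

# Install RAPIDS (cuDF, Dask-CUDA) and dask-databricks against CUDA 11.
# The cu11 wheels are served from NVIDIA's package index.
/databricks/python/bin/pip install --extra-index-url=https://pypi.nvidia.com \
    cudf-cu11 dask-cudf-cu11 dask-cuda dask-databricks

# Start the Dask scheduler on the driver and CUDA workers on each node.
dask databricks run --cuda
```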
The container images use CUDA 11.8 and there are no CUDA 12 images available from Databricks.
The single-node instructions don't use a custom container at all, so in theory we should be able to do the same with the multi-node instructions.
In practice, if you omit the custom container the init script fails. The logs show that NVML can't be found during Dask startup, which suggests that the NVIDIA driver or CUDA toolkit is not yet installed at the time the init script runs and is only set up later in the cluster launch.
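To confirm that theory, a small diagnostic could be added at the top of the init script to log whether the driver (and hence NVML) is visible at that point. `check_nvml` is a hypothetical helper, not part of our docs:

```shell
#!/bin/bash
# Hypothetical diagnostic: report whether NVML is reachable when this runs.
# nvidia-smi links against NVML, so a successful call implies the driver is up.
check_nvml() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "available"
    else
        echo "missing"
    fi
}

echo "NVML status at init-script time: $(check_nvml)"
```

If this prints `missing` on a cluster without the custom container but `available` with it, that would confirm the driver-install-ordering hypothesis.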
We should find a way to start up dask-databricks without using a custom container and update the documentation.