rapidsai / deployment

RAPIDS Deployment Documentation
https://docs.rapids.ai/deployment/stable/

Document multi-node with Dask on Databricks #297

Closed skirui-source closed 9 months ago

skirui-source commented 10 months ago

Fixes: https://github.com/rapidsai/deployment/issues/228

skirui-source commented 10 months ago

cc @jacobtomlinson The init.sh script below successfully launches a multi-node multi-GPU (MNMG) Databricks cluster, with the Dask scheduler on the driver node and a Dask worker on each worker node:


#!/bin/bash

# Abort the init script on the first failing command.
set -e

# Databricks sets these variables on every node of the cluster.
echo "DB_IS_DRIVER = $DB_IS_DRIVER"
echo "DB_DRIVER_IP = $DB_DRIVER_IP"

# Quote the extras spec so the shell cannot glob-expand the brackets.
pip install --upgrade pip "dask[complete]"

if [[ "$DB_IS_DRIVER" = "TRUE" ]]; then
  echo "This node is the Dask scheduler."
  dask scheduler &
else
  echo "This node is a Dask worker."
  echo "Connecting to Dask scheduler at $DB_DRIVER_IP:8786"
  # Wait for the scheduler to start listening on port 8786.
  while ! nc -z "$DB_DRIVER_IP" 8786; do
    echo "Scheduler not available yet. Waiting..."
    sleep 1
  done
  dask worker "tcp://$DB_DRIVER_IP:8786" &
fi

The script above was tested with Databricks Runtime 13.3 LTS ML (includes Apache Spark 3.4.1, GPU, Scala 2.12) on g4dn.12xlarge instances for both the driver and the worker nodes.
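
The wait loop in the init script above retries forever, so a worker node would hang in its init script if the scheduler never came up. One possible hardening, sketched here and not part of the tested script, is a small retry helper with a timeout. The name wait_until and the 300-second limit in the usage comment are assumptions made for illustration.

```shell
#!/bin/bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until it
# succeeds, or returns 1 once TIMEOUT_SECONDS retries have been used up.
wait_until() {
  local timeout="$1"
  shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Possible use in the worker branch (the 300 s limit is an assumed value):
# wait_until 300 nc -z "$DB_DRIVER_IP" 8786 || exit 1
# dask worker "tcp://$DB_DRIVER_IP:8786" &
```

Passing the probe as a command keeps the helper independent of nc, so the same function could wrap a different connectivity check if nc is not available on the image.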


When we instead try to launch a Dask CUDA worker (after pip install dask-cuda==23.10.0), the init script fails with the error below:

databricks fs cat dbfs:/dbfs/databricks/skirui/1024-060157-s1c0mgfg/init_scripts/1024-060157-s1c0mgfg_10_59_251_17/20231102_061645_00_dask_launch_init.sh.stderr.log

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ydata-profiling 4.2.0 requires numpy<1.24,>=1.16.0, but you have numpy 1.26.1 which is incompatible.
scipy 1.9.1 requires numpy<1.25.0,>=1.18.5, but you have numpy 1.26.1 which is incompatible.
mleap 0.20.0 requires scikit-learn<0.23.0,>=0.22.0, but you have scikit-learn 1.1.1 which is incompatible.
/databricks/python3/lib/python3.10/site-packages/scipy/__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.25.0 is required for this version of SciPy (detected version 1.26.1
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/databricks/python3/lib/python3.10/site-packages/dask/cli.py:100: UserWarning: While registering the command with name 'cuda', an exception ocurred; 'function' object has no attribute 'command'.
  warnings.warn(
Usage: dask [OPTIONS] COMMAND [ARGS]...
Try 'dask -h' for help.

Error: No such command 'cuda'.
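
The log above shows dependency conflicts and a failure to register the cuda subcommand, but it does not show which versions of dask and its CLI dependencies ended up installed. As a diagnostic sketch only (it does not establish the root cause), the init script could print those versions before starting the worker. The helper name relevant_versions and the package list are assumptions chosen for this example.

```shell
#!/bin/bash
# relevant_versions: hypothetical filter that reads "name==version" lines
# (the output format of `pip list --format=freeze`) on stdin and keeps
# only packages related to the dask CLI and the conflicts in the log.
relevant_versions() {
  grep -i -E '^(dask|dask[-_]cuda|distributed|click|numpy|scipy)=='
}

# Possible use in the init script, before launching the worker.
# `|| true` keeps `set -e` from aborting if nothing matches.
# python -m pip list --format=freeze | relevant_versions || true
```

The output lands in the same init script log files as the error, so the installed versions can be read next to the failure.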

jacobtomlinson commented 9 months ago

I pushed 1a555bd which fixes the build failure.
