Closed skirui-source closed 9 months ago
cc @jacobtomlinson
The init.sh script below successfully launches a multi-node, multi-GPU (MNMG) Databricks cluster with a Dask scheduler and workers:
#!/bin/bash
set -e

# Databricks sets these variables on every node before init scripts run
echo "DB_IS_DRIVER = $DB_IS_DRIVER"
echo "DB_DRIVER_IP = $DB_DRIVER_IP"

# Quote the requirement so the shell does not treat [complete] as a glob
pip install --upgrade pip "dask[complete]"

if [[ "$DB_IS_DRIVER" = "TRUE" ]]; then
    echo "This node is the Dask scheduler."
    dask scheduler &
else
    echo "This node is a Dask worker."
    echo "Connecting to Dask scheduler at $DB_DRIVER_IP:8786"
    # Wait for the scheduler to start
    while ! nc -z "$DB_DRIVER_IP" 8786; do
        echo "Scheduler not available yet. Waiting..."
        sleep 1
    done
    dask worker "tcp://$DB_DRIVER_IP:8786" &
fi
The script above was tested with 13.3 LTS ML (includes Apache Spark 3.4.1, GPU, Scala 2.12) and g4dn.12xlarge instance types for both the driver and worker nodes.
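One thing to note about the wait loop in the script: it never gives up, so a worker node would spin forever if the scheduler failed to start. A minimal sketch of a bounded wait is below. The function name wait_for_port and the 300-second default are my own choices, not part of the original script, and it uses bash's /dev/tcp pseudo-device instead of nc so that it does not depend on netcat being installed.

```shell
#!/bin/bash
# Hypothetical helper (not in the original init script): wait until HOST:PORT
# accepts TCP connections, giving up after roughly MAX_WAIT seconds.
wait_for_port() {
    local host="$1" port="$2" max_wait="${3:-300}"
    local waited=0
    # bash's /dev/tcp stands in for `nc -z`; `timeout 1` bounds each attempt
    # in case the host is unreachable rather than refusing the connection.
    until timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; do
        if (( waited >= max_wait )); then
            echo "Gave up waiting for ${host}:${port} after ${max_wait}s" >&2
            return 1
        fi
        echo "Scheduler not available yet. Waiting..."
        sleep 1
        waited=$(( waited + 1 ))
    done
}

# Possible use on a worker node, in place of the `while ! nc -z ...` loop:
#   wait_for_port "$DB_DRIVER_IP" 8786 300 || exit 1
#   dask worker "tcp://$DB_DRIVER_IP:8786" &
```

Failing the init script after a bounded wait makes a scheduler problem visible in the init script logs instead of leaving the worker node hanging.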
When trying to launch dask cuda worker instead (after pip install dask-cuda==23.10.0), we see the error below:
databricks fs cat dbfs:/dbfs/databricks/skirui/1024-060157-s1c0mgfg/init_scripts/1024-060157-s1c0mgfg_10_59_251_17/20231102_061645_00_dask_launch_init.sh.stderr.log
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ydata-profiling 4.2.0 requires numpy<1.24,>=1.16.0, but you have numpy 1.26.1 which is incompatible.
scipy 1.9.1 requires numpy<1.25.0,>=1.18.5, but you have numpy 1.26.1 which is incompatible.
mleap 0.20.0 requires scikit-learn<0.23.0,>=0.22.0, but you have scikit-learn 1.1.1 which is incompatible.
/databricks/python3/lib/python3.10/site-packages/scipy/__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.25.0 is required for this version of SciPy (detected version 1.26.1
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/databricks/python3/lib/python3.10/site-packages/dask/cli.py:100: UserWarning: While registering the command with name 'cuda', an exception ocurred; 'function' object has no attribute 'command'.
warnings.warn(
Usage: dask [OPTIONS] COMMAND [ARGS]...
Try 'dask -h' for help.
Error: No such command 'cuda'.
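The warning from dask/cli.py is raised while the cuda subcommand is being registered, and the pip output above shows the resolver changing package versions underneath the runtime. One thing that may be worth checking is which versions actually ended up installed. This is an assumption on my part, not a diagnosis confirmed in this thread. The sketch below is a hypothetical helper (pkg_version is a name I made up), and it assumes python3 on PATH is the same interpreter that pip installed into.

```shell
#!/bin/bash
# Hypothetical diagnostic helper, not part of the original init script.
# Prints "<name> <version>" for an installed distribution, or
# "<name> not installed" if the interpreter cannot find it.
pkg_version() {
    python3 - "$1" <<'EOF'
import sys
from importlib.metadata import PackageNotFoundError, version

name = sys.argv[1]
try:
    print(name, version(name))
except PackageNotFoundError:
    print(name, "not installed")
EOF
}

# Possible use right after the pip install step, so the versions land in
# the init script's stdout log:
#   for pkg in dask distributed dask-cuda click numpy; do
#       pkg_version "$pkg"
#   done
```

The package list in the usage comment is only a guess at what is relevant: dask and dask-cuda because the failing command spans both, click because the dask CLI is built on it, and numpy because pip reported conflicts for it.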
I pushed 1a555bd which fixes the build failure.
Fixes: https://github.com/rapidsai/deployment/issues/228