Closed: jarmak-nv closed this issue 4 months ago
Does `cuml` set scikit-learn 1.5 as a minimum version?
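One way to check is to inspect the wheel's declared dependencies. This is a sketch, assuming a machine where the `cuml-cu11` wheel (the package name used in the install command later in this thread) is already installed:

```shell
# Sketch: list the dependencies the cuml-cu11 wheel declares.
# Swap in whichever cuml package variant you actually have installed.
# The "|| true" keeps the command from failing when the wheel is absent.
pip show cuml-cu11 | grep -i '^requires' || true
```

If scikit-learn appears in the `Requires:` line with a version specifier, the pin is declared; if it doesn't appear at all, it is only a soft (import-time) dependency.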
In the `init.sh` in our docs we have:

```shell
pip install --extra-index-url=https://pypi.nvidia.com \
    "cudf-cu11" \
    "cuml-cu11" \
    "dask-cudf-cu11" \
    "dask-cuda=={{rapids_version}}"
```
I would assume installing `cuml` would bump `scikit-learn`. Is that not the case?
Oh interesting - you're right! `scikit-learn` isn't a hard dependency of cuML, but cuML breaks on import now. Looks like this is actually a cuML issue.
cuML now has a PR to remove the hard dependency for 24.06.
Databricks has 1.0.2 installed on the live runtime, and 1.3 on the beta container. cuML won't trigger an update on its own, so to ensure Databricks users get a good experience I think we should do an upgrade as part of `init.sh`.
That being said, maybe my initial plan of an `--upgrade` is worse than pinning to the same version cuML uses, i.e. `pip install scikit-learn==1.5`.
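The pin could also be made conditional, so the script only touches scikit-learn when the preinstalled copy is actually older than what cuML expects. A minimal sketch (the `needs_pin` helper and the version values are illustrative, not from the thread):

```shell
# Hypothetical guard: detect whether scikit-learn needs to be pinned.
needs_pin() {
    # succeeds when $1 (installed) sorts strictly before $2 (required)
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ] \
        && [ "$1" != "$2" ]
}

# Fall back to "0" when scikit-learn (or python) isn't available at all.
installed=$(python -c "import sklearn; print(sklearn.__version__)" 2>/dev/null || echo "0")
if needs_pin "$installed" "1.5"; then
    # In a real init.sh this branch would run: pip install "scikit-learn==1.5"
    echo "scikit-learn ${installed} is older than 1.5; pinning is needed"
fi
```

This relies on `sort -V` (GNU version sort) being present in the container image, which is an assumption worth checking against the actual Databricks runtime.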
Ok, thanks for confirming. So just to check, you are proposing we add something like the following to our docs:

```shell
pip install --extra-index-url=https://pypi.nvidia.com \
    "cudf-cu11" \
    "cuml-cu11" \
    "dask-cudf-cu11" \
    "dask-cuda=={{rapids_version}}" \
    "scikit-learn==1.5"
```
Yup! I figured this is the best place to do it since we already provide the `init.sh`, and while technically users might have no problems on Databricks with the old version of scikit-learn, it's safest to upgrade it to prevent potential issues with cuML.
@jarmak-nv @jacobtomlinson @aravenel this issue also affects colab. Thanks for sharing Ben!
The fix in `cuml` means this change should no longer be needed.
cuML now uses scikit-learn 1.5 with the merge of https://github.com/rapidsai/cuml/pull/5851, which causes Databricks to fail since their containers ship scikit-learn 1.3 at newest.
We will need to update the docs to add `scikit-learn` to `init.sh`. Otherwise users will see an error similar to the one below: