divyegala closed this issue 1 month ago.
This hang also appears in a no-change PR: https://github.com/rapidsai/cuml/pull/6047#issuecomment-2316020277
I can no longer reproduce the issue on an L40 machine with the latest commit. For anyone who wants to try reproducing it elsewhere, here are the commands:
docker run --gpus all --pull always --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -u 0 --entrypoint bash rapidsai/citestwheel:cuda12.5.1-ubuntu22.04-py3.10
export RAPIDS_BUILD_TYPE=nightly
export RAPIDS_REPOSITORY=rapidsai/cuml
export RAPIDS_REF_NAME=branch-24.10
export RAPIDS_SHA=d87b0ce
export RAPIDS_NIGHTLY_DATE=2024-09-05
git clone https://github.com/rapidsai/gha-tools.git
(git clone https://github.com/rapidsai/cuml.git && cd cuml && git checkout $RAPIDS_SHA)
mkdir -p ./dist
GHA_TOOLS_DIR=gha-tools/tools
RAPIDS_PY_CUDA_SUFFIX="$($GHA_TOOLS_DIR/rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" $GHA_TOOLS_DIR/rapids-download-wheels-from-s3 ./dist
python -m pip install $(echo ./dist/cuml*.whl)[test]
bash cuml/ci/run_cuml_dask_pytests.sh
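Because the failure mode is a hang rather than an error, it can help to run the last command under a deadline so that a hang shows up as a failed exit status instead of an indefinitely blocked shell. This is a minimal sketch that assumes GNU coreutils `timeout` is available in the container. The helper name `run_with_hang_detection` and the 30-minute limit in the usage line are hypothetical and are not part of cuML's CI scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cuML CI): run a command under GNU
# coreutils `timeout` so a hang is reported as a failure.
set -u

run_with_hang_detection() {
  local limit="$1"
  shift
  # Send TERM when the limit expires; send KILL 60s later if still running.
  timeout --signal=TERM --kill-after=60 "$limit" "$@"
  local status=$?
  # `timeout` exits 124 on a TERM timeout, or 137 (128+9) if KILL was needed.
  if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
    echo "HANG: '$*' did not finish within $limit" >&2
  fi
  return "$status"
}
```

Usage, replacing the last command above (the limit is an arbitrary assumption): `run_with_hang_detection 30m bash cuml/ci/run_cuml_dask_pytests.sh`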
Closed by #6051
First reported by @jakirkham in PR https://github.com/rapidsai/cuml/pull/6031: we are now seeing that pytest dask-cuml hangs in CUDA 12.5 wheel CI jobs. Until we find the root cause, we will temporarily disable that test suite.
Reference to the hang: CI job link
cc @dantegd