rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[QST] Encountering raft::cuda_error with cuML's RandomForestClassifier on GPU. (cp.cuda.Device(2).use()) #5983

Open m946107011 opened 3 months ago

m946107011 commented 3 months ago

Hi,

I am encountering an issue when selecting a GPU using cp.cuda.Device(2).use(). When I do not specify the GPU device, the script runs without errors.

Description:

I am using RAPIDS 24.06, CUDA 12.4, and Python 3.9. I encounter a raft::cuda_error when using cuml's RandomForestClassifier with GPU devices.

Code:

```python
import cupy as cp
import os
import cudf
import cuml
import pandas as pd
from sklearn import model_selection
from cuml import datasets
import dask
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
from dask.utils import parse_bytes
from numba import cuda
import dask_cudf
from cuml.ensemble import RandomForestClassifier as cuRFC
from cuml import ForestInference
import joblib
from tqdm import tqdm
from scipy import stats
from sklearn import metrics
import pickle
from collections import Counter
import random
import shutil
import time
import gc
import warnings
import numpy as np
import multiprocessing

cp.cuda.Device(2).use()
model_parameter = cuRFC(n_estimators=500, max_features='log2', random_state=seed)
```

Error Message:

```
CURFC
/home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/api_decorators.py:344: UserWarning: For reproducible results in Random Forest Classifier or for almost reproducible results in Random Forest Regressor, n_streams=1 is recommended. If n_streams is > 1, results may vary due to stream/thread timing differences, even when random_state is set
  return func(**kwargs)
terminate called after throwing an instance of 'raft::cuda_error'
  what():  CUDA error encountered at: file=/opt/conda/conda-bld/work/cpp/src/decisiontree/batched-levelalgo/builder.cuh line=331: call='cudaMemsetAsync(done_count, 0, sizeof(int) * max_batch * n_col_blks, builder_stream)', Reason=cudaErrorInvalidValue:invalid argument
Obtained 7 stack frames
1 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/../../../../libcuml++.so: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x5a [0x767aa52af28a]
2 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/../../../../libcuml++.so: ML::DT::Builder<ML::DT::GiniObjectiveFunction<float, int, int> >::assignWorkspace(char*, char*) +0x308 [0x767aa5dc13e8]
3 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/../../../../libcuml++.so: ML::DT::Builder<ML::DT::GiniObjectiveFunction<float, int, int> >::Builder(raft::handle_t const&, CUstream_st*, int, unsigned long, ML::DT::DecisionTreeParams const&, float const*, int const*, int, int, rmm::device_uvector<int>*, int, ML::DT::Quantiles<float, int> const&) +0x2fc [0x767aa5dc19cc]
4 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/cuml/internals/../../../../libcuml++.so(+0xdf021f) [0x767aa5df021f]
5 in /home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/site-packages/sklearn/utils/../../../../libgomp.so.1(+0x18f09) [0x767abecbbf09]
6 in /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x767ed8294ac3]
7 in /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x767ed8326850]
/home1/rhlin/anaconda3/envs/rapids-24.06/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 24 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Aborted (core dumped)
```

Any suggestions for resolving this issue?

Thank you so much.

RH

dantegd commented 3 months ago

Thanks for the issue @m946107011. Interestingly enough, I have not used CuPy's CUDA device selection mechanism with RAPIDS in general, and it is untested with cuML in particular. I would recommend using the environment variable CUDA_VISIBLE_DEVICES instead. Are you planning to use multi-GPU capabilities? Asking since I saw the multiple Dask imports in the code you shared.
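
For reference, a minimal sketch of the environment-variable approach. The device index `2` mirrors the snippet in the question, and the commented-out cuML lines are placeholders for the original script; whether this resolves the crash is an assumption, not a verified fix:

```python
import os

# Restrict this process to physical GPU 2 *before* any CUDA-aware library
# (CuPy, cuDF, cuML, Numba) is imported, so they all agree on the device.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

# CUDA imports must come after the variable is set, e.g.:
# import cupy as cp
# from cuml.ensemble import RandomForestClassifier as cuRFC

# Inside this process the one visible GPU is enumerated as device 0,
# so no cp.cuda.Device(2).use() call is needed:
# model_parameter = cuRFC(n_estimators=500, max_features='log2',
#                         random_state=seed)

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Alternatively, the variable can be set when launching the script, e.g. `CUDA_VISIBLE_DEVICES=2 python script.py`, which avoids modifying the code at all.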