Open beckernick opened 1 year ago
The new Dask multi-GPU logistic regression should convert dtypes when needed, rather than fail because the input data does not match the dtype expectations of the C++ implementation.
The exact example in the docstring works:
    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf
    import cudf
    import numpy as np
    from cuml.dask.linear_model import LogisticRegression

    cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="0,1")
    client = Client(cluster)

    X = cudf.DataFrame()
    X['col1'] = np.array([1,1,2,2], dtype=np.float64)
    X['col2'] = np.array([1,2,2,3], dtype=np.float64)
    y = cudf.Series(np.array([0.0, 0.0, 1.0, 1.0], dtype=np.float64))

    X_ddf = dask_cudf.from_cudf(X, npartitions=2)
    y_ddf = dask_cudf.from_cudf(y, npartitions=2)

    reg = LogisticRegression()
    reg.fit(X_ddf, y_ddf)
As it does using all float32 dtypes:
    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf
    import cudf
    import numpy as np
    from cuml.dask.linear_model import LogisticRegression

    cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="0,1")
    client = Client(cluster)

    X = cudf.DataFrame()
    X['col1'] = np.array([1,1,2,2], dtype=np.float32)
    X['col2'] = np.array([1,2,2,3], dtype=np.float32)
    y = cudf.Series(np.array([0.0, 0.0, 1.0, 1.0], dtype=np.float32))

    X_ddf = dask_cudf.from_cudf(X, npartitions=2)
    y_ddf = dask_cudf.from_cudf(y, npartitions=2)

    reg = LogisticRegression()
    reg.fit(X_ddf, y_ddf)

    <cuml.dask.linear_model.logistic_regression.LogisticRegression at 0x7f8d1e230d60>
But mixing float32 and float64 fails:
    X = cudf.DataFrame()
    X['col1'] = np.array([1,1,2,2], dtype=np.float64)
    X['col2'] = np.array([1,2,2,3], dtype=np.float32)
    y = cudf.Series(np.array([0.0, 0.0, 1.0, 1.0], dtype=np.float32))

    X_ddf = dask_cudf.from_cudf(X, npartitions=2)
    y_ddf = dask_cudf.from_cudf(y, npartitions=2)

    reg = LogisticRegression()
    reg.fit(X_ddf, y_ddf)

    2023-08-11 15:37:06,339 - distributed.worker - WARNING - Compute Failed
    Key:       _func_fit-e228a067-be45-42ca-bc34-02e7b2009e3a
    Function:  _func_fit
    args:      (LogisticRegressionMG(), [( col1 col2 2 2.0 2.0 3 2.0 3.0, 2 1.0 3 1.0 dtype: float32)], 4, 2, [(1, 2), (0, 2)], 0)
    kwargs:    {}
    Exception: 'TypeError("Expected input to be of type in [dtype(\'float64\')] but got float32")'
    ...
As does using int32 feature columns:
    X = cudf.DataFrame()
    X['col1'] = np.array([1,1,2,2], dtype=np.int32)
    X['col2'] = np.array([1,2,2,3], dtype=np.int32)
    y = cudf.Series(np.array([0.0, 0.0, 1.0, 1.0], dtype=np.float32))

    X_ddf = dask_cudf.from_cudf(X, npartitions=2)
    y_ddf = dask_cudf.from_cudf(y, npartitions=2)

    reg = LogisticRegression()
    reg.fit(X_ddf, y_ddf)

    2023-08-11 15:39:30,280 - distributed.worker - WARNING - Compute Failed
    Key:       _func_fit-2a49ecfa-7dd9-4a31-9170-dac07357d902
    Function:  _func_fit
    args:      (LogisticRegressionMG(), [( col1 col2 2 2 2 3 2 3, 2 1.0 3 1.0 dtype: float32)], 4, 2, [(1, 2), (0, 2)], 0)
    kwargs:    {}
    Exception: 'TypeError("Expected input to be of type in [dtype(\'float32\'), dtype(\'float64\')] but got int32")'
    ...
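Until the estimator converts dtypes itself, a possible workaround is to cast the features and labels to one shared float dtype before calling `fit`. The sketch below is only an illustration, not how cuML handles this: `pick_float_dtype` and `cast_to_common_float` are hypothetical helper names, and the promotion rule (float64 if any input is float64, otherwise float32) is an assumption inferred from the error messages above. It relies only on `.dtypes`, `.dtype`, and `.astype`, which cudf, dask_cudf, and pandas objects provide.

```python
def pick_float_dtype(dtype_names):
    """Return "float64" if any input dtype is float64, otherwise "float32".

    Hypothetical helper: the promotion rule is an assumption, not cuML behavior.
    """
    names = {str(name) for name in dtype_names}
    return "float64" if "float64" in names else "float32"


def cast_to_common_float(X, y):
    """Cast features X and labels y to one shared float dtype.

    Assumes X exposes `.dtypes` and `.astype`, and y exposes `.dtype` and
    `.astype`, as cudf, dask_cudf, and pandas objects do.
    """
    target = pick_float_dtype(list(X.dtypes) + [y.dtype])
    return X.astype(target), y.astype(target)


# Hypothetical usage with the objects from the failing examples:
# X_ddf, y_ddf = cast_to_common_float(X_ddf, y_ddf)
# reg = LogisticRegression()
# reg.fit(X_ddf, y_ddf)
```

Note that casting large integers to float32 can lose precision, so a caller with wide integer columns may prefer to force float64 instead.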
    conda list | grep rapids
    # packages in environment at /home/nicholasb/miniconda3/envs/rapids-23.08:
    cudf_kafka            23.08.00a  cuda11_py310_230811_g9d794877fd_216  rapidsai-nightly
    cusignal              23.08.00a  py310_230811_ga644c53_8              rapidsai-nightly
    libcucim              23.08.00a  cuda11_230811_g2ecf819_25            rapidsai-nightly
    libcudf               23.08.00a  cuda11_230811_g9d794877fd_216        rapidsai-nightly
    libcudf_kafka         23.08.00a  cuda11_230811_g9d794877fd_216        rapidsai-nightly
    libcugraph            23.08.00a  cuda11_230811_g15f8bbaf_83           rapidsai-nightly
    libcugraph_etl        23.08.00a  cuda11_230811_g15f8bbaf_83           rapidsai-nightly
    libcugraphops         23.08.00a  cuda11_230811_g9c081845_21           rapidsai-nightly
    libcuml               23.08.00a  cuda11_230811_g07176ea74_59          rapidsai-nightly
    libcumlprims          23.08.00a  cuda11_230710_gd32fef7_2             rapidsai-nightly
    libcuspatial          23.08.00a  cuda11_230811_gf105464c_65           rapidsai-nightly
    libkvikio             23.08.00a  cuda11_230811_gc644fca_30            rapidsai-nightly
    libraft               23.08.00a  cuda11_230811_g48625467_89           rapidsai-nightly
    libraft-headers       23.08.00a  cuda11_230811_g48625467_89           rapidsai-nightly
    libraft-headers-only  23.08.00a  cuda11_230811_g48625467_89           rapidsai-nightly
    librmm                23.08.00a  cuda11_230811_g314b669a_27           rapidsai-nightly
    libxgboost            1.7.4      rapidsai_ha9c50b3_6                  rapidsai-nightly
    py-xgboost            1.7.4      rapidsai_py310h1395376_6             rapidsai-nightly
    rapids                23.08.00a  cuda11_py310_230803_g72f0ca7_35      rapidsai-nightly
    rapids-xgboost        23.08.00a  cuda11_py310_230803_g72f0ca7_35      rapidsai-nightly
    ucx-proc              1.0.0      gpu                                  rapidsai-nightly
+1