Open betatim opened 2 months ago
CC @galipremsagar
Maybe the first thing to do is to add an additional workflow/CI run that is optional (it failing doesn't block merging) and that runs `pytest -p cudf.pandas --ignore=dask -m "not memleak"` (this seems to be the main way the tests are run?). That way we get "live feedback" on the progress and can see that the test a PR says it fixes does indeed pass now.
Once all tests pass we can then make it a required workflow so that we don't regress.
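A minimal sketch of what such an optional job could look like as a GitHub Actions workflow. This is purely illustrative: the job name, runner, and checkout step are hypothetical, and cuml's real CI is set up differently; the point is only how `continue-on-error` keeps a failing run from blocking merges.

```yaml
# Hypothetical sketch, not cuml's actual CI configuration.
name: cudf-pandas-tests
on: [pull_request]

jobs:
  cudf-pandas:
    runs-on: ubuntu-latest
    # continue-on-error lets the workflow succeed even when this job
    # fails, so it cannot block merging while tests are still being fixed.
    # Removing it later (and marking the check required in branch
    # protection) turns this into the "don't regress" gate.
    continue-on-error: true
    steps:
      - uses: actions/checkout@v4
      - name: Run tests with the cudf.pandas accelerator
        run: pytest -p cudf.pandas --ignore=dask -m "not memleak"
```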
The failures

```
FAILED test_input_utils.py::test_convert_matrix_order_cuml_array[K-C-pandas-float32] - AssertionError: assert 'F' == 'C'
FAILED test_input_utils.py::test_convert_matrix_order_cuml_array[K-C-pandas-float64] - AssertionError: assert 'F' == 'C'
FAILED test_input_utils.py::test_convert_matrix_order_cuml_array[K-F-pandas-float32] - AssertionError: assert 'F' == 'C'
FAILED test_input_utils.py::test_convert_matrix_order_cuml_array[K-F-pandas-float64] - AssertionError: assert 'F' == 'C'
```

are fixed in PR #5882.
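For context, the `'F' == 'C'` assertions compare memory layout: NumPy-style arrays can be row-major ("C" order) or column-major ("F" order), and the test expects conversion to preserve the requested order. A small standalone illustration with plain NumPy (the `array_order` helper is mine, not cuml's actual check):

```python
import numpy as np

def array_order(x: np.ndarray) -> str:
    # Report "F" only for Fortran-ordered (column-major) arrays;
    # mirrors the kind of check the failing tests perform.
    if x.flags["F_CONTIGUOUS"] and not x.flags["C_CONTIGUOUS"]:
        return "F"
    return "C"

c_arr = np.ones((3, 2), order="C")  # row-major
f_arr = np.asfortranarray(c_arr)    # column-major copy

assert array_order(c_arr) == "C"
assert array_order(f_arr) == "F"
```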
This issue is the result of running the cuml tests with `pytest -p cudf.pandas --ignore=dask -m "not memleak"`. The goal is to find out how well cuml works with `cudf.pandas` activated. As far as I can tell this actually means at least two things: (1) cudf's "pandas compatibility mode" is activated and (2) the `cudf.pandas` accelerator is doing its magic. We could look at (1) separately; for example, users of cuml might turn on this option even without using the pandas accelerator.

The goal of this issue is to have a central list of all failing tests so we can coordinate working on getting them fixed. I think it would make sense to open one PR per test file, or even one PR for a set of tests in one file. That way the diff stays manageable for the reviewer and we don't have too many PRs. Post in this issue if you are working on one of these so we can avoid duplicated effort.
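To make the distinction between (1) and (2) concrete, the two mechanisms can be exercised independently (assuming a working cudf install on a GPU machine; `mode.pandas_compatible` is cudf's option name for the compatibility mode):

```shell
# (2) Run the test suite under the accelerator: pandas imports are
#     transparently redirected to cudf by the pytest plugin.
pytest -p cudf.pandas --ignore=dask -m "not memleak"

# (1) The compatibility mode alone can be toggled from Python,
#     without the accelerator being involved at all:
python -c 'import cudf; cudf.set_option("mode.pandas_compatible", True)'
```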
The failures in `test_one_hot_encoder.py` might be a bit harder to solve than the other ones, so I singled them out. I used commit 6d3bb0d43e4499210512b6941d1c08f7611ea960 and a conda environment set up today.
The tests in `test_one_hot_encoder.py` cause trouble with `cudf.pandas` activated. Several of them lead to `MemoryError` or `CUDARuntimeError`s. Example output below.

Full list of failed tests in `test_one_hot_encoder.py`:
```
==================================================================================== short test summary info ====================================================================================
FAILED test_one_hot_encoder.py::test_onehot_inverse_transform[cupy-first] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_inverse_transform[cupy-drop2] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_inverse_transform[cudf-first] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_inverse_transform[cudf-drop2] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cupy-10-sparse-first] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cupy-10-dense-first] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cupy-1000-sparse-first] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cupy-1000-dense-first] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cupy-20000-sparse-first] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cupy-20000-dense-first] - AttributeError: 'ndarray' object has no attribute 'values_host'
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-10-sparse-None] - RuntimeError: CUDA error at: /opt/conda/conda-bld/work/cpp/build/_deps/cuco-src/include/cuco/detail/static_map.inl:111: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-10-sparse-first] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-10-dense-None] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-10-dense-first] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-1000-sparse-None] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-1000-sparse-first] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-1000-dense-None] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-1000-dense-first] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-20000-sparse-None] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-20000-sparse-first] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-20000-dense-None] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_random_inputs[cudf-20000-dense-first] - MemoryError: std::bad_alloc: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_drop_idx_first[cupy] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_idx_first[cudf] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_one_of_each[cupy] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_one_of_each[cudf] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_exceptions[cupy-drop0-`drop` should have as many columns] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_exceptions[cupy-drop1-Trying to drop multiple values] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_exceptions[cupy-drop2-Some categories [0-9a-zA-Z, ]* were not found] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_exceptions[cupy-drop3-Wrong input for parameter `drop`.] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_exceptions[cudf-drop0-`drop` should have as many columns] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_exceptions[cudf-drop1-Trying to drop multiple values] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_exceptions[cudf-drop2-Some categories [0-9a-zA-Z, ]* were not found] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_drop_exceptions[cudf-drop3-Wrong input for parameter `drop`.] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_get_categories[cupy] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_get_categories[cudf] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_sparse_drop[cupy] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_sparse_drop[cudf] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_categories_shape_mismatch[cupy] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_categories_shape_mismatch[cudf] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_category_specific_cases - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_category_class_count[uint8] - cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_category_class_count[uint16] - cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
FAILED test_one_hot_encoder.py::test_onehot_get_feature_names[cupy] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
FAILED test_one_hot_encoder.py::test_onehot_get_feature_names[cudf] - MemoryError: std::bad_alloc: CUDA error at: /nvme/1/thead/miniconda/envs/cuml-dev-24.06/include/rmm/mr/device/cuda_memory_resource.hpp
```

Example test with `CUDARuntimeError`:
```
____________________________________________________________________________ test_onehot_category_class_count[uint8] ____________________________________________________________________________

total_classes = 255

    @pytest.mark.parametrize(
        "total_classes",
        [np.iinfo(np.uint8).max, np.iinfo(np.uint16).max],
        ids=["uint8", "uint16"],
    )
    def test_onehot_category_class_count(total_classes: int):
        # See this for reasoning: https://github.com/rapidsai/cuml/issues/2690
        # All tests use sparse=True to avoid memory errors
        encoder = OneHotEncoder(handle_unknown="ignore", sparse=True)
        # ==== 2 Rows ====
        example_df = DataFrame()
>       example_df["high_cardinality_column"] = cp.linspace(
            0, total_classes - 1, total_classes
        )

test_one_hot_encoder.py:328:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/nvme/1/thead/miniconda/envs/cuml-dev-24.06/lib/python3.11/site-packages/cupy/_creation/ranges.py:161: in linspace
    return _linspace_scalar(start, stop, num, endpoint, retstep, dtype)
/nvme/1/thead/miniconda/envs/cuml-dev-24.06/lib/python3.11/site-packages/cupy/_creation/ranges.py:91: in _linspace_scalar
    ret = cupy.empty((num,), dtype=dt)
/nvme/1/thead/miniconda/envs/cuml-dev-24.06/lib/python3.11/site-packages/cupy/_creation/basic.py:31: in empty
    return cupy.ndarray(shape, dtype, order=order)
cupy/_core/core.pyx:132: in cupy._core.core.ndarray.__new__
    ???
cupy/_core/core.pyx:220: in cupy._core.core._ndarray_base._init
    ???
cupy/cuda/memory.pyx:738: in cupy.cuda.memory.alloc
    ???
cupy/cuda/memory.pyx:633: in cupy.cuda.memory._malloc
    ???
cupy/cuda/memory.pyx:634: in cupy.cuda.memory._malloc
    ???
cupy/cuda/memory.pyx:101: in cupy.cuda.memory.Memory.__init__
    ???
cupy_backends/cuda/api/runtime.pyx:498: in cupy_backends.cuda.api.runtime.malloc
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered

cupy_backends/cuda/api/runtime.pyx:146: CUDARuntimeError
```

Example test with `MemoryError`:
```
____________________________________________________________ test_onehot_drop_exceptions[cudf-drop1-Trying to drop multiple values] _____________________________________________________________

drop = {'chars': 'b', 'int': [2, 0]}, pattern = 'Trying to drop multiple values', as_array = False

    @pytest.mark.parametrize(
        "drop, pattern",
        [
            [dict({"chars": "b"}), "`drop` should have as many columns"],
            [
                dict({"chars": "b", "int": [2, 0]}),
                "Trying to drop multiple values",
            ],
            [
                dict({"chars": "b", "int": 3}),
                "Some categories [0-9a-zA-Z, ]* were not found",
            ],
            [
                DataFrame({"chars": "b", "int": 3}),
                "Wrong input for parameter `drop`.",
            ],
        ],
    )
    @pytest.mark.parametrize("as_array", [True, False], ids=["cupy", "cudf"])
    def test_onehot_drop_exceptions(drop, pattern, as_array):
>       X = DataFrame({"chars": ["c", "b", "d"], "int": [2, 1, 0]})

test_one_hot_encoder.py:237:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/nvme/1/thead/miniconda/envs/cuml-dev-24.06/lib/python3.11/site-packages/nvtx/nvtx.py:116: in inner
    result = func(*args, **kwargs)
/nvme/1/thead/miniconda/envs/cuml-dev-24.06/lib/python3.11/site-packages/cudf/core/dataframe.py:848: in __init__
    self._init_from_dict_like(
/nvme/1/thead/miniconda/envs/cuml-dev-24.06/lib/python3.11/site-packages/nvtx/nvtx.py:116: in inner
    result = func(*args, **kwargs)
/nvme/1/thead/miniconda/envs/cuml-dev-24.06/lib/python3.11/site-packages/cudf/core/dataframe.py:1066: in _init_from_dict_like
    keys, values, lengths = zip(
/nvme/1/thead/miniconda/envs/cuml-dev-24.06/lib/python3.11/site-packages/cudf/core/dataframe.py:1072: in
```

For the rest of the tests the results look like this:
test_compose.py::test_column_transformer_index
test_doctest.py::test_docstring
test_holtwinters.py::test_singlets_holtwinters
test_input_utils.py::test_convert_matrix_order_cuml_array
https://github.com/rapidsai/cuml/pull/5882
test_input_utils.py::test_convert_input_dtype
https://github.com/rapidsai/cuml/pull/5885
test_label_encoder.py::test_inverse_transform
test_label_encoder.py::test_empty_input
test_label_encoder.py::test_inverse_transform_cupy_numpy
test_metrics.py::test_sklearn_search
test_module_config.py::test_default_global_output_type
test_ordinal_encoder.py::test_ordinal_encoder_df
test_ordinal_encoder.py::test_ordinal_encoder_array
test_ordinal_encoder.py::test_output_type
test_target_encoder.py::test_targetencoder_random
test_target_encoder.py::test_targetencoder_median
test_tsne.py::test_tsne_knn_graph_used
test_tsne.py::test_tsne
explainer/test_sampling.py::test_kmeans_input
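Many of the `test_one_hot_encoder.py` failures above are `AttributeError: 'ndarray' object has no attribute 'values_host'`: with the accelerator active, code paths that expect a cudf object receive a (proxied) NumPy-like array, which has no `values_host`. A defensive helper along these lines (my sketch, not cuml's actual fix) illustrates one way to get a host array regardless of input type:

```python
import numpy as np

def to_host(values):
    """Return a host (CPU) numpy array from cudf, cupy, or numpy-like input.

    Sketch only: `values_host` is the cudf Series/Index attribute the
    failing tests trip over; `.get()` is cupy's device-to-host copy.
    """
    if hasattr(values, "values_host"):  # cudf Series/Index
        return values.values_host
    if hasattr(values, "__cuda_array_interface__") and hasattr(values, "get"):
        return values.get()  # cupy ndarray
    return np.asarray(values)  # numpy arrays, lists, etc.

# Plain numpy input passes straight through:
assert to_host(np.array([1, 2, 3])).tolist() == [1, 2, 3]
```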