[QST] Difficulty joining the output of cuml.dask.preprocessing.OneHotEncoder with the source dask_cudf

I am having trouble joining the output of cuml.dask.preprocessing.OneHotEncoder with the source dask_cudf. What is a correct way to do that?

Create dask_cudf

import numpy as np
import pandas as pd
import dask.dataframe as dd
import dask_cudf
orig = pd.DataFrame({"one": np.array(["a", "b", "c", "c", "z", "z", "b", "b", "b", "c", "a", "c"]),
                     "two": np.array(["b", "b", "c", "c", "z", "y", "b", "y", "b", "c", "b", "c"])})
df = dd.from_pandas(orig, npartitions=3)
ddf = dask_cudf.from_dask_dataframe(df)

OHE

from cuml.dask.preprocessing import OneHotEncoder
ohe = OneHotEncoder(sparse = False)
out = ohe.fit_transform(ddf)

Even though I pass in a dask_cudf, I receive a dask array of cupy.ndarrays (which is slightly unexpected). However, when I inspect the column values it all looks correct.

>>> type(out)
dask.array.core.Array

Merge to original dask_cudf

out.compute_chunk_sizes() # Works
out.to_dask_dataframe() # Works
ddf['temp'] = out[0] # ValueError: Number of partitions do not match


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-fd7e8200678c> in <module>
      1 out.compute_chunk_sizes() # Works
      2 out.to_dask_dataframe() # Works
----> 3 ddf['temp'] = out[0] # ValueError: Number of partitions do not match

~/.conda/envs/gpu_env/lib/python3.6/site-packages/dask/dataframe/core.py in __setitem__(self, key, value)
   3488             df = self.assign(**{k: value for k in key})
   3489         else:
-> 3490             df = self.assign(**{key: value})
   3491
   3492         self.dask = df.dask

~/.conda/envs/gpu_env/lib/python3.6/site-packages/dask/dataframe/core.py in assign(self, **kwargs)
   3747                     raise ValueError(
   3748                         "Number of partitions do not match ({0} != {1})".format(
-> 3749                             v.npartitions, self.npartitions
   3750                         )
   3751                     )

ValueError: Number of partitions do not match (1 != 3)



Thank you,

Tagging @Garfounkel as the output should be mirrored to the input. Also, we should be returning the same number of partitions, right?

@hassanshamji while we look into the input/output mismatch, we could try making the current code work. Before you call out.compute_chunk_sizes(), could you try out = out.rechunk((int(out.shape[0] / 3) + 1, -1))?
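To make the suggested chunk size concrete, here is a minimal plain-Python sketch of the row chunks that formula implies. The helper name row_chunks is hypothetical and not part of dask; it only mirrors how a fixed chunk length splits an axis, with -1 in the second position meaning "keep all columns in one chunk".

```python
def row_chunks(n_rows, n_partitions):
    """Row-chunk sizes implied by int(n_rows / n_partitions) + 1 (hypothetical helper)."""
    chunk_rows = int(n_rows / n_partitions) + 1
    sizes = []
    remaining = n_rows
    while remaining > 0:
        # Each chunk takes chunk_rows rows; the last one takes whatever is left.
        size = min(chunk_rows, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


# The dataframe in this issue has 12 rows and 3 partitions.
print(row_chunks(12, 3))  # [5, 5, 2]
```

For the 12-row example this gives three chunks, so the chunk count lines up with the three dataframe partitions, although the chunk sizes need not equal the partition sizes. Note also that rechunk returns a new array rather than modifying out in place.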

The number of partitions is correct, but to match scikit-learn's behavior we transpose the resulting one-hot encoded matrix so the final shape is (number_of_samples, categories), or in your case (12, 8). Therefore, to retrieve a column, index with out[:, col_number] instead of out[col_number].

out[:, 0].npartitions  # 3
ddf['temp'] = out[:, 0]  # no error
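As a rough illustration of the (12, 8) shape and of the difference between the two indexing forms, the sketch below rebuilds the encoding in plain Python using the data from this issue. It is a stand-in for the encoder, not cuML itself, and the sorted category order is an assumption made only for this example.

```python
# Plain-Python stand-in for the encoder (not cuML), using the data from the issue.
data = {
    "one": ["a", "b", "c", "c", "z", "z", "b", "b", "b", "c", "a", "c"],
    "two": ["b", "b", "c", "c", "z", "y", "b", "y", "b", "c", "b", "c"],
}

# One category list per input column (sorted order is an assumption).
categories = {col: sorted(set(values)) for col, values in data.items()}

# One row per sample, one indicator per (column, category) pair.
n_samples = len(data["one"])
encoded = []
for i in range(n_samples):
    row = []
    for col, cats in categories.items():
        row.extend(1 if data[col][i] == cat else 0 for cat in cats)
    encoded.append(row)

first_row = encoded[0]                      # analogue of out[0]
first_column = [row[0] for row in encoded]  # analogue of out[:, 0]

print(len(encoded), len(encoded[0]))  # 12 8
print(len(first_row))                 # 8
print(len(first_column))              # 12
```

The row has 8 entries (one per category) and belongs to a single sample, while the column has 12 entries, one per row of the source dataframe, which is why only the column form lines up with ddf.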

As for the output type mirroring the input: again, we chose to mimic sklearn's behavior, which is to return an array whatever the input type. We could change this if needed, but then we'd diverge from sklearn. As you pointed out, if you need the output as a dataframe:

out.compute_chunk_sizes()
out_ddf = out.to_dask_dataframe(columns=cudf.concat(ohe.categories_).tolist())
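One thing to watch for with the column labels: in this dataset both input columns share the categories "b", "c" and "z", so a flat list of category values would repeat some labels. A minimal sketch of building unique, prefixed labels follows. The helper ohe_column_names and the dict layout are hypothetical and only assume that one list of categories is available per input column.

```python
def ohe_column_names(categories_by_column):
    """Build '<column>_<category>' labels (hypothetical helper, not a cuML API)."""
    names = []
    for column, categories in categories_by_column.items():
        for category in categories:
            names.append(f"{column}_{category}")
    return names


# Categories of the two columns in this issue, written out by hand.
categories = {
    "one": ["a", "b", "c", "z"],
    "two": ["b", "c", "y", "z"],
}
names = ohe_column_names(categories)
print(names)
```

The resulting list has one distinct label per encoded column and could be passed as the columns argument in place of the flat category list.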

@Garfounkel thanks for looking into this! It doesn't seem like a bug anymore then. @hassanshamji if you confirm the above works for you, I will go ahead and close this issue

Thanks for the quick follow-up, @divyegala & @Garfounkel.

Regarding the input/output type, returning an array seems good. I was trying to highlight that it was an array of cupy.ndarray rather than what I would have expected, an array of cudf objects. I could be expecting the wrong thing, but just wanted to clarify that point.

Thanks for the response, @Garfounkel. When I ran ddf['temp'] = out[:, 0] I got the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-c2aec3acf4dc> in <module>
      1 out[:, 0].npartitions  # 3
----> 2 ddf['temp'] = out[:, 0]  # no error

~/.conda/envs/gpu_env/lib/python3.6/site-packages/dask/dataframe/core.py in __setitem__(self, key, value)
   3488             df = self.assign(**{k: value for k in key})
   3489         else:
-> 3490             df = self.assign(**{key: value})
   3491 
   3492         self.dask = df.dask

~/.conda/envs/gpu_env/lib/python3.6/site-packages/dask/dataframe/core.py in assign(self, **kwargs)
   3750                         )
   3751                     )
-> 3752                 kwargs[k] = from_dask_array(v, index=self.index)
   3753 
   3754         pairs = list(sum(kwargs.items(), ()))

~/.conda/envs/gpu_env/lib/python3.6/site-packages/dask/dataframe/io/io.py in from_dask_array(x, columns, index)
    414     dask.dataframe._Frame.to_records: Reverse conversion
    415     """
--> 416     meta = _meta_from_array(x, columns, index)
    417 
    418     if x.ndim == 2 and len(x.chunks[1]) > 1:

~/.conda/envs/gpu_env/lib/python3.6/site-packages/dask/dataframe/io/io.py in _meta_from_array(x, columns, index)
     48     elif x.ndim == 1:
     49         if np.isscalar(columns) or columns is None:
---> 50             return pd.Series([], name=columns, dtype=x.dtype, index=index)
     51         elif len(columns) == 1:
     52             return pd.DataFrame(

~/.conda/envs/gpu_env/lib/python3.6/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    213 
    214             if index is not None:
--> 215                 index = ensure_index(index)
    216 
    217             if data is None:

~/.conda/envs/gpu_env/lib/python3.6/site-packages/pandas/core/indexes/base.py in ensure_index(index_like, copy)
   5742         return index_like
   5743     if hasattr(index_like, "name"):
-> 5744         return Index(index_like, name=index_like.name, copy=copy)
   5745 
   5746     if is_iterator(index_like):

~/.conda/envs/gpu_env/lib/python3.6/site-packages/pandas/core/indexes/base.py in __new__(cls, data, dtype, copy, name, fastpath, tupleize_cols, **kwargs)
    515 
    516         elif hasattr(data, "__array__"):
--> 517             return Index(np.asarray(data), dtype=dtype, copy=copy, name=name, **kwargs)
    518         elif data is None or is_scalar(data):
    519             cls._scalar_data_error(data)

~/.conda/envs/gpu_env/lib/python3.6/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

~/.conda/envs/gpu_env/lib/python3.6/site-packages/cudf/core/frame.py in __array__(self, dtype)
   1054             To explicitly construct a GPU array, consider using \
   1055             cupy.asarray(...)\nTo explicitly construct a \
-> 1056             host array, consider using .to_array()"
   1057         )
   1058 

TypeError: Implicit conversion to a host NumPy array via __array__ is not allowed,             To explicitly construct a GPU array, consider using             cupy.asarray(...)
To explicitly construct a             host array, consider using .to_array()

Trying cupy.asarray(...), as the error message suggests, raised:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-9a8f72758b42> in <module>
      3 
      4 import cupy
----> 5 ddf['temp'] = cupy.asarray(out[:, 0])

~/.conda/envs/gpu_env/lib/python3.6/site-packages/cupy/creation/from_data.py in asarray(a, dtype, order)
     66 
     67     """
---> 68     return core.array(a, dtype, False, order)
     69 
     70 

cupy/core/core.pyx in cupy.core.core.array()

cupy/core/core.pyx in cupy.core.core.array()

cupy/core/core.pyx in cupy.core.core._send_object_to_gpu()

~/.conda/envs/gpu_env/lib/python3.6/site-packages/dask/array/core.py in __array__(self, dtype, **kwargs)
   1338             x = x.astype(dtype)
   1339         if not isinstance(x, np.ndarray):
-> 1340             x = np.array(x)
   1341         return x
   1342 

ValueError: object __array__ method not producing an array

@hassanshamji I am not able to reproduce your issue. When I run the following I don't get any errors:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import pandas as pd
import dask.dataframe as dd
import dask_cudf
import numpy as np
import dask.array as da
import cupy as cp
import cudf
from cuml.dask.preprocessing import OneHotEncoder

orig = pd.DataFrame({"one": np.array(["a", "b", "c", "c", "z", "z", "b", "b", "b", "c", "a", "c"],),
                     "two": np.array(["b", "b", "c", "c", "z", "y", "b", "y", "b", "c", "b", "c"])})
df = dd.from_pandas(orig, npartitions=3)
ddf = dask_cudf.from_dask_dataframe(df)

cluster = LocalCUDACluster()
client = Client(cluster)

ohe = OneHotEncoder(sparse = False)
out = ohe.fit_transform(ddf)

out.compute_chunk_sizes()

out[:, 0].npartitions  # 3
ddf['temp'] = out[:, 0]  # no error

cluster.close()
client.close()

Could you try this and tell us if this works for you?

@Garfounkel,

That is strange. I ran those steps exactly and am getting the same error that I posted above (TypeError: Implicit conversion to...).

I'm using a p3.8xlarge | 32 Cores | 244 GB Memory | 4 - NVIDIA V100 GPU

What information can I provide to help diagnose this?

@hassanshamji Could you share your installed package information? (The output of conda list)
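If a shorter report than the full conda list is easier to share, a small sketch like the one below prints the versions of just the packages involved here. It assumes Python 3.8 or newer for importlib.metadata (older interpreters would need the importlib_metadata backport), and the package names in the list are assumptions about how the distributions are registered in the environment.

```python
from importlib import metadata  # standard library from Python 3.8


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages that appear in the tracebacks in this issue (names are assumptions).
relevant = ["cudf", "dask-cudf", "cuml", "cupy", "dask", "distributed", "pandas", "numpy"]
for name, version in installed_versions(relevant).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```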

I don't have access to a GPU at the moment, so it's a bit difficult to help with this. @divyegala or @dantegd, could you follow up on this when you get a moment?

TY @Garfounkel, I'm attaching the conda requirements file I've been using. Please let me know if you need more information.

# Usage: conda env create -f ~/environment/gpu_env.yml
name: gpu_env
channels:
  - https://conda.anaconda.org/rapidsai
  - https://conda.anaconda.org/nvidia
  - conda-main
  - conda-forge
  - conda-r
  - conda-nvidia
  - conda-numba
  - conda-rapidsai
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=0_gnu
  - aiohttp=3.6.2=py36h7b6447c_0
  - appdirs=1.4.3=py36h28b3542_0
  - arrow-cpp=0.15.0=py36h090bef1_2
  - async-timeout=3.0.1=py36_0
  - attrs=19.3.0=py_0
  - backcall=0.1.0=py36_0
  - blas=1.0=mkl
  - bleach=3.1.4=py_0
  - bokeh=1.4.0=py36_0
  - boost=1.70.0=py36h9de70de_1
  - boost-cpp=1.70.0=h8e57a91_2
  - brotli=1.0.7=he6710b0_0
  - bzip2=1.0.8=h7b6447c_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cairo=1.16.0=hcf35c78_1003
  - certifi=2020.4.5.1=py36_0
  - cffi=1.14.0=py36h2e261b9_0
  - cfitsio=3.470=hb7c8383_2
  - chardet=3.0.4=py36_1003
  - click=7.1.2=py_0
  - click-plugins=1.1.1=py_0
  - cligj=0.5.0=py36_0
  - cloudpickle=1.4.1=py_0
  - cmake=3.14.0=h52cb24c_0
  - colorcet=2.0.2=py_0
  - contextvars=2.4=py_0
  - cryptography=2.9.2=py36h1ba5d50_0
  - cudatoolkit=10.1.243=h6bb024c_0
  - cudf=0.14.0=py36_0
  - cudnn=7.6.0=cuda10.1_0
  - cugraph=0.14.0=py36_0
  - cuml=0.14.0=cuda10.1_py36_0
  - cupy=7.5.0=py36h5c369b2_0
  - curl=7.67.0=hbc83047_0
  - cusignal=0.14.0=py36_0
  - cuspatial=0.14.0=py36_0
  - cuxfilter=0.14.0=py36_0
  - cycler=0.10.0=py36_0
  - cytoolz=0.10.1=py36h7b6447c_0
  - dask=2.17.2=py_0
  - dask-core=2.17.2=py_0
  - dask-cuda=0.14.0=py36_0
  - dask-cudf=0.14.0=py36_0
  - dask-xgboost=0.2.0.dev28=cuda10.1py36_0
  - datashader=0.10.0=py_0
  - datashape=0.5.4=py36_1
  - dbus=1.13.14=hb2f20db_0
  - decorator=4.4.2=py_0
  - defusedxml=0.6.0=py_0
  - distributed=2.17.0=py36_0
  - dlpack=0.2=he1b5a44_1
  - double-conversion=3.1.5=he6710b0_1
  - entrypoints=0.3=py36_0
  - expat=2.2.6=he6710b0_0
  - fastavro=0.23.4=py36h7b6447c_0
  - fastrlock=0.4=py36he6710b0_0
  - fiona=1.8.11=py36h41e4f33_0
  - fontconfig=2.13.1=h86ecdb6_1001
  - freetype=2.9.1=h8a8886c_1
  - freexl=1.0.5=h14c3975_0
  - fsspec=0.7.4=py_0
  - gdal=3.0.2=py36hbb6b9fb_2
  - geopandas=0.6.1=py_0
  - geos=3.7.2=he1b5a44_2
  - geotiff=1.5.1=h21e8280_1
  - gflags=2.2.2=he6710b0_0
  - giflib=5.1.7=h516909a_1
  - glib=2.63.1=h5a9c865_0
  - glog=0.4.0=he6710b0_0
  - gmp=6.1.2=h6c8ec71_1
  - grpc-cpp=1.23.0=h18db393_0
  - gst-plugins-base=1.14.5=h0935bb2_2
  - gstreamer=1.14.5=h36ae1b5_2
  - hdf4=4.2.13=h3ca952b_2
  - hdf5=1.10.5=nompi_h3c11f04_1104
  - heapdict=1.0.1=py_0
  - icu=64.2=he1b5a44_1
  - idna=2.9=py_1
  - idna_ssl=1.1.0=py36_0
  - imageio=2.8.0=py_0
  - immutables=0.11=py36h7b6447c_0
  - importlib-metadata=1.6.0=py36_0
  - importlib_metadata=1.6.0=0
  - intel-openmp=2020.1=217
  - ipykernel=5.1.4=py36h39e3cac_0
  - ipython=7.13.0=py36h5ca1d4c_0
  - ipython_genutils=0.2.0=py36_0
  - jedi=0.17.0=py36_0
  - jinja2=2.11.2=py_0
  - joblib=0.15.1=py_0
  - jpeg=9d=h516909a_0
  - json-c=0.13.1=h1bed415_0
  - jsonschema=3.2.0=py36_0
  - jupyter-server-proxy=1.5.0=py_0
  - jupyter_client=6.1.3=py_0
  - jupyter_core=4.6.3=py36_0
  - kealib=1.4.13=hec59c27_0
  - kiwisolver=1.2.0=py36hfd86e86_0
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libcudf=0.14.0=cuda10.1_0
  - libcugraph=0.14.0=cuda10.1_0
  - libcuml=0.14.0=cuda10.1_0
  - libcumlprims=0.14.1=cuda10.1_0
  - libcurl=7.67.0=h20c2e04_0
  - libcuspatial=0.14.0=cuda10.1_0
  - libdap4=3.20.4=hd3bb157_0
  - libedit=3.1.20181209=hc058e9b_0
  - libevent=2.1.10=h72c5cf5_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.2.0=h24d8f2e_2
  - libgdal=3.0.2=hc7cfd23_2
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libgomp=9.2.0=h24d8f2e_2
  - libhwloc=2.1.0=h3c4fd83_0
  - libiconv=1.15=h63c8f33_5
  - libkml=1.3.0=h4fcabce_1010
  - libnetcdf=4.7.1=nompi_h94020b1_102
  - libnvstrings=0.14.0=cuda10.1_0
  - libpng=1.6.37=hbc83047_0
  - libpq=11.5=hd9ab2ff_2
  - libprotobuf=3.8.0=hd408876_0
  - librmm=0.14.0=cuda10.1_0
  - libsodium=1.0.16=h1bed415_0
  - libspatialindex=1.9.3=he6710b0_0
  - libspatialite=4.3.0a=h4f6d029_1032
  - libssh2=1.9.0=h1ba5d50_1
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=hfc65ed5_0
  - libuuid=2.32.1=h14c3975_1000
  - libwebp=1.0.1=h8e7db2f_0
  - libxcb=1.13=h1bed415_1
  - libxgboost=1.1.0dev.rapidsai0.14=cuda10.1_0
  - libxml2=2.9.10=hee79883_0
  - lightgbm=2.3.0=py36he6710b0_0
  - llvmlite=0.32.1=py36hd408876_0
  - locket=0.2.0=py36_1
  - lz4-c=1.8.3=he1b5a44_1001
  - markdown=3.1.1=py36_0
  - markupsafe=1.1.1=py36h7b6447c_0
  - matplotlib=3.2.1=0
  - matplotlib-base=3.2.1=py36hb8e4980_0
  - mistune=0.8.4=py36h7b6447c_0
  - mkl=2020.1=217
  - mkl-service=2.3.0=py36he904b0f_0
  - mkl_fft=1.0.15=py36ha843d7b_0
  - mkl_random=1.1.1=py36h0573a6f_0
  - msgpack-python=1.0.0=py36hfd86e86_1
  - multidict=4.7.3=py36h7b6447c_0
  - multipledispatch=0.6.0=py36_0
  - munch=2.5.0=py_0
  - nbconvert=5.6.1=py36_0
  - nbformat=5.0.6=py_0
  - nccl=2.5.7.1=h51cf6c1_0
  - ncurses=6.2=he6710b0_1
  - networkx=2.4=py_0
  - nodejs=10.13.0=he6710b0_0
  - notebook=6.0.3=py36_0
  - numba=0.49.1=py36h0573a6f_0
  - numpy=1.18.1=py36h4f9e942_0
  - numpy-base=1.18.1=py36hde5b4d6_1
  - nvstrings=0.14.0=py36_0
  - olefile=0.46=py36_0
  - openjpeg=2.3.1=h981e76c_3
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.3=py_0
  - pandas=0.25.3=py36he6710b0_0
  - pandoc=2.2.3.2=0
  - pandocfilters=1.4.2=py36_1
  - panel=0.6.4=0
  - param=1.9.3=py_0
  - parquet-cpp=1.5.1=2
  - parso=0.7.0=py_0
  - partd=1.1.0=py_0
  - pcre=8.43=he6710b0_0
  - pexpect=4.8.0=py36_0
  - pickleshare=0.7.5=py36_0
  - pillow=7.1.2=py36hb39fc2d_0
  - pip=20.0.2=py36_3
  - pixman=0.38.0=h7b6447c_0
  - poppler=0.67.0=h14e79db_8
  - poppler-data=0.4.9=0
  - postgresql=11.5=hc63931a_2
  - proj=6.2.1=haa6030c_0
  - prometheus_client=0.7.1=py_0
  - prompt-toolkit=3.0.5=py_0
  - prompt_toolkit=3.0.5=0
  - psutil=5.7.0=py36h7b6447c_0
  - ptyprocess=0.6.0=py36_0
  - py-xgboost=1.1.0dev.rapidsai0.14=cuda10.1py36_0
  - pyarrow=0.15.0=py36h8b68381_1
  - pycparser=2.20=py_0
  - pyct=0.4.6=py36_0
  - pyee=7.0.2=pyh9f0ad1d_0
  - pygments=2.6.1=py_0
  - pynvml=8.0.4=py_0
  - pyopenssl=19.1.0=py36_0
  - pyparsing=2.4.7=py_0
  - pyppeteer=0.0.25=py_1
  - pyproj=2.6.1.post1=py36hd003209_1
  - pyqt=5.9.2=py36h05f1152_2
  - pyrsistent=0.16.0=py36h7b6447c_0
  - pysocks=1.7.1=py36_0
  - python=3.6.10=hcf32534_1
  - python-dateutil=2.8.1=py_0
  - python_abi=3.6=1_cp36m
  - pytz=2020.1=py_0
  - pyviz_comms=0.7.4=py_0
  - pywavelets=1.1.1=py36h7b6447c_0
  - pyyaml=5.3.1=py36h7b6447c_0
  - pyzmq=18.1.1=py36he6710b0_0
  - qt=5.9.7=h0c104cb_3
  - rapids=0.14.0=cuda10.1_py36_2
  - rapids-xgboost=0.14.0=cuda10.1_py36_2
  - re2=2019.08.01=he6710b0_0
  - readline=8.0=h7b6447c_0
  - requests=2.23.0=py36_0
  - rhash=1.3.8=h1ba5d50_0
  - rmm=0.14.0=py36_0
  - rtree=0.9.4=py36_1
  - scikit-image=0.16.2=py36h0573a6f_0
  - scikit-learn=0.22.1=py36hd81dba3_0
  - scipy=1.4.1=py36h0b6359f_0
  - seaborn=0.10.1=py_0
  - send2trash=1.5.0=py36_0
  - setuptools=47.1.1=py36_0
  - shapely=1.6.4=py36hec07ddf_1006
  - simpervisor=0.3=py_1
  - sip=4.19.8=py36hf484d3e_0
  - six=1.15.0=py_0
  - snappy=1.1.7=hbae5bb6_3
  - sortedcontainers=2.1.0=py36_0
  - spdlog=1.6.1=hc9558a2_0
  - sqlite=3.31.1=h62c20be_1
  - tbb=2018.0.5=h6bb024c_0
  - tblib=1.6.0=py_0
  - terminado=0.8.3=py36_0
  - testpath=0.4.4=py_0
  - thrift-cpp=0.12.0=hf3afdfd_1004
  - tiledb=1.6.2=h7d710e0_2
  - tk=8.6.10=hed695b0_0
  - toolz=0.10.0=py_0
  - tornado=6.0.4=py36h7b6447c_1
  - tqdm=4.46.0=py_0
  - traitlets=4.3.3=py36_0
  - typing_extensions=3.7.4.1=py36_0
  - tzcode=2020a=h516909a_0
  - ucx=1.8.0+gf6ec8d4=cuda10.1_20
  - ucx-py=0.14.0+gf6ec8d4=py36_0
  - uriparser=0.9.3=he6710b0_1
  - urllib3=1.25.8=py36_0
  - wcwidth=0.1.9=py_0
  - webencodings=0.5.1=py36_1
  - websockets=8.1=py36h8c4c3a4_1
  - wheel=0.34.2=py36_0
  - xarray=0.15.1=py_0
  - xerces-c=3.2.2=h8412b87_1004
  - xgboost=1.1.0dev.rapidsai0.14=cuda10.1py36_0
  - xorg-kbproto=1.0.7=h14c3975_1002
  - xorg-libice=1.0.10=h516909a_0
  - xorg-libsm=1.2.3=h84519dc_1000
  - xorg-libx11=1.6.9=h516909a_0
  - xorg-libxext=1.3.4=h516909a_0
  - xorg-libxrender=0.9.10=h516909a_1002
  - xorg-renderproto=0.11.1=h14c3975_1002
  - xorg-xextproto=7.3.0=h14c3975_1002
  - xorg-xproto=7.0.31=h14c3975_1007
  - xz=5.2.5=h7b6447c_0
  - yaml=0.1.7=had09818_2
  - yarl=1.4.2=py36h7b6447c_0
  - zeromq=4.3.1=he6710b0_3
  - zict=2.0.0=py_0
  - zipp=3.1.0=py_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.4.3=h3b9ef0a_0
  - pip:
    - treelite==0.91


rapidsai / cuml

[QST] Difficulty joining the output of cuml.dask.preprocessing.OneHotEncoder with the source dask_cudf #2503