rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG][FEA] Convert Dask Array to Dask cuDF DataFrame causes ArrowInvalid Error #9398

Open esnvidia opened 3 years ago

esnvidia commented 3 years ago

While creating a synthetic dataset with cuML's Dask make_classification to load into a cuGraph object, I get

ArrowInvalid: Could not convert 250000 with type cupy._core.core.ndarray: did not recognize Python value type when inferring an Arrow data type

when converting a Dask Array (output from cuML Dask make_classification) -> Dask DataFrame -> Dask cuDF DataFrame in order to instantiate a cuGraph DiGraph object. I think 250000 refers to the second index division (0 being the first).

See the attached sample code to reproduce; it also includes the other things I tried.

Feature: Could a future release support creating dask_cudf DataFrames from Dask Arrays directly? Something like dask_cudf.DataFrame.from_dask_array(dask_array, columns=[] ...) or similar would be nice.
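To make the request concrete, here is a rough sketch of the per-block work such a helper might do. It is not an existing dask_cudf API: `block_to_frame` and `blocks_to_frames` are hypothetical names, and NumPy and pandas stand in for CuPy and cuDF so the snippet runs without a GPU. In a real pipeline the blocks would presumably come from the Dask Array itself (for example via `to_delayed()`) and each one would be turned into a cudf.DataFrame; that wiring is an assumption and is not shown.

```python
import numpy as np
import pandas as pd


def block_to_frame(block, columns, start_row=0):
    """Turn one 2-D array block into a DataFrame with a global row index.

    Hypothetical helper: with CuPy blocks, a cuDF DataFrame would be
    built here instead of a pandas one.
    """
    if block.ndim != 2 or block.shape[1] != len(columns):
        raise ValueError("block must be 2-D with one column per name")
    index = pd.RangeIndex(start_row, start_row + block.shape[0])
    return pd.DataFrame(block, columns=columns, index=index)


def blocks_to_frames(blocks, columns):
    """Convert row-wise blocks one at a time, tracking the row offset."""
    frames, start = [], 0
    for block in blocks:
        frames.append(block_to_frame(block, columns, start))
        start += block.shape[0]
    return frames


# Two row-wise chunks standing in for the blocks of a Dask Array.
data = np.arange(12, dtype="float32").reshape(6, 2)
blocks = np.array_split(data, 2)  # rows 0-2 and rows 3-5
frames = blocks_to_frames(blocks, ["src", "dst"])

# Divisions: first index value of each partition, plus the last index value.
divisions = [f.index[0] for f in frames] + [frames[-1].index[-1]]
print(divisions)
```

The divisions list holds the first index value of each partition followed by the final index value, which matches my guess that 250000 in the error is the start of the second partition.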

Expected behavior: Create a Dask cuDF DataFrame and run Louvain.

Environment details: Docker container with RAPIDS 21.06 on a DGX

Here's sample code snippet.txt

beckernick commented 3 years ago

Hi @esnvidia, thanks for filing an issue. Is it possible to reduce this to a minimal example? Or does it only reproduce in the context of cuML's distributed NearestNeighbors, concatenating, and other operations?

esnvidia commented 3 years ago

Hi @beckernick, it's possible to make it work w/o Nearest Neighbors; I just needed a way to make source-destination pairs to use in an example. It's pretty minimal. The issue is going from a Dask Array -> dask_cudf DataFrame. I generate the Dask Array with cuML because that's how I'm actually doing it, and it should ensure that the array is CuPy-backed too.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.