rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0
8.41k stars 899 forks source link

[FEA] Don't copy data in to/from_dlpack when unnecessary #10874

Open shwina opened 2 years ago

shwina commented 2 years ago

To .to_dlpack() and .from_dlpack() methods in cuDF currently always perform a copy to/from the DLTensor. This is reasonable for DataFrames, as the columns of a dataframe in cuDF are not contiguous in memory, nor are they always of the same data type.

For a Series however, I believe we should be able to zero-copy to and from DLPack. That is not the case today:

>>> import cudf
>>> import cupy as cp
>>> s = cudf.Series([1, 2, 3])
>>> arr = cp.from_dlpack(s.to_dlpack())
>>> s._column.data.ptr
139742968545280
>>> arr.data.ptr
139742968545792
jrhemstad commented 2 years ago

This wouldn't be possible with the existing from_dlpack function in libcudf as it always returns a unique_ptr<column> that expects to own the data.

That said, we could add a view_dlpack function that returns a column_view/table_view.

I think a non-copying to_dlpack could also be possible by adding an overload of to_dlpack that takes a column&& or table&& and therefore takes ownership of those objects and gives it to the returned DLManagedTensor object.

shwina commented 2 years ago

I think a non-copying to_dlpack could also be possible by adding an overload of to_dlpack that takes a column&& or table&& and therefore takes ownership of those objects and gives it to the returned DLManagedTensor object.

I don't think this would work for Python since we never own any column objects ourselves.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

vyasr commented 5 months ago

This work is adjacent to #10849