rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Create `pylibcudf.Table` from a `table_view` and an arbitrary owning object #17040

Open madsbk opened 1 week ago

madsbk commented 1 week ago

In https://github.com/rapidsai/cudf/pull/17012, we have a table_view but no owning Table, so it would be useful to be able to create a pylibcudf.Table from a table_view and an arbitrary owning object (in this case, a PackedColumns instance).

Similarly, it should be possible to create a pylibcudf.Column from a column_view and an arbitrary owning object.

cc. @wence-

vyasr commented 1 week ago

The main challenge here would be with children. If I call from_column_view(cv, arbitrary) and cv has children, what should their owners be? The current logic is quite strict in assuming that the owning column owns the corresponding buffer for that child's data. How should this work in general? If it shouldn't, should it be forbidden? Should we assume that if we are ingesting views containing nested types, we must be coming from a Column?

wence- commented 1 week ago

In the cudf-classic model, if we have an arbitrary object that we can't decompose, that becomes the owner for every child.

In the cudf::unpack case I think we just have to follow that model if we want to avoid a copy. Every column (and null mask) that comes out of cudf::unpack is backed by the same single allocation, so the owner of every child is the same object.
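To illustrate why a single shared owner is enough to keep the memory valid, here is a minimal pure-Python sketch of that model. It does not use cudf: `PackedAllocation`, `ViewNode`, `OwnedColumn`, and `wrap_with_single_owner` are hypothetical names invented for this example, and the lifetime check relies on CPython reference counting.

```python
import gc
import weakref
from dataclasses import dataclass, field


class PackedAllocation:
    """Hypothetical stand-in for the single allocation behind unpacked columns."""


@dataclass
class ViewNode:
    """Hypothetical non-owning view: a name plus child views."""
    name: str
    children: list = field(default_factory=list)


@dataclass
class OwnedColumn:
    """Hypothetical column that holds a reference to whatever owns its memory."""
    name: str
    owner: object
    children: list


def wrap_with_single_owner(view, owner):
    # Attach the same owner at every level of the nesting.
    return OwnedColumn(
        view.name,
        owner,
        [wrap_with_single_owner(child, owner) for child in view.children],
    )


allocation = PackedAllocation()
alive = weakref.ref(allocation)

view = ViewNode("list_col", [ViewNode("offsets"), ViewNode("values")])
column = wrap_with_single_owner(view, allocation)

# Keep only one child, then drop the parent and our direct reference.
values = column.children[1]
del column, allocation
gc.collect()
still_alive_with_child = alive() is not None

# Once the last column referencing the allocation is gone, it can be freed.
del values
gc.collect()
freed_after_last_reference = alive() is None
```

Because every child holds its own reference to the owner, a child that outlives its parent still keeps the backing allocation alive, which is the property needed to avoid a copy.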

vyasr commented 1 week ago

I agree that's probably the best that we can do. Perhaps the owner could be a fused type of Column | object, so that we take the smarter per-child path for a Column and the more naive "all children are owned by owner" path for everything else.
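A rough sketch of that dispatch, in plain Python rather than Cython and without touching cudf, might look like the following. `View`, `Column`, and `from_view` are hypothetical stand-ins, not the pylibcudf API, and pairing child i of the view with child i of the owning column is an assumption about how the "smarter" path would work.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical stand-in for a column_view: non-owning, possibly nested."""
    name: str
    children: list = field(default_factory=list)


@dataclass(eq=False)
class Column:
    """Hypothetical stand-in for a column that records its owner."""
    name: str
    owner: object
    children: list = field(default_factory=list)


def from_view(view, owner):
    if isinstance(owner, Column):
        # Smarter path (assumed): child i of the view is owned by child i of the owner.
        if len(owner.children) != len(view.children):
            raise ValueError("owner and view have different numbers of children")
        children = [
            from_view(child_view, child_owner)
            for child_view, child_owner in zip(view.children, owner.children)
        ]
    else:
        # Naive path: the opaque owner backs every child at every depth.
        children = [from_view(child_view, owner) for child_view in view.children]
    return Column(view.name, owner, children)


packed = object()  # opaque owner, e.g. something like a PackedColumns instance
view = View("struct", [View("a"), View("b", [View("c")])])

naive = from_view(view, packed)   # every level is owned by `packed`
smart = from_view(view, naive)    # each child is owned by the matching child column
```

In this sketch the naive path never inspects the owner, so any object that keeps the memory alive is acceptable, while the Column path fails early if the view and the owner disagree on their nesting.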