Open vyasr opened 9 months ago
Some relevant functionality was added in https://github.com/rapidsai/cudf/pull/15615. That PR handled device arrays. Next we need to add host arrays (the __array_interface__ protocol) and then see if there are other types that make sense. We probably should support the arrow capsule interface directly as well, which should be straightforward via calls to pylibcudf.interop.
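To make the protocols mentioned above concrete, here is a minimal sketch of how an input could be classified by the interchange attribute it exposes. The helper detect_protocol and the FakeHostArray class are hypothetical names used only for illustration and are not part of pylibcudf; the attribute names themselves (__cuda_array_interface__, __array_interface__, __arrow_c_array__) are the ones defined by the respective protocols.

```python
# Attribute names defined by the CUDA array interface, the NumPy array
# interface, and the Arrow PyCapsule interface, checked in this order.
PROTOCOL_ATTRS = (
    "__cuda_array_interface__",  # device arrays
    "__array_interface__",       # host arrays
    "__arrow_c_array__",         # Arrow PyCapsule interface (arrays)
)


def detect_protocol(obj):
    """Hypothetical helper: return the first protocol attribute obj exposes, else None."""
    for attr in PROTOCOL_ATTRS:
        if hasattr(obj, attr):
            return attr
    return None


class FakeHostArray:
    """Illustrative stand-in for a host array; the data pointer is a placeholder."""

    __array_interface__ = {
        "shape": (3,),
        "typestr": "<i4",
        "data": (0, False),  # placeholder address, never dereferenced here
        "version": 3,
    }


print(detect_protocol(FakeHostArray()))
print(detect_protocol([1, 2, 3]))
```

Since these protocols are identified by attributes rather than by concrete types, a type-based dispatcher would likely need a check of this kind in its fallback path.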
Is your feature request related to a problem? Please describe.
Currently, core pylibcudf owning objects are difficult to construct. Columns are constructed using a complete signature involving nearly raw pointers, while Scalars are really only constructible from pyarrow scalars. While this approach is sufficient for now due to the interop layers baked into cuDF's Cython, in the long run this will not be a reasonable API for users. We need to define simpler ways for users to create these objects, ideally without sacrificing the performance of the lowest level APIs in cases where power users want to access them.
Describe the solution you'd like
We should define a singledispatch classmethod factory from_any that accepts an arbitrary input and attempts to construct from it. Each specialization should be a trivial one-line passthrough to a separate classmethod factory of the appropriate type, such that users who know what input they have could always call the appropriate factory themselves. For instance:

Describe alternatives you've considered
We could wrap the singledispatch function in a higher-level fused-type Cython function that could do compile-time dispatch on known C types. That would offer a minor performance benefit in a limited number of cases, but at the cost of unnecessary complexity IMO, especially since the vast majority of user inputs are not going to be Cython-typed objects but will instead come in as PyObjects that need to be introspected at runtime via isinstance anyway (which is what singledispatch does).

Additional context
The current Column constructor is not the most usable, and we may eventually want to make the current constructor into a factory and instead have the constructor be something more user-friendly.
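The from_any factory proposed above could look roughly like the following sketch, which uses functools.singledispatchmethod from the standard library. The Column class here is a toy stand-in and not the pylibcudf class; from_list, from_array, and the source attribute are hypothetical names chosen for illustration only.

```python
import array
from functools import singledispatchmethod


class Column:
    """Toy stand-in for an owning column; NOT the pylibcudf Column."""

    def __init__(self, data, source):
        self.data = list(data)
        self.source = source  # records which factory built this object

    # Explicit factories (hypothetical names) that users can call directly
    # when they already know what kind of input they have.
    @classmethod
    def from_list(cls, values):
        return cls(values, source="list")

    @classmethod
    def from_array(cls, arr):
        return cls(arr, source="array")

    # Generic entry point: dispatches on the runtime type of `obj`.
    @singledispatchmethod
    @classmethod
    def from_any(cls, obj):
        raise TypeError(
            f"Cannot construct {cls.__name__} from {type(obj).__name__}"
        )

    # Each specialization is a one-line passthrough to an explicit factory.
    @from_any.register(list)
    @classmethod
    def _(cls, obj):
        return cls.from_list(obj)

    @from_any.register(array.array)
    @classmethod
    def _(cls, obj):
        return cls.from_array(obj)


col = Column.from_any([1, 2, 3])
print(col.source, col.data)
```

Unregistered input types fall through to the base implementation and raise TypeError, while callers who know their input type can skip the dispatch entirely by calling from_list or from_array themselves.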