rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Consider implementing standard operators on pylibcudf Columns, Scalars, and possibly Tables #15183

Open vyasr opened 7 months ago

vyasr commented 7 months ago

Is your feature request related to a problem? Please describe.

pylibcudf objects expose many standard operations as named methods rather than through the dunder methods that Python maps to language operators and builtins (for example, len(x) calls x.__len__()). Methods are preferred because they are typed, which matters in pure Cython contexts: they can immediately produce typed outputs, and they avoid Python function call overhead when invoked on Cython-typed inputs. The downside is that when pylibcudf is used as a Python library, writing col.size() instead of len(col) is more verbose and less idiomatic.
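To make the trade-off concrete, here is a minimal pure-Python sketch of the sugar being discussed. The `Column` class below is a hypothetical stand-in, not the real pylibcudf class; it only illustrates a dunder method delegating to an existing named method.

```python
class Column:
    """Hypothetical stand-in for a pylibcudf Column (not the real class)."""

    def __init__(self, values):
        self._values = list(values)

    def size(self) -> int:
        # Named method: in Cython this could be declared with a C return
        # type, so typed callers get a typed result without a Python call.
        return len(self._values)

    def __len__(self) -> int:
        # Operator sugar simply delegates to the named method.
        return self.size()


col = Column([10, 20, 30])
print(col.size(), len(col))
```

With this shape, Cython code could keep calling `size()` directly, while Python users get the idiomatic `len(col)`.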

Describe the solution you'd like

I'm not entirely convinced that implementing all operators is worthwhile, given the potential (albeit minor) typing issues and performance footguns it introduces in Cython code. Still, I think we should at least consider it, and we can use this issue to document our conclusions either way. If we do move forward, it makes sense to implement binary operators on Columns and Scalars. Tables are a harder sell: most likely we will only want to implement simple operators like len and require users to handle binary operations manually, column by column. The isomorphism between a Table and a 2D array is weak at best, and binary operations on Tables likely have far too many edge cases to be worth pursuing.
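The proposed split between Columns and Tables could look roughly like the sketch below. Both classes are hypothetical pure-Python stand-ins rather than pylibcudf types, and treating `len(table)` as the row count is an assumption made only for this example.

```python
class Column:
    """Hypothetical stand-in for a pylibcudf Column (not the real class)."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __add__(self, other):
        # Elementwise addition of two equal-length columns.
        if not isinstance(other, Column):
            return NotImplemented
        if len(self) != len(other):
            raise ValueError("column lengths differ")
        return Column(a + b for a, b in zip(self.values, other.values))


class Table:
    """Hypothetical stand-in for a pylibcudf Table: len only, no binops."""

    def __init__(self, columns):
        self._columns = list(columns)

    def columns(self):
        return list(self._columns)

    def __len__(self):
        # Assumption for this sketch: len(table) means the row count.
        return len(self._columns[0]) if self._columns else 0


left = Table([Column([1, 2]), Column([3, 4])])
right = Table([Column([10, 20]), Column([30, 40])])

# Binary operations are applied by the caller, one column pair at a time.
total = Table([a + b for a, b in zip(left.columns(), right.columns())])
```

Because `Table` defines no `__add__`, an expression like `left + right` raises `TypeError`, which keeps the edge cases of table-wide arithmetic out of the library.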

wence- commented 6 months ago

Regarding binops on columns, adding some sugar in the Python wrappers is probably not a big deal. However, usage would likely be quite noisy: I would not want to add dtype casting to the magic, so users would probably have to write a = plc.unary.cast(a) + b anyway.

vyasr commented 6 months ago

Agreed, pylibcudf should have no hidden performance traps.
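One way to read the "no hidden casts" position is an operator that rejects mismatched dtypes instead of promoting them. The sketch below is an illustration under that assumption only: `Column`, the string dtype labels, and the `cast` helper are all hypothetical stand-ins (the helper loosely mirrors the role of plc.unary.cast) and do not reflect pylibcudf's actual API.

```python
class Column:
    """Hypothetical stand-in for a pylibcudf Column (not the real class)."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype  # plain string label, e.g. "int64"

    def __add__(self, other):
        if not isinstance(other, Column):
            return NotImplemented
        if self.dtype != other.dtype:
            # No implicit promotion: the caller must cast explicitly.
            raise TypeError(f"dtype mismatch: {self.dtype} vs {other.dtype}")
        summed = (x + y for x, y in zip(self.values, other.values))
        return Column(summed, self.dtype)


# Hypothetical dtype labels mapped to Python converters.
_CONVERTERS = {"int64": int, "float64": float}


def cast(col, dtype):
    """Hypothetical explicit cast, standing in for an explicit cast call."""
    convert = _CONVERTERS[dtype]
    return Column((convert(v) for v in col.values), dtype)


a = Column([1, 2, 3], "int64")
b = Column([0.5, 1.5, 2.5], "float64")

try:
    a + b  # rejected: dtypes differ
except TypeError as err:
    print(err)

# The cast, and therefore its cost, is visible at the call site.
result = cast(a, "float64") + b
```

The operator stays cheap and predictable, at the price of the noisier call sites mentioned above.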