Open vyasr opened 7 months ago
wrt binops on columns, adding some sugar in the Python wrappers is probably not a big deal. However, usage would likely be quite noisy: I would not want to add dtype casting to the magic, so users would probably have to write `a = plc.unary.cast(a) + b` anyway.
Agreed, pylibcudf should have no hidden performance traps.
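To make the trade-off in the comments above concrete, here is a minimal pure-Python sketch of an `__add__` that refuses to cast implicitly. `ToyColumn` and its `cast` method are hypothetical stand-ins invented for illustration; they are not pylibcudf's API, and the real `plc.unary.cast` may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyColumn:
    """Hypothetical stand-in for a column; NOT pylibcudf's API."""

    dtype: str
    values: tuple

    def cast(self, dtype: str) -> "ToyColumn":
        # Explicit conversion: the only place a dtype ever changes.
        convert = {"int64": int, "float64": float}[dtype]
        return ToyColumn(dtype, tuple(convert(v) for v in self.values))

    def __add__(self, other):
        if not isinstance(other, ToyColumn):
            return NotImplemented
        if self.dtype != other.dtype:
            # No hidden promotion: a mismatch is an error, not a silent cast.
            raise TypeError(
                f"dtype mismatch ({self.dtype} vs {other.dtype}); cast explicitly"
            )
        if len(self.values) != len(other.values):
            raise ValueError("columns must have the same length")
        summed = tuple(x + y for x, y in zip(self.values, other.values))
        return ToyColumn(self.dtype, summed)


a = ToyColumn("int64", (1, 2, 3))
b = ToyColumn("float64", (0.5, 1.5, 2.5))

# The operator works only once the caller has made the cast visible.
total = a.cast("float64") + b
```

Under these assumptions `a + b` raises `TypeError`, so the cost of the conversion always shows up in user code, at the price of the noisier call site described above.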
Is your feature request related to a problem? Please describe.

pylibcudf objects support many of the standard operations that Python objects usually expose through dunder methods mapped to language operators (such as `len(x) == x.__len__()`), but they do so via regular methods instead. The reason for preferring methods is that they are typed, which makes them preferable in pure Cython contexts: they can immediately produce typed outputs, and they avoid Python function call overhead when invoked on Cython-typed inputs. The downside of this approach is that when pylibcudf is used as a Python library, it is more verbose and less idiomatic to have to use `col.size()`
than `len(col)`.

Describe the solution you'd like

I'm not entirely convinced that implementing all operators is worthwhile given the potential (albeit minor) typing issues and performance footguns it introduces in Cython code, but I think we should at least consider it and can use this issue to document our conclusions one way or another. If we do choose to move forward, I think it makes sense to implement things like binary operators on Columns and Scalars. Tables are a harder sell; most likely we will only want to implement simple operators like `len` and require users to handle binary operations manually on a per-column basis, since the isomorphism between a Table and a 2D array is weak at best and binary operations likely have far too many edge cases to be worth pursuing.
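As a rough illustration of the proposal, the sketch below shows dunder methods that only delegate to the existing named methods, with the table type deliberately exposing nothing beyond `len`. `SketchColumn`, `SketchTable`, and all of their methods are hypothetical names written in plain Python, not pylibcudf's classes. Having `len(table)` return the row count is likewise an assumption made for the example.

```python
class SketchColumn:
    """Hypothetical column; the named methods stay the primary interface."""

    def __init__(self, values):
        self._values = list(values)

    def size(self) -> int:
        return len(self._values)

    def to_list(self) -> list:
        return list(self._values)

    def add(self, other: "SketchColumn") -> "SketchColumn":
        if self.size() != other.size():
            raise ValueError("columns must have the same length")
        return SketchColumn(x + y for x, y in zip(self._values, other._values))

    # Sugar only: each dunder forwards to the named method above.
    def __len__(self) -> int:
        return self.size()

    def __add__(self, other):
        if not isinstance(other, SketchColumn):
            return NotImplemented
        return self.add(other)


class SketchTable:
    """Hypothetical table: supports len() but defines no binary operators."""

    def __init__(self, columns):
        self._columns = list(columns)

    def columns(self) -> list:
        return list(self._columns)

    def num_rows(self) -> int:
        return self._columns[0].size() if self._columns else 0

    def __len__(self) -> int:
        # Assumption for this sketch: len(table) means the number of rows.
        return self.num_rows()


col = SketchColumn([1, 2, 3])
table = SketchTable([col, col])

# Binary operations on a table are spelled out column by column.
doubled = [left + right for left, right in zip(table.columns(), table.columns())]
```

Here `len(col)` and `col.size()` agree, while `table + table` raises `TypeError` because `SketchTable` defines no `__add__`, leaving the per-column loop as the only way to combine tables.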