pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Implement dunder methods present in `Series` but not `DataFrame` #10608

Open Wainberg opened 1 year ago

Wainberg commented 1 year ago

Problem description

Series has 19 dunder methods that DataFrame doesn't (not counting __column_consortium_standard__):

__abs__
__and__
__array_ufunc__
__invert__
__matmul__
__neg__
__or__
__pos__
__pow__
__rand__
__rfloordiv__
__rmatmul__
__rmod__
__ror__
__rpow__
__rsub__
__rtruediv__
__rxor__
__xor__

Some of these implement basic behavior that new users coming from pandas/NumPy would expect to be supported:

>>> df ** 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ** or pow(): 'DataFrame' and 'int'
>>> -df
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for unary -: 'DataFrame'
>>> 1 - df
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'int' and 'DataFrame'

ritchie46 commented 1 year ago

A DataFrame is not a tensor. I am strongly opposed to many of these, and I think we should even deprecate the existing behaviors. DataFrames often hold columns of many different types, and assuming you can multiply all of them is very tricky.

We now have a selector API and we should always use this IMO:

import polars as pl
from polars import selectors as cs

df = pl.DataFrame({
    "a": [1, 2, 3]
})

# Multiply every numeric column by 2; non-numeric columns are left as they are.
df.with_columns(cs.numeric() * 2)

Wainberg commented 1 year ago

I get the desire not to be pandas and have five different ways of doing everything, but DataFrames already support +, -, *, /, //, %, ==, !=, <, >, <=, and >=, so I figure it's better to either deprecate all of those or support the remaining ones. If someone coming from pandas tries 1 + df and it works, then tries 1 - df and finds it isn't implemented, they may naively conclude that polars isn't ready for prime time and jump ship. Consistency matters more than anything here.

ritchie46 commented 1 year ago

Yes, go all or go home. But I think we should remove the others as well.

We now have an API that supports this more granularly and more safely.

stinodego commented 9 months ago

There is some DataFrame arithmetic implemented in Rust: https://github.com/pola-rs/polars/blob/main/crates/polars-core/src/frame/arithmetic.rs

If we can have a proper architecture for how DataFrame arithmetic works (e.g. how to deal with non-numeric columns, different column names, different shapes), and this is implemented in Rust, I would be in favor of extending the existing arithmetic operations.