Wainberg opened 1 year ago
A DataFrame is not a tensor. I am strongly opposed to many of these, and I think we should even deprecate the old behaviors. DataFrames often hold columns of mixed types, and assuming you can multiply all of them is very tricky.
We now have a selector API and we should always use this IMO:
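```python
import polars as pl
from polars import selectors as cs

df = pl.DataFrame({
    "a": [1, 2, 3],
})
# Multiply only the numeric columns; any other columns are kept unchanged.
df.with_columns(cs.numeric() * 2)
```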
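To make the mixed-type concern concrete without depending on a particular Polars version, here is a plain-Python sketch. The dict-of-lists "frame" and the helper names (`is_numeric`, `with_scaled_numeric`, `blanket_scale`) are hypothetical; the sketch only mimics the column-selection idea behind `cs.numeric()`, not how Polars implements it.

```python
from typing import Dict, Union

# Hypothetical toy stand-in for a DataFrame: column name -> list of values.
Frame = Dict[str, list]
Scalar = Union[int, float]


def is_numeric(values: list) -> bool:
    """Rough stand-in for a numeric dtype check (bool excluded, since it subclasses int)."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def with_scaled_numeric(frame: Frame, factor: Scalar) -> Frame:
    """Selector-style: multiply only numeric columns, pass the rest through unchanged."""
    return {
        name: [v * factor for v in values] if is_numeric(values) else list(values)
        for name, values in frame.items()
    }


def blanket_scale(frame: Frame, factor: Scalar) -> Frame:
    """Naive 'df * factor': apply the operator to every column regardless of type."""
    # In plain Python, str * int repeats the string, so a text column is
    # silently transformed instead of being rejected.
    return {name: [v * factor for v in values] for name, values in frame.items()}


frame = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
print(with_scaled_numeric(frame, 2))  # {'a': [2, 4, 6], 'b': ['x', 'y', 'z']}
print(blanket_scale(frame, 2))        # {'a': [2, 4, 6], 'b': ['xx', 'yy', 'zz']}
```

The string repetition is plain-Python behavior, used here only to show how an operator applied to every column can quietly do something unintended on non-numeric data.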
I get the desire to not be pandas and have 5 different ways of doing everything, but DataFrames already support +, -, *, /, //, %, ==, !=, <, >, <=, >=, so I figure it's better to either deprecate all of those or support the remaining ones. If someone comes from pandas, tries `1 + df` and it works, then tries `1 - df` and it's not implemented, they may naively conclude that polars isn't ready for prime time yet and jump ship. Consistency is more important than anything.
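The `1 + df` / `1 - df` asymmetry follows from Python's reflected-operator protocol: for `1 + df`, `int.__add__` returns `NotImplemented`, so Python falls back to `df.__radd__(1)`; if the class defines no `__rsub__`, `1 - df` raises `TypeError`. A toy illustration (the class names are hypothetical, not Polars classes):

```python
class PartialFrame:
    """Hypothetical toy frame: has __add__/__radd__ and __sub__, but no __rsub__."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # frame + 1
        return type(self)(v + other for v in self.values)

    def __radd__(self, other):
        # 1 + frame: int.__add__ returns NotImplemented, so Python falls back here.
        return type(self)(other + v for v in self.values)

    def __sub__(self, other):
        # frame - 1
        return type(self)(v - other for v in self.values)

    # No __rsub__, so `1 - frame` raises TypeError.


class CompleteFrame(PartialFrame):
    """Same toy frame with the missing reflected operator added."""

    def __rsub__(self, other):
        # 1 - frame. Operand order matters: compute other - v, not v - other.
        return type(self)(other - v for v in self.values)


f = PartialFrame([1, 2, 3])
print((1 + f).values)  # [2, 3, 4]
try:
    1 - f
except TypeError as exc:
    print("TypeError:", exc)

g = CompleteFrame([1, 2, 3])
print((1 - g).values)  # [0, -1, -2]
```

Each reflected dunder is a few lines, but it has to be written (and tested for operand order) separately for every operator, which is why gaps like this tend to appear one operator at a time.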
Yes, go all or go home. But I think we should remove the others as well.
We now have an API that supports this more granularly and more safely.
There is some DataFrame arithmetic implemented in Rust: https://github.com/pola-rs/polars/blob/main/crates/polars-core/src/frame/arithmetic.rs
If we can have a proper architecture for how DataFrame arithmetic works (e.g. how to deal with non-numeric columns, different column names, different shapes), and this is implemented in Rust, I would be in favor of extending the existing arithmetic operations.
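As one illustration of what such an architecture would need to pin down, here is a plain-Python sketch of one possible policy for frame-with-frame arithmetic. It does not describe what `arithmetic.rs` currently does; the function name `frame_binary_op`, the `non_numeric` option, and the strict name and height rules are all hypothetical choices.

```python
import operator  # e.g. operator.add / operator.sub can be passed as `op`
from typing import Callable, Dict

Frame = Dict[str, list]  # hypothetical toy frame: column name -> values


def _is_numeric(values: list) -> bool:
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def frame_binary_op(left: Frame, right: Frame, op: Callable, non_numeric: str = "raise") -> Frame:
    """Hypothetical policy for elementwise frame-with-frame arithmetic.

    - different column names: names must match exactly, otherwise ValueError
    - different shapes: all columns must have one common height, otherwise ValueError
    - non-numeric columns: "raise" (TypeError) or "keep_left" (pass left column through)
    """
    if non_numeric not in ("raise", "keep_left"):
        raise ValueError(f"unknown non_numeric policy: {non_numeric!r}")
    if set(left) != set(right):
        raise ValueError(f"column names differ: {sorted(set(left) ^ set(right))}")
    heights = {len(v) for v in left.values()} | {len(v) for v in right.values()}
    if len(heights) > 1:
        raise ValueError(f"frames have different heights: {sorted(heights)}")

    out: Frame = {}
    for name, lvals in left.items():  # output column order follows `left`
        rvals = right[name]
        if _is_numeric(lvals) and _is_numeric(rvals):
            out[name] = [op(a, b) for a, b in zip(lvals, rvals)]
        elif non_numeric == "keep_left":
            out[name] = list(lvals)
        else:
            raise TypeError(f"column {name!r} is not numeric on both sides")
    return out
```

A strict rule like this fails loudly on mismatches; an alternative closer to pandas' label alignment would take the union of column names and fill the missing results with nulls.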
Problem description
`Series` has 19 dunder methods that `DataFrame` doesn't (not counting `__column_consortium_standard__`). Some of these implement basic behavior that new users coming from pandas/NumPy would expect to be supported: