pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Add arithmetic for `arr` namespaces #14541

Open mcrumiller opened 6 months ago

mcrumiller commented 6 months ago

Description

Every value in an Array column has the same length, so we should be able to add, subtract, multiply, and divide two Array columns element-wise when they have the same dtype:

import polars as pl

df = pl.DataFrame({
    "a": pl.Series([[1, 2], [3, 4]], dtype=pl.Array(pl.UInt8, 2)),
    "b": pl.Series([[1, 2], [3, 4]], dtype=pl.Array(pl.UInt8, 2)),
})

# All of these fail; it seems reasonable that they should be implemented.
df["a"] + df["b"]
df.select(pl.col("a") + pl.col("b"))
df.select(pl.sum_horizontal("a", "b"))
pyo3_runtime.PanicException: `add` operation not supported for dtype `array[u8, 2]`

ritchie46 commented 3 months ago

Yes, we should support this.

itamarst commented 1 month ago

These all work now (https://github.com/pola-rs/polars/pull/16791). However, division is broken because the Python code for division tries to cast to Float64, when in this case it should cast to pl.Array(pl.Float64, 2).

Building on the example above:

>>> df["a"] / df["b"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/itamarst/devel/polars/py-polars/polars/series/series.py", line 1082, in __truediv__
    return self.cast(Float64) / other
  File "/home/itamarst/devel/polars/py-polars/polars/series/series.py", line 3992, in cast
    return self._from_pyseries(self._s.cast(dtype, strict, wrap_numerical))
polars.exceptions.InvalidOperationError: cannot cast Array type (inner: 'UInt8', to: 'Float64')

itamarst commented 1 month ago

I will probably fix the division issue as part of #9188, where I have the same problem with List.