Open Wainberg opened 9 months ago
This appears to be a feature request for additional numpy
interop rather than a bug ;)
Touché ;)
But honestly, I'd still classify it as a bug because this works:
>>> pl.Series([1, 2, 3]) + np.array([1, 2, 3])
shape: (3,)
Series: '' [i64]
[
    2
    4
    6
]
and all operations with NumPy array + Series/DataFrame work (by auto-converting the RHS to a NumPy array). So it's really just these 5 cases that give an error when they shouldn't.
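For illustration, the auto-conversion mentioned above can be reproduced with a toy class. This is only a sketch: `ArrayLike` is a hypothetical stand-in (not a polars type), and it assumes the conversion goes through NumPy's `__array__` protocol.

```python
import numpy as np


class ArrayLike:
    """Hypothetical container that only knows how to turn itself into an array."""

    def __init__(self, values):
        self.values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when it needs an ndarray version of the object.
        return np.asarray(self.values, dtype=dtype)


# The ndarray is the left-hand operand, so NumPy handles the operation and
# converts the right-hand operand itself; the result is a plain ndarray.
result = np.array([1, 2, 3]) + ArrayLike([10, 20, 30])
print(type(result).__name__, result.tolist())
```

Because the wrapper type is lost in this direction, the result differs from what the same operation returns with the operands swapped, which is the asymmetry discussed here.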
The way I implemented this in https://github.com/pola-rs/polars/pull/12426 is by editing _prepare_other_arg, as discussed here: https://github.com/pola-rs/polars/issues/14080.
Note that operations involving NumPy arrays could be made commutative (so that np.array + pl.DataFrame gives a pl.DataFrame rather than an np.array, to match pl.DataFrame + np.array) by overriding __array_ufunc__, as implemented in https://github.com/pola-rs/polars/pull/12426:
For Series:
_operator_ufuncs: ClassVar[dict[np.ufunc, tuple[str, str]]] = {
    np.equal: ("__eq__", "__eq__"),
    np.not_equal: ("__ne__", "__ne__"),
    np.greater: ("__gt__", "__lt__"),
    np.greater_equal: ("__ge__", "__le__"),
    np.less: ("__lt__", "__gt__"),
    np.less_equal: ("__le__", "__ge__"),
    np.add: ("__add__", "__radd__"),
    np.subtract: ("__sub__", "__rsub__"),
    np.multiply: ("__mul__", "__rmul__"),
    np.divide: ("__truediv__", "__rtruediv__"),
    np.true_divide: ("__truediv__", "__rtruediv__"),
    np.floor_divide: ("__floordiv__", "__rfloordiv__"),
    np.power: ("__pow__", "__rpow__"),
    np.remainder: ("__mod__", "__rmod__"),
    np.mod: ("__mod__", "__rmod__"),
    np.bitwise_and: ("__and__", "__rand__"),
    np.bitwise_or: ("__or__", "__ror__"),
    np.bitwise_xor: ("__xor__", "__rxor__"),
    np.matmul: ("__matmul__", "__rmatmul__"),
}

def __array_ufunc__(
    self,
    ufunc: np.ufunc,
    method: Literal[
        "__call__", "reduce", "reduceat", "accumulate", "outer", "inner"
    ],
    *inputs: Any,
    **kwargs: Any,
) -> Series:
    """Numpy universal functions."""
    if self._s.n_chunks() > 1:
        self._s.rechunk(in_place=True)
    s = self._s
    if method == "__call__":
        # For ufuncs that correspond to operators, delegate to the polars
        # implementation of those operators. This ensures operators are
        # commutative, i.e. that they have the same behavior regardless of
        # whether the NumPy array is the left-hand or the right-hand
        # operand. It also ensures correct broadcasting of 2D NumPy arrays
        # with polars Series.
        if ufunc in self._operator_ufuncs:
            if self is inputs[0]:
                # self is left-hand argument
                return getattr(self, self._operator_ufuncs[ufunc][0])(inputs[1])
            else:
                # self is right-hand argument
                return getattr(self, self._operator_ufuncs[ufunc][1])(inputs[0])
        ...
and for DataFrame:
def __array_ufunc__(
    self,
    ufunc: np.ufunc,
    method: Literal[
        "__call__", "reduce", "reduceat", "accumulate", "outer", "inner"
    ],
    *inputs: Any,
    **kwargs: Any,
) -> Self:
    """Numpy universal functions."""
    if method == "__call__" and ufunc in self._operator_ufuncs:
        # For ufuncs that correspond to operators, delegate to the polars
        # implementation of those operators. This ensures operators are
        # commutative, i.e. that they have the same behavior regardless of
        # whether the NumPy array is the left-hand or the right-hand
        # operand.
        if self is inputs[0]:
            # self is left-hand argument
            return getattr(self, self._operator_ufuncs[ufunc][0])(inputs[1])
        else:
            # self is right-hand argument
            return getattr(self, self._operator_ufuncs[ufunc][1])(inputs[0])
    else:
        # Just do the default thing: call the ufunc. Call __array__() on
        # each argument first to avoid infinite recursion - see
        # github.com/numpy/numpy/issues/9079#issuecomment-300279535.
        return getattr(ufunc, method)(
            *(arr.__array__() for arr in inputs), **kwargs
        )
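To see the dispatch pattern in isolation, here is a minimal self-contained sketch that runs without polars. `Wrapped` is a hypothetical toy class standing in for Series/DataFrame, and its operator table covers only two ufuncs; it is not the code from the PR, just the same idea reduced to the essentials.

```python
import numpy as np


class Wrapped:
    """Hypothetical stand-in for a Series: wraps a list of numbers."""

    # ufunc -> (method used when self is the left operand,
    #           method used when self is the right operand)
    _operator_ufuncs = {
        np.add: ("__add__", "__radd__"),
        np.subtract: ("__sub__", "__rsub__"),
    }

    def __init__(self, values):
        self.values = list(values)

    def _binary(self, other, op):
        # Unwrap to plain ndarrays before calling NumPy, so that
        # __array_ufunc__ is not re-entered.
        if isinstance(other, Wrapped):
            other = other.values
        left = np.asarray(self.values)
        right = np.asarray(other)
        return Wrapped(op(left, right).tolist())

    def __add__(self, other):
        return self._binary(other, np.add)

    def __radd__(self, other):
        return self._binary(other, lambda a, b: np.add(b, a))

    def __sub__(self, other):
        return self._binary(other, np.subtract)

    def __rsub__(self, other):
        return self._binary(other, lambda a, b: np.subtract(b, a))

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method == "__call__" and ufunc in self._operator_ufuncs:
            forward, reflected = self._operator_ufuncs[ufunc]
            if self is inputs[0]:
                # self is the left-hand argument
                return getattr(self, forward)(inputs[1])
            # self is the right-hand argument
            return getattr(self, reflected)(inputs[0])
        # This sketch does not handle any other ufunc or method.
        return NotImplemented


w = Wrapped([1, 2, 3])
arr = np.array([10, 20, 30])

left = w + arr    # plain Python dispatch to Wrapped.__add__
right = arr + w   # ndarray.__add__ -> np.add -> Wrapped.__array_ufunc__
diff = arr - w    # reflected subtraction keeps the operand order
print(left.values, right.values, diff.values)
```

Both `w + arr` and `arr + w` return a `Wrapped`, and the subtraction case shows why the table needs a separate reflected method rather than reusing the forward one.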
Checks
Reproducible example
Issue description
pl.DataFrame/pl.Series + np.array is broken.
Expected behavior
All of these should be allowed.
Installed versions