pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

`Variable` may contain numpy scalars with `numpy>=2.1` #9399

Open keewis opened 3 weeks ago

keewis commented 3 weeks ago

What is your issue?

I'm not sure whether this is a problem, but while writing tests @TomNicholas and I noticed that starting with numpy 2.1, Variable objects may contain numpy scalars, especially as the result of an aggregation like mean.

This is caused by numpy scalars gaining an __array_namespace__ method, which as_compatible_data then takes as a sign that the object is an actual (duck) array, so the scalar is stored without being converted.
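A minimal sketch of the mechanism, assuming the duck-array test in `as_compatible_data` has roughly the shape of the condition quoted in this issue. `looks_like_duck_array` is a hypothetical name used only for illustration, not an xarray function.

```python
import numpy as np


def looks_like_duck_array(data) -> bool:
    # Hypothetical stand-in for the duck-array test: anything that is not an
    # ndarray but exposes an array protocol is kept as-is instead of converted.
    return not isinstance(data, np.ndarray) and (
        hasattr(data, "__array_function__") or hasattr(data, "__array_namespace__")
    )


scalar = np.float64(4.1)
print(np.__version__, hasattr(scalar, "__array_namespace__"))
# On numpy>=2.1 the scalar exposes __array_namespace__, so this check
# mistakes it for a duck array; a real 0-d ndarray is never passed through.
print(looks_like_duck_array(scalar))
print(looks_like_duck_array(np.asarray(scalar)))
```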

To fix this, we could change https://github.com/pydata/xarray/blob/a04d857a03d1fb04317d636a7f23239cb9034491/xarray/core/variable.py#L313-L315 to

if not isinstance(data, (np.number, np.ndarray)) and (
    hasattr(data, "__array_function__") or hasattr(data, "__array_namespace__")
):

but I'm not sure that's worth it.
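A sketch of the proposed condition, assuming the intended scalar base class is `np.number` (NumPy has no `np.numeric`). `keep_as_duck_array` is a hypothetical helper name; the point is that numeric scalars now fail the duck-array test and fall through to `np.asarray`.

```python
import numpy as np


def keep_as_duck_array(data) -> bool:
    # Hypothetical version of the proposed check: numeric NumPy scalars and
    # ndarrays are excluded, so they are converted by np.asarray instead.
    return not isinstance(data, (np.number, np.ndarray)) and (
        hasattr(data, "__array_function__") or hasattr(data, "__array_namespace__")
    )


print(keep_as_duck_array(np.float64(4.1)))  # False
print(keep_as_duck_array(np.int32(3)))      # False
```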

To reproduce, try this in an environment with numpy>=2.1:

import numpy as np
import xarray as xr

v = xr.Variable((), np.float64(4.1))
repr(v)
# <xarray.Variable ()> Size: 8B
# np.float64(4.1)

dcherian commented 3 weeks ago

Aren't there non-numeric scalars? Perhaps .ndim == 0 is a better check.
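A quick illustration of the concern: booleans, strings and datetimes are NumPy scalars too, but they are not subclasses of `np.number`. Every NumPy scalar derives from `np.generic` and reports `ndim == 0`; note that 0-d arrays (including 0-d duck arrays) report `ndim == 0` as well, so an `ndim` test alone does not single out scalars.

```python
import numpy as np

scalars = [np.float64(4.1), np.bool_(True), np.str_("a"), np.datetime64("2024-01-01")]

for s in scalars:
    # np.number covers only the numeric scalar types; np.generic is the base
    # class of every NumPy scalar type.
    print(repr(s), isinstance(s, np.number), isinstance(s, np.generic), s.ndim)

# A 0-d ndarray is not a scalar, but it also has ndim == 0.
print(np.asarray(1.0).ndim)
```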

shoyer commented 3 weeks ago

I think we should probably add a special case for NumPy scalars, to cast them to arrays. It's simpler for Xarray users to always have NumPy arrays.

keewis commented 3 weeks ago

Wouldn't

not isinstance(data, (np.generic, np.ndarray))

work then? With that, scalars would be passed through to the np.asarray call, just like np.ndarray.
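A minimal sketch of the `np.generic` variant combined with the cast to arrays suggested above. `normalize` is a hypothetical function, not xarray's actual code: any NumPy scalar, numeric or not, skips the duck-array branch and comes out of `np.asarray` as a 0-d ndarray with its dtype preserved.

```python
import numpy as np


def normalize(data):
    # Hypothetical sketch: NumPy scalars (np.generic) are treated like
    # ndarrays, so only genuine duck arrays are returned unchanged.
    if not isinstance(data, (np.generic, np.ndarray)) and (
        hasattr(data, "__array_function__") or hasattr(data, "__array_namespace__")
    ):
        return data  # a genuine duck array: keep it as-is
    return np.asarray(data)  # scalars become 0-d ndarrays


out = normalize(np.float64(4.1))
print(type(out), out.ndim, out.dtype)  # <class 'numpy.ndarray'> 0 float64
```

Non-numeric scalars such as `np.bool_(True)` or `np.str_("a")` take the same path, which addresses the concern about checking only for numeric types.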

shoyer commented 3 weeks ago

Yes, that would do the trick.