pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Allow wrapping astropy.units.Quantity #9705

Closed tien-vo closed 1 week ago

tien-vo commented 2 weeks ago

Ticks off the first item in #9704 and also astropy #14454.

The added conditions catch the astropy.units.Quantity type while still excluding numpy.matrix and numpy.ma.masked_array, as suggested in #525, i.e.,

>>> import numpy as np
>>> import astropy.units as u
>>> import dask.array as da
>>> import pint

>>> _is_array_like = lambda data: isinstance(data, np.ndarray | np.generic)
>>> _is_nep18 = lambda data: hasattr(data, "__array_function__")
>>> _has_array_api = lambda data: hasattr(data, "__array_namespace__")
>>> _has_unit = lambda data: hasattr(data, "_unit")
>>> catch_astropy = (
...    lambda data: _is_array_like(data)
...    and (_is_nep18(data) or _has_array_api(data))
...    and _has_unit(data)
... )

>>> catch_non_numpy_array = lambda data: not _is_array_like(data) and (
...    _is_nep18(data) or _has_array_api(data)
... )

>>> narray = np.arange(100)
>>> matrix = np.matrix(np.identity(3))
>>> marray = np.ma.masked_array(narray)
>>> uarray = u.Quantity(narray)
>>> parray = pint.Quantity(narray)
>>> darray = da.array(narray)

>>> catch_astropy(narray)  # False
>>> catch_astropy(matrix)  # False
>>> catch_astropy(marray)  # False
>>> catch_astropy(uarray)  # True
>>> catch_astropy(parray)  # False
>>> catch_astropy(darray)  # False

>>> catch_non_numpy_array(narray)  # False
>>> catch_non_numpy_array(matrix)  # False
>>> catch_non_numpy_array(marray)  # False
>>> catch_non_numpy_array(uarray)  # False
>>> catch_non_numpy_array(parray)  # True
>>> catch_non_numpy_array(darray)  # True

@keewis The above can be put into tests. I'm just not sure where. xarray/tests/test_units.py seems busy with pint-related stuff.

welcome[bot] commented 2 weeks ago

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. If you have questions, some answers may be found in our contributing guidelines.

keewis commented 2 weeks ago

Thanks for the PR, @tien-vo.

This is a good start, but I believe we should try to avoid special-casing astropy.units.Quantity as much as possible. Instead, what I had envisioned is passing through numpy.ndarray subclasses (except np.matrix and np.ma.MaskedArray), but not numpy.ndarray itself. The condition would then be something like

if isinstance(data, np.matrix):
    data = np.asarray(data)

# immediately return array-like types except `numpy.ndarray` and `numpy` scalars
# compare types with `is` instead of `isinstance` to allow `numpy.ndarray` subclasses
is_numpy = type(data) is np.ndarray or isinstance(data, np.generic)
if not is_numpy and (hasattr(data, "__array_function__") or hasattr(data, "__array_namespace__")):
    return cast("T_DuckArray", data)

and any tests can be added to xarray/tests/test_variable.py::TestAsCompatibleData. You could call it test_numpy_subclass, and similarly to test_unsupported_type it may be enough to create a very simple numpy.ndarray subclass and pass that to the constructor of xr.Variable.
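As a rough illustration of what such a test would exercise, the condition above can be tried out with NumPy alone. This is only a sketch: `TaggedArray`, `normalize`, and `passes_through` are hypothetical names made up for the example, not xarray functions, and the quoted fragment only shows the np.matrix conversion, so np.ma.MaskedArray is left out here.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical minimal ndarray subclass, used only for illustration."""


def normalize(data):
    """Mirror the first step of the fragment: turn np.matrix into a plain ndarray."""
    if isinstance(data, np.matrix):
        data = np.asarray(data)
    return data


def passes_through(data):
    """Return True if the condition above would return `data` unchanged."""
    # Exact type comparison: numpy.ndarray subclasses do not count as plain numpy.
    is_numpy = type(data) is np.ndarray or isinstance(data, np.generic)
    return not is_numpy and (
        hasattr(data, "__array_function__") or hasattr(data, "__array_namespace__")
    )


plain = np.arange(3)
tagged = plain.view(TaggedArray)
matrix = np.matrix(np.identity(3))
scalar = np.float64(1.0)

results = {
    "plain": passes_through(normalize(plain)),
    "tagged": passes_through(normalize(tagged)),
    "matrix": passes_through(normalize(matrix)),
    "scalar": passes_through(normalize(scalar)),
}
print(results)
```

Only the subclass instance is passed through. The plain array, the NumPy scalar, and the converted matrix all fall through to whatever handling comes after the check.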

@shoyer, do you have any opinions on allowing numpy.ndarray subclasses? This is very similar to what I tried to do back in #2956.

shoyer commented 2 weeks ago

Allowing ndarray subclasses that are not matrix or MaskedArray sounds good to me!

slevang commented 2 weeks ago

> Instead, what I had envisioned is passing through numpy.ndarray subclasses

+1 to this! Was just playing around with a very simple numpy subclass and was surprised to see that this is not currently supported.
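For context, one general way a subclass gets lost is a conversion through `np.asarray`, which returns a base `numpy.ndarray`, whereas `np.asanyarray` lets subclasses through. Whether this is the exact code path involved in xarray is an assumption here. The snippet shows NumPy behavior only, and `MyArray` is a hypothetical class.

```python
import numpy as np


class MyArray(np.ndarray):
    """Hypothetical trivial subclass, for illustration only."""


sub = np.arange(4).view(MyArray)

stripped = np.asarray(sub)      # converts to the base class: plain numpy.ndarray
preserved = np.asanyarray(sub)  # subclasses are passed through unchanged

stripped_type = type(stripped)
preserved_type = type(preserved)
print(stripped_type.__name__, preserved_type.__name__)
```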

slevang commented 1 week ago

I started #9760 just because I would love to get this feature added soon.