pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Consistent Handling of Type Casting Hierarchy #3950

Open jthielen opened 4 years ago

jthielen commented 4 years ago

As brought up in #3643, there appear to be some inconsistencies in how xarray handles other numeric/duck array types with regard to a well-defined type casting hierarchy across operations. For example, in the following:

Construction/Wrapping

Binary Ops

(would be one less category to worry about if refactored to use __array_ufunc__, see https://github.com/pydata/xarray/pull/3936#issuecomment-610516784)

__array_ufunc__

__array_function__

One concrete example of where this has been problematic is with xarray DataArrays and Pint Quantities (#3643). xarray DataArray is above Pint Quantity in the (generally agreed upon) type casting hierarchy, and wrapping and binary ops work properly since Pint Quantities defer and xarray DataArrays handle the operation. However, ufuncs fail because they both attempt to defer to the other. Having a consistent way of handling type compatibility across all relevant areas in xarray should be able to remove these kinds of issues.
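To make the mutual-deferral failure concrete, here is a minimal sketch using toy classes. `Inner`, `DeferringOuter`, and `Outer` are hypothetical stand-ins (roughly, a lower-level duck array, a wrapper that wrongly defers, and a wrapper that handles the operation); none of this is xarray's or Pint's actual implementation, only an illustration of NumPy's `__array_ufunc__` protocol.

```python
import numpy as np


class Inner:
    """Hypothetical lower-level duck array (lower in the hierarchy)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Defer to any type that sits above this one in the hierarchy.
        if any(isinstance(x, (Outer, DeferringOuter)) for x in inputs):
            return NotImplemented
        raw = [x.data if isinstance(x, Inner) else x for x in inputs]
        return Inner(getattr(ufunc, method)(*raw, **kwargs))


class DeferringOuter:
    """Hypothetical wrapper that also defers when it sees Inner (the bug)."""

    def __init__(self, data):
        self.data = data

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if any(isinstance(x, Inner) for x in inputs):
            return NotImplemented  # both sides now defer to each other
        raw = [x.data if isinstance(x, DeferringOuter) else x for x in inputs]
        return DeferringOuter(getattr(ufunc, method)(*raw, **kwargs))


class Outer:
    """Hypothetical wrapper that handles the op, as the higher type should."""

    def __init__(self, data):
        self.data = data

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap our own type and re-dispatch on the wrapped arrays.
        raw = [x.data if isinstance(x, Outer) else x for x in inputs]
        return Outer(getattr(ufunc, method)(*raw, **kwargs))


try:
    np.add(DeferringOuter(Inner([1, 2])), Inner([3, 4]))
except TypeError as err:
    print("mutual deferral fails:", err)

result = np.add(Outer(Inner([1, 2])), Inner([3, 4]))
print(type(result).__name__, type(result.data).__name__)
```

In the first call every operand returns `NotImplemented`, so NumPy raises `TypeError`. In the second, the higher-level wrapper takes responsibility, unwraps itself, and lets the lower-level type handle its own data.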

However, it would be good to keep in mind that the broader ecosystem doesn't yet seem to have an agreed-upon way of doing this, so this would still be treading in uncertain waters for the moment. I've been operating under these assumptions when working with Pint, but I definitely think there is a need for more authoritative guidance.

Also, if I'm mistaken in any of the things mentioned above, please do let me know!

cc @keewis, @shoyer

max-sixty commented 2 months ago

Is this still current?

jthielen commented 2 months ago

> Is this still current?

I think both yes and no? Since the big series of discussions back in 2021, I don't think much work ended up happening on cross-ecosystem compatibility with nested arrays specifically, so I would assume many of these issues (particularly with using numpy ufuncs and array functions) still remain. However, a lot of progress on the Array API has happened, so those issues may no longer be a priority, given that the current expectation seems to instead be just using the Array API of the top-level library, rather than having the NumPy APIs handle it all. So, as long as xarray (and more generally, each higher-level library) handles construction/wrapping consistently with Array API behaviors, all should be well?
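As a rough illustration of the "use the Array API of the top-level library" pattern, here is a minimal sketch. The helper names `namespace_of` and `center` are hypothetical, and the fallback to `numpy` for objects without `__array_namespace__` is an assumption made so the example runs on older NumPy versions; it is not how xarray does this.

```python
import numpy as np


def namespace_of(x):
    """Return the Array API namespace of x, falling back to numpy (assumption)."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np


def center(x):
    """Subtract the mean using only functions from x's own namespace."""
    xp = namespace_of(x)
    return x - xp.mean(x)


print(center(np.asarray([1.0, 2.0, 3.0])))
```

The point is that the consumer asks the outermost array for its namespace and calls functions from it, instead of calling `np.mean` and relying on `__array_function__` dispatch through nested wrappers.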

So, for this issue in particular, my hunch would be to keep it around for now and then revisit once https://github.com/pydata/xarray/issues/7848 (and perhaps also other libraries' efforts like https://github.com/hgrecco/pint/issues/1592) are resolved. But, my focus has been diverted away from these efforts for the past several years, so I'd gladly defer to folks who have kept up expertise in this area.