hgrecco / pint

Operate and manipulate physical quantities in Python
http://pint.readthedocs.org/
2.4k stars, 473 forks

Adopt NEP-37 and NEP-47 (array module and Python Array API) #1592

Open jthielen opened 2 years ago

jthielen commented 2 years ago

While Pint was a reasonably early adopter of NEP 18 (__array_function__) for array type compatibility, there have been several recent compatibility efforts in the NumPy (and, more generally, the Python array) ecosystem that Pint has not yet taken advantage of, particularly the open NEPs 37 and 47 and, with them, the Python Array API standard. It would be wonderful to see these implemented in Pint, and I believe it should not be too onerous a task (at least compared to the initial NEP 18 implementation). Perhaps all that would be needed is to expose the function registries in https://github.com/hgrecco/pint/blob/master/pint/facets/numpy/numpy_func.py as a module, which could then be returned as appropriate from the __array_module__ and __array_namespace__ protocols? That being said, given that __array_namespace__ denotes compliance with the Python Array API, perhaps Pint should be rigorously tested against the API standard before implementing it?

xref https://github.com/pydata/xarray/pull/7067, https://github.com/pydata/duck-array-discussion/issues/3

hgrecco commented 2 years ago

I fully agree that we can move forward. One nice thing about the current facets organization is that we can create a facets/new_numpy/ facet and test it side by side.

Is there a good test suite for compliance with the API that we can use?

tomwhite commented 2 years ago

> Is there a good test suite for compliance with the API that we can use?

The test suite at https://github.com/data-apis/array-api-tests is very thorough, and easy to set up on CI too.

MichaelTiemannOSC commented 2 years ago

The work I've been doing on uncertainties might be good grist for this mill.

xref https://github.com/hgrecco/pint/issues/1611, https://github.com/hgrecco/pint/pull/1615
xref https://github.com/hgrecco/pint-pandas/issues/139, https://github.com/hgrecco/pint-pandas/pull/140

TomNicholas commented 11 months ago

I would really like to see pint conform to the array API standard. The standard is being broadly adopted, and conforming to it helps wrapper libraries like xarray immensely. In particular, it would help fix the frustrating issue of pint-xarray having to set force_ndarray_like=True; see https://github.com/xarray-contrib/pint-xarray/issues/216.

hgrecco commented 11 months ago

I would like to move this forward. Should we schedule it for 0.24? We need a PR, tests, and a good deprecation policy in case some behavior needs to go away. See also #1895.

keewis commented 10 months ago

As mentioned above, we've been discussing this a bit in the pint-xarray issue. In short, xarray uses a few numpy protocols and the array properties dtype, shape, and ndim to detect scalars and cast them to numpy arrays. However, pint defines all of these, but they raise, return None, or return NotImplemented when the quantity actually wraps a scalar. This means that hasattr checks don't work (though I only just learned that hasattr is basically

def hasattr(obj, attr):
    try:
        getattr(obj, attr)
    except AttributeError:  # Python 3's hasattr only swallows AttributeError
        return False
    else:
        return True

which means that properties are always executed anyway, so the checks could become getattr(obj, attr, NotImplemented) is not NotImplemented, or something similar, without significant additional cost).
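The difference between the two checks can be seen with a small hypothetical class. ScalarWrapper and has_usable_attr are illustrative names, not part of Pint or xarray; the shape property mimics an attribute that exists but returns NotImplemented for scalars. hasattr reports the attribute as present, the getattr sentinel check does not, and both run the property body exactly once per call.

```python
class ScalarWrapper:
    """Hypothetical stand-in for a quantity; not Pint's Quantity class."""

    shape_evaluations = 0  # counts how often the property body runs

    def __init__(self, value):
        self.value = value

    @property
    def shape(self):
        type(self).shape_evaluations += 1
        if isinstance(self.value, (int, float)):
            # Mimics an attribute that is defined but meaningless for scalars.
            return NotImplemented
        return (len(self.value),)


def has_usable_attr(obj, attr):
    """Treat both a missing attribute and a NotImplemented value as 'absent'."""
    return getattr(obj, attr, NotImplemented) is not NotImplemented


s = ScalarWrapper(3.0)
print(hasattr(s, "shape"))                              # True
print(has_usable_attr(s, "shape"))                      # False
print(has_usable_attr(ScalarWrapper([1, 2]), "shape"))  # True
print(has_usable_attr(s, "dtype"))                      # False (attribute missing)
print(ScalarWrapper.shape_evaluations)                  # 3
```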

One option to resolve this would be to split Quantity into QuantityScalar and QuantityArray, but I still don't think that is a good idea, since predicting whether a return value is a scalar or an array will not be easy.

A different option, which I think would be much easier to deal with, is to give the registry separate modes: a scalar mode, where every function takes scalars and returns scalars, and an array mode that follows the array API, taking arrays and returning arrays (scalars would have to be passed through pint.Quantity.__array_namespace__().asarray(), as the array API does not allow interaction with scalars).

Unlike force_ndarray_like=True, which is a registry-global option, this would allow the same registry to be used in scalar workflows and array workflows without them interfering with each other.
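A minimal sketch of the two-mode idea, assuming the mode is chosen per call rather than per registry. ModalRegistry, its scalar and asarray methods, and ToyQuantity are hypothetical names invented for illustration (not Pint's API), and a Python list stands in for an array-API array. In contrast to constructing a registry with force_ndarray_like=True, both workflows here share one registry object.

```python
from dataclasses import dataclass


@dataclass
class ToyQuantity:
    magnitude: object  # a float in scalar mode, a list of floats in array mode
    units: str


class ModalRegistry:
    """Hypothetical registry with a scalar mode and an array mode (not Pint's API)."""

    def scalar(self, value, units):
        # Scalar mode: a Python number in, a scalar quantity out.
        if not isinstance(value, (int, float)):
            raise TypeError("scalar mode accepts only Python numbers")
        return ToyQuantity(float(value), units)

    def asarray(self, value, units):
        # Array mode: every input becomes an array, so scalars must come through here.
        # A list stands in for an array; a real array library would give a 0-d array.
        values = [value] if isinstance(value, (int, float)) else list(value)
        return ToyQuantity([float(v) for v in values], units)


ureg = ModalRegistry()
length = ureg.scalar(3, "m")         # scalar workflow
lengths = ureg.asarray([1, 2], "m")  # array workflow on the same registry
print(length.magnitude, lengths.magnitude)  # 3.0 [1.0, 2.0]
```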

burnpanck commented 7 months ago

> One option to resolve this would be to split into QuantityScalar and QuantityArray, but I still think this is not a good idea, since predicting whether a return value is a scalar or an array will not be easy.

I don't see why that would be a problem. If you are a QuantityScalar, you are low in the array hierarchy, and you can safely assume that all operations return scalars too; otherwise, a type higher in the hierarchy would have been responsible. If, on the other hand, you are a QuantityArray, you will be delegating most operations to an array library that handles the magnitude, and carrying the units information through a side channel. You then simply check whether the value returned by that array library is a scalar or not. Performance-wise, that is O(1), just like the unit operations are.
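The delegation argument can be sketched as follows. QuantityScalar and QuantityArray are the hypothetical class names from the discussion, not existing Pint classes, and a Python list stands in for the array library holding the magnitude. After each delegated operation, the only decision left is a constant-time type check on the result.

```python
class QuantityScalar:
    """Hypothetical scalar quantity: operations on it only ever yield scalars."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units


class QuantityArray:
    """Hypothetical array quantity delegating to a backing 'array library' (a list here)."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    @staticmethod
    def _wrap(result, units):
        # O(1) type check on whatever the backing library returned.
        if isinstance(result, list):
            return QuantityArray(result, units)
        return QuantityScalar(result, units)

    def __getitem__(self, index):
        # An integer index yields a scalar, a slice yields a list.
        return self._wrap(self.magnitude[index], self.units)

    def sum(self):
        # Reductions return a scalar from the backing library.
        return self._wrap(sum(self.magnitude), self.units)


q = QuantityArray([1.0, 2.0, 3.0], "m")
print(type(q[0]).__name__)   # QuantityScalar
print(type(q[1:]).__name__)  # QuantityArray
print(q.sum().magnitude)     # 6.0
```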

Moreover, there are other reasons why a separation would be helpful; see https://github.com/hgrecco/pint/issues/1128#issuecomment-979432968 (i.e., scalars should not conform to the array API).