[Open] jthielen opened this issue 2 years ago
I fully agree that we can move forward. One nice thing about the current facets organization is that we can create a facets/new_numpy/
facet and test it side by side.
Is there a good testsuite for compliance with the API that we can use?
> Is there a good testsuite for compliance with the API that we can use?
The test suite at https://github.com/data-apis/array-api-tests is very thorough, and easy to set up on CI too.
The work I've been doing on uncertainties might be good grist for this mill.
xref https://github.com/hgrecco/pint/issues/1611, https://github.com/hgrecco/pint/pull/1615; xref https://github.com/hgrecco/pint-pandas/issues/139, https://github.com/hgrecco/pint-pandas/pull/140
I would really like to see pint conform to the array API standard. That effort is seeing broad adoption, and helps out wrapper libraries like xarray immensely. It would particularly help fix the frustrating issue with `pint-xarray` having to set `force_ndarray_like = True`, see https://github.com/xarray-contrib/pint-xarray/issues/216.
I would like to move this forward. Should we schedule it for 0.24? We need a PR, tests, and a good deprecation policy if some behavior needs to go away. See also #1895
As mentioned above, we've been discussing this a bit in the `pint-xarray` issue above. In short, `xarray` uses a few `numpy` protocols and the array properties `dtype`, `shape`, and `ndim` to detect and cast scalars to `numpy` arrays. However, `pint` defines all of them but will raise / return `None` / return `NotImplemented` if it actually wraps a scalar. This means that `hasattr` checks don't work (though I only recently learned that `hasattr` is basically

```python
def hasattr(obj, attr):
    try:
        getattr(obj, attr)
    except AttributeError:
        return False
    else:
        return True
```

which means that properties are always executed, and the checks could become `getattr(obj, attr, NotImplemented) is not NotImplemented` or something similar without significant additional cost).
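To make the pitfall concrete, here is a minimal, self-contained sketch; the `Wrapped` class below is hypothetical, not pint's actual `Quantity`, but it shows why a raising property defeats `hasattr`, and that the sentinel-based `getattr` check costs the same single attribute lookup:

```python
class Wrapped:
    """Hypothetical stand-in for a quantity wrapping a scalar magnitude."""

    def __init__(self, magnitude):
        self._magnitude = magnitude

    @property
    def shape(self):
        # Like a scalar-wrapping quantity: the attribute *exists*,
        # but accessing it raises for non-array magnitudes.
        raise AttributeError("scalar magnitude has no shape")


q = Wrapped(3.0)

# hasattr() executes the property, swallows the AttributeError, and reports
# False, so it cannot distinguish "no attribute" from "attribute that raises".
print(hasattr(q, "shape"))  # False

# The equivalent sentinel-based check, at the cost of one getattr call:
print(getattr(q, "shape", NotImplemented) is not NotImplemented)  # False
```

Both checks give the same answer here; the difference the comment above points at is only that the `getattr` form makes the "executed property may raise" behavior explicit.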
One option to resolve this would be to split into `QuantityScalar` and `QuantityArray`, but I still think this is not a good idea, since predicting whether a return value is a scalar or an array will not be easy.

A different option, which I think would be much easier to deal with, is to have separate modes of the registry: a scalar mode, where any function takes a scalar and returns a scalar, and an array mode that follows the array API and takes arrays and returns arrays (scalars should be passed through `pint.Quantity.__array_namespace__().asarray()`, as the array API does not allow interaction with scalars).
Unlike `force_ndarray_like=True`, which is a registry-global option, this would allow using the same registry in scalar workflows and array workflows without them interfering with each other.
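As a rough illustration of the two modes (all class names and structure here are hypothetical, not a proposed pint API), only the array-mode object would expose the array protocol attributes, and it would always promote scalars to 0-d arrays on the way in, the way the array API's `asarray()` does:

```python
import numpy as np


class ScalarQuantity:
    """Hypothetical scalar-mode quantity: scalar in, scalar out.

    Deliberately defines no dtype/shape/ndim, so attribute-based
    detection by wrapper libraries correctly reports False.
    """

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def __add__(self, other):
        # Unit conversion is elided in this sketch.
        assert self.units == other.units
        return ScalarQuantity(self.magnitude + other.magnitude, self.units)


class ArrayQuantity:
    """Hypothetical array-mode quantity: the magnitude is always an array."""

    def __init__(self, magnitude, units):
        # Scalars become 0-d arrays on the way in.
        self.magnitude = np.asarray(magnitude)
        self.units = units

    @property
    def shape(self):
        return self.magnitude.shape  # always defined, never raises

    @property
    def ndim(self):
        return self.magnitude.ndim

    @property
    def dtype(self):
        return self.magnitude.dtype


s = ScalarQuantity(2.0, "m") + ScalarQuantity(3.0, "m")
a = ArrayQuantity(2.0, "m")
print(s.magnitude)          # 5.0
print(hasattr(s, "shape"))  # False
print(a.shape, a.ndim)      # () 0
```

The point of the sketch is that each mode is internally consistent: wrapper libraries probing for `shape`/`ndim`/`dtype` get a clean "no" in scalar mode and a well-defined 0-d answer in array mode, with no property that sometimes raises.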
> One option to resolve this would be to split into `QuantityScalar` and `QuantityArray`, but I still think this is not a good idea, since predicting whether a return value is a scalar or an array will not be easy.
I don't see why that would be a problem. If you are a `QuantityScalar`, you are low in the array hierarchy, and you can safely assume all operations return scalars too; otherwise, a higher object would have been responsible. On the other hand, if you are a `QuantityArray`, you will be delegating most operations to an array library handling the magnitude, and carrying the units information through a side channel. Then, you simply inspect the return value you got from that array library to see whether it is a scalar or not. Performance-wise, that is O(1), just like the unit operations are.
On the other hand, there are other reasons why a separation would be helpful, see https://github.com/hgrecco/pint/issues/1128#issuecomment-979432968 (i.e., scalars should not conform to the array API).
While Pint was a reasonably early adopter of NEP 18 (`__array_function__`) for array type compatibility, there have been several compatibility efforts in the NumPy (and Python arrays more generally) ecosystem recently that Pint has not yet taken advantage of, particularly the open NEPs 37 and 47, and with those, the Python Array API standard. It would be wonderful to see these implemented in Pint, and I believe it should not be too onerous a task (at least compared to the initial NEP 18 implementation). Perhaps all that would be needed is exposing the registries of functions in https://github.com/hgrecco/pint/blob/master/pint/facets/numpy/numpy_func.py as a module, which could then be returned as appropriate in the `__array_module__` and `__array_namespace__` protocols? That being said, given that `__array_namespace__` denotes compliance with the Python Array API, perhaps rigorous testing against the API standard should be done prior to implementing that?

xref https://github.com/pydata/xarray/pull/7067, https://github.com/pydata/duck-array-discussion/issues/3
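A minimal sketch of the "expose the function registry as a namespace" idea; the registry contents and the helper name below are assumptions for illustration, not pint's actual `numpy_func` internals:

```python
import types

import numpy as np

# Stand-in for the registries in pint/facets/numpy/numpy_func.py;
# the real registries map many more functions to unit-aware wrappers.
HANDLED_FUNCTIONS = {
    "sum": np.sum,
    "mean": np.mean,
    "concat": np.concatenate,
}


def build_array_namespace(registry):
    """Hypothetical helper: turn a name -> implementation mapping into a
    module-like object that __array_namespace__ could return."""
    return types.SimpleNamespace(**registry)


xp = build_array_namespace(HANDLED_FUNCTIONS)
print(xp.sum([1, 2, 3]))  # 6
```

The attraction of this shape is that the existing registry stays the single source of truth, and both `__array_module__` and `__array_namespace__` could return views built from it, once the registered functions are verified against the Array API test suite.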