p3trus opened 9 years ago
Oh, okay, having the fallback like that was how I thought about implementing it. (I'm sorry that I didn't describe that in my initial comment.)
So would the way forward be to implement `DataArray.units_convert()`/`DataArray.units_to()` and `DataArray.units` as you described right now, but wait for and/or delegate IO integration? Or should there also be a fully backwards-compatible IO integration implemented now (such as an optional kwarg on `open_dataset` and `to_netcdf`)?
That said, I agree that the dedicated accessor is a pretty good interface, especially if you want more methods/properties than `units` and `to`. Even then, `array.pint.units` and `array.pint.to()` are pretty readable. This could make sense if we are concerned about different interfaces for different unit libraries.
The accessor route is also definitely more conservative than putting first-class unit support in xarray proper, which I like.
As for where to put it, that's sort of up to you. I think it's probably going to get complicated enough that it should go into a library that can be installed, rather than being a boilerplate example in xarray's docs. It could be in xarray if it's going to be very minimal (e.g., only one method + one property) but if you want more than that -- and judging by pint-pandas I would guess you do -- then it should probably go into pint or a dedicated package, e.g., pint-xarray.
I would not write a general purpose xarray+units library unless you're particularly excited about doing that for some reason -- it's much better to start with a particular integration that works well than to make a half-hearted effort at something general purpose without well-motivated use-cases.
@shoyer I agree, the accessor interface makes a lot of sense for this: it's more conservative on the xarray side, while also giving the most flexibility for the pint + xarray integration.
Based on your feedback and what I'd hope to see out of the pint + xarray integration, I'm thinking a pint-adjacent package like pint-xarray may be the best route forward. I'll create an issue on pint to inquire about that possibility. See https://github.com/hgrecco/pint/issues/849.
The other virtue of the separate package is a faster release cycle. We can (and should!) still put a full example in the xarray docs.
For IO integration, I think the simplest choice would be to write utility functions for going to pint arrays from unit-free arrays with string “units” in attrs (and back). That will be easily composable with xarray’s existing IO functionality.
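To make the attrs round-trip concrete, here is a minimal sketch of the pattern being proposed. All names (`quantify`, `dequantify`, the stand-in `Quantity` tuple) are hypothetical illustrations, not the actual pint-xarray API; real code would construct `pint.Quantity` objects instead of a namedtuple.

```python
import numpy as np
from collections import namedtuple

# Hypothetical stand-in for a unit-aware array type (e.g. pint.Quantity).
Quantity = namedtuple("Quantity", ["magnitude", "units"])


def quantify(data, attrs):
    """Move the string ``units`` out of attrs and onto the array."""
    attrs = dict(attrs)
    units = attrs.pop("units", None)
    return Quantity(np.asarray(data), units), attrs


def dequantify(quantity, attrs):
    """Strip the units back into a plain array plus a ``units`` attr,
    ready for xarray's existing (unit-unaware) IO machinery."""
    attrs = dict(attrs, units=quantity.units)
    return np.asarray(quantity.magnitude), attrs


q, attrs = quantify([1.0, 2.0], {"units": "m", "long_name": "height"})
data, attrs = dequantify(q, attrs)
```

Because both helpers only touch the array and its attrs, they compose with `open_dataset` / `to_netcdf` without any changes on the xarray side.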
Based on the points raised by @crusaderky in https://github.com/hgrecco/pint/issues/878#issue-492678605 about how much special-case handling xarray has for dask arrays, I was thinking recently about what it might take for the xarray > pint > dask.array wrapping discussed here and elsewhere to work as fluidly as xarray > dask.array currently does.
Would it help for this integration to have pint `Quantity` objects implement the dask custom collections interface when wrapping a dask array? I would think that this would allow a pint `Quantity` to behave in a "dask-array-like" way rather than just an "array-like" way. Then, instead of xarray checking for `isinstance(data, dask_array_type)`, it could check for "duck dask arrays" (e.g., those with both `__array_function__` and `__dask_graph__`)? There are almost certainly some subtle implementation details that would need to be worked out, but I'm guessing that this could take care of the bulk of the integration.
Also, if I'm incorrect with this line of thought, or there is a better way forward for implementing this wrapping pattern, please do let me know!
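The duck-typed check described above can be sketched in a few lines. The class names here are made up purely for illustration; the point is that the test relies on protocol attributes rather than an `isinstance()` check against dask's array type.

```python
# A "duck dask array" advertises both the NEP-18 protocol and the
# dask collection protocol, regardless of its concrete type.
def is_duck_dask_array(obj):
    return hasattr(obj, "__array_function__") and hasattr(obj, "__dask_graph__")


class DuckDaskArray:  # e.g. a pint Quantity wrapping a dask array
    def __array_function__(self, func, types, args, kwargs): ...
    def __dask_graph__(self): ...


class PlainDuckArray:  # e.g. a pint Quantity wrapping a numpy array
    def __array_function__(self, func, types, args, kwargs): ...


assert is_duck_dask_array(DuckDaskArray())
assert not is_duck_dask_array(PlainDuckArray())
```

With this check, xarray never needs to know the wrapping order: anything that quacks like a dask collection gets the chunked code paths.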
> Would it help for this integration to have pint `Quantity` objects implement the dask custom collections interface when wrapping a dask array? ... Then, instead of xarray checking for `isinstance(data, dask_array_type)`, it could check for "duck dask arrays" (e.g., those with both `__array_function__` and `__dask_graph__`)?
Yes, great idea!
Hi all, I was just pointed at this by someone who went to the NumFOCUS summit. I'm the lead developer of `unyt`, which I see has come up a little bit. If anyone wants to chat with me about adding support for `unyt` along with `pint`, I'd be happy to discuss.
Sorry for the late reply, @ngoldbaum. For `xarray` to support `unyt`, it would have to implement NEP-18 (i.e. `__array_function__`), which I think it does not yet? I would expect `unyt` support to come for free once `pint` support works, so I would focus on that first (see #3594 for the current progress). Extending the tests to also cover `unyt` would be possible, but I'm thinking it would be better to put that into the accessor package (as discussed above for a possible `pint` accessor)?
There were a few unresolved design questions about how unit libraries should implement certain `numpy` functions (e.g. how `where` should behave when receiving `array(nan)` and `array(0)`, which `xarray` uses to implement nanops, or which unit the result of `full_like` should have): see hgrecco/pint#905. Any supported unit library would have to behave the same way, so I think it would be good to coordinate that. Thoughts?
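For context on why `where` with `array(nan)` / `array(0)` matters, here is a rough numpy-only sketch of the nanops pattern in question (`nansum_like` is a made-up name, not xarray's actual implementation). The fill value is a bare, unitless scalar; for a unit-aware array the open question is which unit that scalar should adopt.

```python
import numpy as np

# xarray-style nan-skipping reduction: replace missing values with a
# plain array(0) via where(), then reduce. A unit library has to decide
# what unit the unitless 0 (or nan) takes when `arr` carries units.
def nansum_like(arr):
    return np.sum(np.where(np.isnan(arr), np.array(0.0), arr))


out = nansum_like(np.array([1.0, np.nan, 2.0]))
```

If `pint` treated the bare `0` as dimensionless and raised on the mismatch, every nan-skipping reduction would fail, which is why the libraries need a coordinated convention here.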
Hi! I'm just popping in as a very interested user of both xarray and unit packages to ask: since there's been some awesome progress made here and pint-xarray is now enough of A Thing to have documentation, though obviously experimental - how much work would you expect a corresponding package for astropy's Quantities to take, given the current state of things?
Are there any limitations that would prevent that? I saw the discussion above about Quantities being more problematic due to taking the subclass-from-numpy-arrays route, but I'm not sure how much of a roadblock that still is. I would suspect the API could be shared with pint-xarray (which, obviously, is experimental for now).
> I saw the discussion above about Quantities being more problematic
I would expect `astropy` quantities to work just fine as long as they are duck arrays. The only limitation I would expect them to have is that they can't "wrap" anything other than `numpy` arrays. `dask` arrays are an exception since they can wrap quantities using the `_meta` attribute (which is something we try to avoid with `pint`, though).
For reference, the currently remaining issues for all duck arrays (except, obviously, `dask`) are documented here.
> I would expect astropy quantities to work just fine as long as they are duck arrays
Actually, that turns out to be wrong. Since `isinstance(data, np.ndarray)` returns true for `astropy.units.Quantity`, it is cast to `ndarray` using `np.asarray`: https://github.com/pydata/xarray/blob/7dbbdcafed7f796ab77039ff797bcd31d9185903/xarray/core/variable.py#L231-L245
Adding ~~`or issubclass(type(data), np.ndarray)`~~ `or type(data) != np.ndarray` does allow wrapping an `astropy.units` quantity in `Dataset` / `DataArray` objects ~~but it breaks a few tests~~. Also, unless we modify the test suite in `xarray/tests/test_units.py` to run with `astropy.units` instead of `pint`, I can't really tell which features of `xarray` strip the units (in addition to the ones documented for `pint`). For that, we probably need to somehow create a generalization of the tests for duck arrays.
I believe this can be closed via https://github.com/pydata/xarray/pull/5734 @andersy005?
@keewis it looks like adding an if block before line 323 would allow astropy quantities to be wrapped as you said, i.e.,

```python
if isinstance(data, u.Quantity):
    return cast("T_DuckArray", data)
```
What else is needed to make this happen? I'd love to make some progress towards `xarray` wrapping `astropy.units`. Should #7799 be picked up again since @dstansby can't work on it anymore?
We've since made some progress with the automatic generation of tests for arbitrary duck arrays in xarray-array-testing, so it might be better to work on that instead of trying to generalize the massive (and probably stale) test suite for testing unit support with `pint`.
That said, that's for testing the support, not the support itself. Implementing that should be fairly easy; we should just make sure to continue banning `numpy.matrix` and `numpy.ma.masked_array` (I think those were the ones we explicitly wanted to convert to `numpy.ndarray` in #2956). Either way, feel free to open a dedicated issue to track `astropy.units` support.
Edit: actually, `numpy.ma.MaskedArray` is special-cased immediately above the block for duck arrays, so I believe the only thing we have to make sure is to not pass through `numpy.matrix` instances.
One thing that bothers me a lot is that pandas lacks good unit support (e.g. quantities, pint, ...). Is there a chance that xray will support it?