xarray-contrib / pint-xarray

Interface for using pint with xarray, providing convenience accessors
https://pint-xarray.readthedocs.io/en/latest/
Apache License 2.0

Integration with Uncertainties #3

Open TomNicholas opened 4 years ago

TomNicholas commented 4 years ago

It would be absolutely great to be able to propagate unit-aware arrays with uncertainties through xarray, but it's unclear to me to what extent pint is currently integrated with the Uncertainties package.

I also haven't thought much about whether this would present any additional challenges on the xarray side.

jthielen commented 4 years ago

Right now, Pint does not support NumPy and Uncertainties simultaneously, so this integration will likely have to stay on hold until Pint supports both together.

xref https://github.com/pydata/xarray/issues/3509

miketynes commented 2 years ago

@jthielen Do you happen to know if there are any plans to support integrating both packages at once?

jthielen commented 2 years ago

As far as I'm aware, it's a desired feature but has no roadmap or timeline for implementation. That said, I'd encourage you to ask directly on the Pint issue, as the Pint maintainers themselves may have a better idea!

miketynes commented 2 years ago

Thanks for your response. Somehow I didn't notice that this issue was actually on a different repo!

varchasgopalaswamy commented 1 year ago

With regard to uncertainties, I think one difficulty with the existing uncertainties package is that you have to hand-write the propagation rules for each operation, which is definitely not feasible for the entire numpy library. If you used an auto-differentiation library like JAX, you could imagine having it implement __array_function__ for all numpy functions. I tried, and was a little successful with getting something like this working here, but there are a ton of issues with the implementation that I don't think can be resolved given the hacky way I implemented it.
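To make the idea concrete, here is a minimal sketch of first-order (linear) uncertainty propagation for independent inputs, where one generic derivative routine replaces per-operation rules. It is plain Python, not JAX and not the uncertainties package: a central finite difference stands in for automatic differentiation, and the names `derivative` and `propagate` are hypothetical.

```python
import math


def derivative(f, args, i, rel_step=1e-6):
    """Central finite-difference estimate of df/d(args[i]).

    Assumption: this numerical derivative is only a stand-in for what an
    autodiff library would compute exactly.
    """
    step = rel_step * max(abs(args[i]), 1.0)
    up = list(args)
    down = list(args)
    up[i] += step
    down[i] -= step
    return (f(*up) - f(*down)) / (2 * step)


def propagate(f, values, sigmas):
    """Return (f(values), sigma_f) assuming independent inputs.

    Uses sigma_f**2 = sum_i (df/dx_i * sigma_i)**2, i.e. linear propagation.
    """
    variance = sum(
        (derivative(f, values, i) * sigma) ** 2
        for i, sigma in enumerate(sigmas)
    )
    return f(*values), math.sqrt(variance)


# x = 2.0 +/- 0.1 and y = 3.0 +/- 0.2, propagated through f(x, y) = x * y
value, sigma = propagate(lambda x, y: x * y, [2.0, 3.0], [0.1, 0.2])
print(value, sigma)
```

No rule specific to multiplication was written here; any differentiable function of scalars could be passed in. Correlated inputs and array-valued functions are deliberately out of scope for this sketch.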

Probably the best way to do it would be to re-implement it in xarray. Would there be any interest in this? I would definitely be interested in contributing towards getting something like this to work.

TomNicholas commented 1 year ago

@varchasgopalaswamy if you can get uncertainty support working in xarray in a general and interoperable way, people would love that! I would expect it to be a pretty big project though.

> re-implement it in xarray

Can you expand on what you mean by this? Making uncertainties work in a way that doesn't prevent you from using xarray as your top-level object, and also using different array types underneath (e.g. dask, cupy), would be critical for widespread adoption.

There are also some discussions about potentially wrapping JAX in xarray (I can find links if you want).

varchasgopalaswamy commented 1 year ago

Agreed, I don't expect it to be easy - but I would find it useful for my research. We're thinking of adopting xarray throughout our code, which requires support for uncertainty propagation, so I have a vested interest in getting this to work!

> Can you expand on what you mean by this?

I am still new to xarray and I need to get more familiar with the data model and terminology, so I might not be saying this right.

Looking through this repo, it looks like how pint integration is being done is by either converting the value stored in a DataArray to a Quantity, or by having a "units" entry in attrs. Let's say you have two DataArrays that you add together. How does pint enter the picture here? Do you quantify the dataset values, add the two Quantity arrays together, and then create a new DataArray with the resulting Quantity array?

TomNicholas commented 1 year ago

> We're thinking of adopting xarray throughout our code, which requires support for uncertainty propagation, so I have a vested interest in getting this to work!

This is how every cool feature gets made :)

> Looking through this repo, it looks like how pint integration is being done is by either converting the value stored in a DataArray to a Quantity, or by having a "units" entry in attrs.

Xarray objects are wrappers of numpy arrays, or of arrays that can be treated as if they were numpy arrays (so-called "duck-typed arrays"). Xarray expects that the wrapped array exposes certain methods and attributes (e.g. .shape, .mean(), __add__()), which act in the same way. The rules for what exactly counts as a numpy-like array have recently become a lot more formalized (thankfully) via the Python array API standard.
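As a rough illustration of what "exposes certain methods and attributes" means, the sketch below checks for the three members named above using a standard-library `typing.Protocol`. This is not how xarray or the array API standard actually test for array-likeness; `DuckArrayLike` and `ToyArray` are hypothetical names, and the real requirements are far broader.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DuckArrayLike(Protocol):
    """Hypothetical, heavily reduced stand-in for the duck-array interface."""

    @property
    def shape(self) -> tuple: ...

    def mean(self): ...

    def __add__(self, other): ...


class ToyArray:
    """Toy 1-D container that happens to satisfy the reduced interface."""

    def __init__(self, values):
        self.values = list(values)
        self.shape = (len(self.values),)

    def mean(self):
        return sum(self.values) / len(self.values)

    def __add__(self, other):
        return ToyArray(a + b for a, b in zip(self.values, other.values))


# isinstance() here only checks that the members exist, not how they behave.
is_duck = isinstance(ToyArray([1.0, 2.0, 3.0]), DuckArrayLike)
not_duck = isinstance([1.0, 2.0, 3.0], DuckArrayLike)  # list has no .shape/.mean
print(is_duck, not_duck)
```

A check like this only confirms that the names are present; whether they "act in the same way" as their numpy counterparts is exactly what a formal standard is needed for.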

Xarray organises and aligns the wrapped data, but the actual computation is always performed by the underlying array type. So da + da will ultimately call ndarray.__add__ if each da is wrapping a numpy array. You can see what particular array object the DataArray is wrapping by accessing its .data attribute.

So in the case of pint, xarray is wrapping a pint.Quantity, and all operations are delegated to the pint.Quantity object. (pint.Quantity happens to further wrap numpy.ndarray, but this isn't a requirement from xarray's perspective; it's just an implementation detail of pint.)
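The delegation described above can be sketched with toy classes. Nothing here is xarray's or pint's real implementation: `ToyDataArray` and `ToyQuantity` are hypothetical stand-ins that only mimic the layering, where the outer object checks dimension names and hands the arithmetic to whatever it wraps.

```python
from dataclasses import dataclass


@dataclass
class ToyQuantity:
    """Toy stand-in for a unit-aware array (NOT pint.Quantity's API)."""

    magnitude: list
    units: str

    def __add__(self, other):
        # The unit logic lives entirely in the wrapped type.
        if self.units != other.units:
            raise ValueError(f"cannot add {self.units} and {other.units}")
        summed = [a + b for a, b in zip(self.magnitude, other.magnitude)]
        return ToyQuantity(summed, self.units)


class ToyDataArray:
    """Toy stand-in for xarray.DataArray: dimension names plus wrapped data."""

    def __init__(self, data, dims):
        self.data = data  # the wrapped array object, whatever its type
        self.dims = dims

    def __add__(self, other):
        # The wrapper only handles labels, then delegates the computation.
        if self.dims != other.dims:
            raise ValueError("dimension names do not match")
        return ToyDataArray(self.data + other.data, self.dims)


da = ToyDataArray(ToyQuantity([1.0, 2.0], "m"), dims=("x",))
result = da + da
print(type(result.data).__name__, result.data.magnitude, result.data.units)
```

Because the wrapper never inspects units itself, swapping `ToyQuantity` for another type with its own `__add__` (for example one that tracks uncertainties) would need no change to `ToyDataArray`, which is the property the thread is hoping to rely on.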

> Let's say you have two DataArrays that you add together. How does pint enter the picture here? Do you quantify the dataset values, add the two Quantity arrays together, and then create a new DataArray with the resulting Quantity array?

Basically yes, but if you start with a Dataset with quantified values, the other steps are handled automatically by xarray delegating to the wrapped pint.Quantity objects.

EDIT with some links:

This issue might interest you - it's about wrapping ragged-length arrays with xarray, but it talks about the API requirements in more detail. https://github.com/pydata/xarray/issues/4285

Comment about JAX in xarray https://github.com/pydata/xarray/issues/3232#issuecomment-522820303

https://github.com/google/jax/issues/1565

Project board for xarray wrapping all the arrays