xarray-contrib / pint-xarray

Interface for using pint with xarray, providing convenience accessors
https://pint-xarray.readthedocs.io/en/latest/
Apache License 2.0
105 stars 12 forks source link

API? #6

Open TomNicholas opened 4 years ago

TomNicholas commented 4 years ago

@jthielen proposed a rough accessor API in pint/#849, to which I've added a couple of things:

DataArray:

Dataset:

(this may be modified as things change on xarray's and pint's end, especially involving Dask arrays (xref #883))

Anything else?

At some point when the integration is more solidified (but before official release) we should change the accessor from pint to units, to get a interface more like what's described here. This would be: a) More intuitive b) Units-library agnostic c) A good fit for potentially using an entrypoint to choose which units library you want to use. There's already an entrypoint for plotting backends in xarray, and plans to add one for storage backends too.

jthielen commented 4 years ago

One additional thing I've realized is needed since writing https://github.com/hgrecco/pint/issues/849#issuecomment-579992247 is a helper for coordinate unit conversion. Not sure what the best name for this would be though. In MetPy, the tentative name is .convert_coordinate_units, but that really only makes sense since the MetPy accessor's version of .to is .convert_units. Maybe .coordinate_to?

See https://github.com/Unidata/MetPy/pull/1325 / https://github.com/Unidata/MetPy/compare/v0.12.0...jthielen:0-12-patch-xarray-0-15-1?expand=1

keewis commented 4 years ago

we could also provide work-arounds for operations like rolling or ffill that, due the functions and libraries they use (i.e. bottleneck, numbagg, scipy, numpy.lib, etc., but also numpy.vectorize), won't be able to support duck arrays in the near future (though we should probably wait until there is a final decision on that, right now these methods were simply postponed).

TomNicholas commented 4 years ago

a helper for coordinate unit conversion.

So a da.pint.coords_to, that gets called within da.pint.to?

Also what about UnitRegistry().wraps? Do we need a xarray equivalent for that? Would that help with providing workarounds for rolling etc?

jthielen commented 4 years ago

a helper for coordinate unit conversion.

So a da.pint.coords_to, that gets called within da.pint.to?

Not quite, it would be separate. I'm thinking of something to change the units on a coordinate of da without changing anything else on da, which needs to follow different logic given that Indexes are immutable, and, for now, not supporting units directly...so something like

def coord_to(self, coord, units):
    new_coord_var = self.da[coord].copy(
        data=Quantity(self.da[coord].values, self.da[coord].attrs.get('units')).m_as(units)
    )
    new_coord_var.attrs['units'] = str(units)
    return self.da.assign_coords(coords={coord: new_coord_var})
keewis commented 4 years ago

how about also accepting a dict / kwargs dict (like assign / assign_coords) to DataArray.pint.to and then using different branches for data and coordinates? That way, converting the data and multiple coordinates to (different) units would be possible with one function call.

Obviously, we need something really similar for Dataset.pint.to, so we could implement it by using _to_temp_dataset and to_dataarray.

jthielen commented 4 years ago

how about also accepting a dict / kwargs dict (like assign / assign_coords) to DataArray.pint.to and then using different branches for data and coordinates? That way, converting the data and multiple coordinates to (different) units would be possible with one function call.

So long as we don't lose the easy ability to just convert data units (e.g., da.pint.to('degC')), that sounds great.

TomNicholas commented 4 years ago

So as a to-do list, that would mean one positional arg for the data and multiple keyword args for the coords in da.pint.to(), then all keyword args for ds.pint.to():

keewis commented 4 years ago

Agreed for points 1 (DataArray only?) and 3 (Dataset only). For 2 there might be a few more options (nothing wrong with this one), but I think we should discuss them in a separate issue / when we implement it to keep this issue as a high level API overview.

keewis commented 4 years ago

something else that is not a part of the API but in scope for this package (I think?): maybe we could monkeypatch the repr of DataArray, Variable and Dataset (or register some sort of repr function for pint arrays (via entrypoints? on import?)) to make it more readable (i.e. make sure the unit is always visible). See also pydata/xarray#2773

TomNicholas commented 4 years ago

I definitely think that's in scope for this package, at least until some more general solution for reprs of all wrapped arrays is implemented upstream.

Monkey-patching actually seems like a reasonable temporary solution here I think? The worst that can happen is that some other method which uses the repr fails to display the units right

jthielen commented 4 years ago

I definitely agree that some kind of repr fix is in order. The current default is...less than ideal...

keewis commented 4 years ago

from #11:

keewis commented 4 years ago

if we can get the reprs to work (pydata/xarray#2773, hgrecco/pint#1133), improve the documentation and maybe also expose the pint_xarray.testing module, we should be close enough to release a initial development version (0.1?)

keewis commented 4 years ago

the HTML repr works (pint got released today), and there are #20 (docs), #22 (inline repr) and #24 (testing), so once those are merged and the placeholder functions we have right now are temporarily removed, we should be able to release 0.1

jthielen commented 4 years ago

the HTML repr works (pint got released today), and there are #20 (docs), #22 (inline repr) and #24 (testing), so once those are merged and the placeholder functions we have right now are temporarily removed, we should be able to release 0.1

Sounds great! Right now though it looks like the only placeholders are sel and loc. If you're good with waiting a few days, I should be able to get around to adding those in time for 0.1...I should be able to implement them the same way I did in MetPy.

keewis commented 4 years ago

I was thinking of plus_minus, to_base_units and to_system, which don't have docstrings, work only on the data or raise NotImplementedError.

Edit: let's continue this in #25

keewis commented 3 years ago

from #61: try to dequantify automatically before plotting (see xarray.plot.plot.label_from_attrs)