hgrecco / pint

Operate and manipulate physical quantities in Python
http://pint.readthedocs.org/

Use xarray objects in Contexts #1099

Open crusaderky opened 4 years ago

crusaderky commented 4 years ago

I have xarray.DataArray objects that wrap Quantity objects with more or less arbitrary dimensions, e.g. time or Monte Carlo shock ID. I also have vectorized physical specs (e.g. density) with other, equally arbitrary dimensions, e.g. shipment ID.

I would like to write something like this:

>>> from pint import Context, Quantity
>>> from xarray import DataArray
>>> volume = DataArray(Quantity([100, 101, 102], "liter"), dims=["montecarlo_sample"])
>>> density = DataArray(Quantity([1.1, 1.0], "kg/liter"), dims=["shipment"])
>>> ctx = Context()
>>> ctx.add_transformation("[volume]", "[mass]", lambda ureg, q: q * density)
>>> volume.to_units("kg", ctx)  # desired API; this method does not exist (see point 1)
# a 3x2 DataArray with dims=["montecarlo_sample", "shipment"]

There are three problems:

  1. there is no DataArray.to_units method, and it is less trivial to implement than one might think, because of point 2
  2. the way DataArray usually wraps methods of DataArray.data is to call the method on the data, then wrap the output in a new DataArray. That doesn't work here: inside the transformation, q wraps the bare numpy array [100, 101, 102], which has lost its dimension name, so it can't be multiplied by DataArray([1.1, 1.0]) (shapes (3,) and (2,) don't broadcast).
  3. converting a scalar Quantity with the above context fails:
    >>> ureg.Quantity(1, "liter").to("kg", ctx)
    AttributeError: 'DataArray' object has no attribute '_magnitude'

    which raises the question of what kind of API pint should require from an arbitrary other library wrapping around pint.
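Point 2 comes down to how the two libraries broadcast. A minimal sketch, assuming only NumPy and xarray are installed (pint is left out to isolate the issue), of why the bare magnitudes fail while the same data with named dimensions produces the 3x2 result:

```python
import numpy as np
import xarray as xr

# Magnitudes only: once unwrapped from the DataArray, the dimension names are gone.
volume_l = np.array([100.0, 101.0, 102.0])  # shape (3,)
density_kg_per_l = np.array([1.1, 1.0])     # shape (2,)

# NumPy broadcasts positionally (trailing axes first), so (3,) and (2,) clash.
try:
    volume_l * density_kg_per_l
    bare_broadcast_ok = True
except ValueError:
    bare_broadcast_ok = False

# xarray broadcasts by dimension name, so the same data yields the outer product.
volume = xr.DataArray(volume_l, dims=["montecarlo_sample"])
density = xr.DataArray(density_kg_per_l, dims=["shipment"])
mass = volume * density

print(bare_broadcast_ok)      # False
print(mass.dims, mass.shape)  # ('montecarlo_sample', 'shipment') (3, 2)
```

This is why a transformation that only ever sees the unwrapped magnitude cannot reproduce what xarray would do with the full objects.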

Ideas/suggestions? cc @jthielen @keewis @hgrecco @shoyer

Workarounds

  1. Don't use Quantity.to; just multiply the two DataArrays. This is not so simple when writing a framework where the units are fully user-defined and several conversion steps can be chained together, e.g. converting m^3 to pounds.

  2. A very painful workaround:

    >>> import xarray
    >>> vb, db = xarray.broadcast(volume, density)
    >>> ctx = Context()
    >>> ctx.add_transformation("[volume]", "[mass]", lambda ureg, q: q * db.data)
    >>> out = vb.data.to("kg", ctx)
    >>> xarray.DataArray(out, dims=vb.dims, coords=vb.coords)
    <xarray.DataArray (montecarlo_sample: 3, shipment: 2)>
    <Quantity([[110.  100. ]
    [111.1 101. ]
    [112.2 102. ]], 'kilogram')>
    Dimensions without coordinates: montecarlo_sample, shipment

    Note that in this case the context is single-use, which slows things down because pint can't rely on its caching.

hgrecco commented 4 years ago

> which raises the question of what kind of API pint should require from an arbitrary other library wrapping around pint.

I think this is indeed the point we need to address. With more libraries interacting with pint in a non-trivial way (i.e. libraries wrapping pint, not just using it), we need to define a crystal-clear interface. But we also need to revisit the arguments of the methods that are exposed. Your case is a good example.

keewis commented 4 years ago

With only pint and xarray, I think

vb.copy(data=vb.data.to("kg", ctx))

is the best we can do right now. In the future, pint-xarray should be able to simplify that to

vb.pint.to("kg", ctx)

I don't know anything about contexts, but I think for this to work we need q in

ctx.add_transformation("[volume]", "[mass]", lambda ureg, q: q * density)

to somehow stay an xarray object.