xarray-contrib / pint-xarray

Interface for using pint with xarray, providing convenience accessors
https://pint-xarray.readthedocs.io/en/latest/
Apache License 2.0
105 stars 12 forks source link

Raise pint.DimensionalityError instead of ValueError #144

Open TomNicholas opened 3 years ago

TomNicholas commented 3 years ago

If you try to convert a pint.Quantity to an incompatible unit you get a pint.DimensionalityError, whereas if you try to convert a quantified DataArray to an incompatible unit you get a ValueError. These should give the same type of error.

In [1]: import pint_xarray

In [2]: import xarray as xr

In [3]: import pint

In [4]: mass_q = pint.Quantity(10, units='kg')

In [5]: mass_q.to('seconds')
---------------------------------------------------------------------------
DimensionalityError                       Traceback (most recent call last)
<ipython-input-5-86c449341523> in <module>
----> 1 mass_q.to('seconds')

~/Documents/Work/Code/pint/pint/quantity.py in to(self, other, *contexts, **ctx_kwargs)
    722         other = to_units_container(other, self._REGISTRY)
    723 
--> 724         magnitude = self._convert_magnitude_not_inplace(other, *contexts, **ctx_kwargs)
    725 
    726         return self.__class__(magnitude, other)

~/Documents/Work/Code/pint/pint/quantity.py in _convert_magnitude_not_inplace(self, other, *contexts, **ctx_kwargs)
    671                 return self._REGISTRY.convert(self._magnitude, self._units, other)
    672 
--> 673         return self._REGISTRY.convert(self._magnitude, self._units, other)
    674 
    675     def _convert_magnitude(self, other, *contexts, **ctx_kwargs):

~/Documents/Work/Code/pint/pint/registry.py in convert(self, value, src, dst, inplace)
   1001             return value
   1002 
-> 1003         return self._convert(value, src, dst, inplace)
   1004 
   1005     def _convert(self, value, src, dst, inplace=False, check_dimensionality=True):

~/Documents/Work/Code/pint/pint/registry.py in _convert(self, value, src, dst, inplace)
   1915                 value, src = src._magnitude, src._units
   1916 
-> 1917         return super()._convert(value, src, dst, inplace)
   1918 
   1919     def _get_compatible_units(self, input_units, group_or_system):

~/Documents/Work/Code/pint/pint/registry.py in _convert(self, value, src, dst, inplace)
   1516 
   1517         if not (src_offset_unit or dst_offset_unit):
-> 1518             return super()._convert(value, src, dst, inplace)
   1519 
   1520         src_dim = self._get_dimensionality(src)

~/Documents/Work/Code/pint/pint/registry.py in _convert(self, value, src, dst, inplace, check_dimensionality)
   1034             # then the conversion cannot be performed.
   1035             if src_dim != dst_dim:
-> 1036                 raise DimensionalityError(src, dst, src_dim, dst_dim)
   1037 
   1038         # Here src and dst have only multiplicative units left. Thus we can

DimensionalityError: Cannot convert from 'kilogram' ([mass]) to 'second' ([time])

In [6]: mass_da = xr.DataArray(10).pint.quantify(units='kg')

In [7]: mass_da.pint.to('seconds')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/Documents/Work/Code/pint-xarray/pint_xarray/conversion.py in convert_units(obj, units)
    231     try:
--> 232         new_obj = call_on_dataset(
    233             convert_units_dataset, obj, name=temporary_name, units=units

~/Documents/Work/Code/pint-xarray/pint_xarray/compat.py in call_on_dataset(func, obj, name, *args, **kwargs)
     12 
---> 13         result = func(ds, *args, **kwargs)
     14 

~/Documents/Work/Code/pint-xarray/pint_xarray/conversion.py in convert_units_dataset(obj, units)
    216     if failed:
--> 217         raise ValueError(failed)
    218 

ValueError: {None: DimensionalityError()}

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-7-00816f2a573d> in <module>
----> 1 mass_da.pint.to('seconds')

~/Documents/Work/Code/pint-xarray/pint_xarray/accessors.py in to(self, units, **unit_kwargs)
    568         units = either_dict_or_kwargs(units, unit_kwargs, "to")
    569 
--> 570         return conversion.convert_units(self.da, units)
    571 
    572     def chunk(self, chunks, name_prefix="xarray-", token=None, lock=False):

~/Documents/Work/Code/pint-xarray/pint_xarray/conversion.py in convert_units(obj, units)
    238             failed[obj.name] = failed.pop(temporary_name)
    239 
--> 240         raise ValueError(format_error_message(failed, "convert")) from e
    241 
    242     return new_obj

ValueError: Cannot convert variables:
    incompatible units for variable None: Cannot convert from 'kilogram' ([mass]) to 'second' ([time])
TomNicholas commented 3 years ago

Looking at this further this is actually quite tricky. Currently the code will attempt to convert all variables in the xarray object, collect the errors, and then raise a ValueError with a list of what went wrong for each variable. That's clever, and useful for users to see all the mistakes they made in one go.

However even if only one error was raised, or all the errors that were raised were of the same type, then currently it will still raise a ValueError regardless if the actual errors were DimensionalityErrors or what.

I tried fixing that by changing convert_units_dataset to look like this

def convert_units_dataset(obj, units):
    converted = {}
    failed = {}
    for name, var in obj.variables.items():
        unit = units.get(name)
        try:
            converted[name] = convert_units_variable(var, unit)
        except (ValueError, pint.errors.PintTypeError) as e:
            failed[name] = e

    if failed:
        exception_types = [type(e) for e in failed.values()]
        if len(set(exception_types)) == 1:
            # If all failures were of the same type then raise exception of that type
            # Ensures simple cases of invalid unit conversion raise a pint.DimensionalityError
            raise exception_types[0](failed)
        else:
            raise ValueError(failed)

    return dataset_from_variables(converted, obj._coord_names, obj.attrs)

but that doesn't work because the init signature of DimensionalityError is different, so it will fail when trying to construct the DimensionalityError.

Fundamentally trying to combine multiple different errors into one is a bit weird, so what I suggest we do is raise the first error encountered, but with an longer message that can list any later errors found. That should hopefully be the best of both: specific error types for operations on one variable as you would expect, but the user can still see everything they messed up in one try. (And as dicts retaining insertion order is apparently guaranteed past python 3.7, the same error will definitely be raised everytime the conversion function is run.) What do you think @keewis?

keewis commented 3 years ago

I'm not particularly attached to having it raise ValueError, but there were a few issues I ran into that could be avoided that way. What I remember is that choosing the right exception for multiple failures gets complicated really quickly, so always raising the same exception type for the same class of errors has its advantages (and I would like to avoid special-casing a single failure vs multiple failures). Maybe it's easier to change the error message to also include the type of the exception? We could also create a special exception that optionally accepts the failed dict so you can still have the raw exceptions and their tracebacks for debugging?

Note that exceptions raised by the magnitude will not be caught, so those exceptions can shadow a previously failed pint error.

TomNicholas commented 3 years ago

Fundamentally I just feel quite strongly that errors caused by incorrect units should raise a pint unit-related error. It would be consistent with Quantity, it allows for catching unit-related errors specifically, and it draws a clearer line for users between "this failed because your code isn't valid" and "your code is valid but this failed because your science equations don't represent what you think they represent".

What I remember is that choosing the right exception for multiple failures gets complicated really quickly, so always raising the same exception type for the same class of errors has its advantages

Is it not fairly simple to just raise the first error encountered? Especially if we know/expect that error is only going to be one of isinstance(err, [pint.Error, TypeError, ValueError]).

I would like to avoid special-casing a single failure vs multiple failures

True, but "Make a list, raise the first, display any remaining" is logic that is basically the same even for a single error.

Maybe it's easier to change the error message to also include the type of the exception?

We could also create a special exception that optionally accepts the failed dict so you can still have the raw exceptions and their tracebacks for debugging?

Its the actual exception type that I would like to be changed, not the message attached to the ValueError. Also I thought that message already did include the types anyway?

Note that exceptions raised by the magnitude will not be caught, so those exceptions can shadow a previously failed pint error.

I'm not quite sure I understand what you mean by this.

keewis commented 2 years ago

I just saw PEP-654, which seems like it would resolve this (and much better than a single exception class could, if I understand correctly): we'd define a PintExceptionGroup (or a PintException that is a ExceptionGoup) and raise it instead of ValueError. There's even a backport library so we don't have to wait until we can drop py3.10.

Edit: maybe we don't even need the ExceptionGroup, but rather a exception that works like one. The advantage of having it be a ExceptionGroup is that there is a standardized way to access information, whereas having it be a custom exception would allow us to bypass the new except* syntax (which I didn't fully understand, so that might not be an issue)

TomNicholas commented 2 years ago

we'd define a PintExceptionGroup

That looks cool! If that allows us to raise both types of error then it seems like it could be a good solution.

keewis commented 2 years ago

There's also PEP 678 which allows attaching additional information to a exception. In our case, that would be the variable name and, in the case of quantify, the source of the units (attribute or parameter).

I did investigate using that, but because they are scheduled for python=3.11 there are a quite a few projects that don't support those yet, e.g. pytest, rich, or ipython, and thus jupyter. The result is that tracebacks include less information than right now (even though the information is there). I suppose we could add some special code that detects and mitigates that, but this seems like a lot of work so I'd rather wait for a bit.

What do you think about postponing this feature until there's a bit more support? That would most likely be after the release of python=3.11, but I guess we don't release that often, anyways.

TomNicholas commented 2 years ago

What do you think about postponing this feature until there's a bit more support?

Totally fine with postponing this!

TomNicholas commented 1 year ago

It seems like adding notes to exceptions is now pretty well supported:

However it doesn't yet seem to be supported by rich (https://github.com/Textualize/rich/issues/1859).

Perhaps we should we try using them now? Or wait for rich to support them?

(I realised after trying to use them in datatree https://github.com/xarray-contrib/datatree/pull/264)

keewis commented 1 year ago

yep, and there's even a backport package (we just can't use except* since that's new syntax). The only reason this is not implemented yet is that I couldn't find the time to figure out how to best add this without making the code extremely messy.