Open TomNicholas opened 3 years ago
Looking at this further this is actually quite tricky. Currently the code will attempt to convert all variables in the xarray object, collect the errors, and then raise a ValueError
with a list of what went wrong for each variable. That's clever, and useful for users to see all the mistakes they made in one go.
However even if only one error was raised, or all the errors that were raised were of the same type, then currently it will still raise a ValueError
regardless if the actual errors were DimensionalityError
s or what.
I tried fixing that by changing convert_units_dataset
to look like this
def convert_units_dataset(obj, units):
converted = {}
failed = {}
for name, var in obj.variables.items():
unit = units.get(name)
try:
converted[name] = convert_units_variable(var, unit)
except (ValueError, pint.errors.PintTypeError) as e:
failed[name] = e
if failed:
exception_types = [type(e) for e in failed.values()]
if len(set(exception_types)) == 1:
# If all failures were of the same type then raise exception of that type
# Ensures simple cases of invalid unit conversion raise a pint.DimensionalityError
raise exception_types[0](failed)
else:
raise ValueError(failed)
return dataset_from_variables(converted, obj._coord_names, obj.attrs)
but that doesn't work because the init signature of DimensionalityError
is different, so it will fail when trying to construct the DimensionalityError
.
Fundamentally trying to combine multiple different errors into one is a bit weird, so what I suggest we do is raise the first error encountered, but with an longer message that can list any later errors found. That should hopefully be the best of both: specific error types for operations on one variable as you would expect, but the user can still see everything they messed up in one try. (And as dict
s retaining insertion order is apparently guaranteed past python 3.7, the same error will definitely be raised everytime the conversion function is run.) What do you think @keewis?
I'm not particularly attached to having it raise ValueError
, but there were a few issues I ran into that could be avoided that way. What I remember is that choosing the right exception for multiple failures gets complicated really quickly, so always raising the same exception type for the same class of errors has its advantages (and I would like to avoid special-casing a single failure vs multiple failures). Maybe it's easier to change the error message to also include the type of the exception? We could also create a special exception that optionally accepts the failed
dict so you can still have the raw exceptions and their tracebacks for debugging?
Note that exceptions raised by the magnitude will not be caught, so those exceptions can shadow a previously failed pint
error.
Fundamentally I just feel quite strongly that errors caused by incorrect units should raise a pint unit-related error. It would be consistent with Quantity
, it allows for catching unit-related errors specifically, and it draws a clearer line for users between "this failed because your code isn't valid" and "your code is valid but this failed because your science equations don't represent what you think they represent".
What I remember is that choosing the right exception for multiple failures gets complicated really quickly, so always raising the same exception type for the same class of errors has its advantages
Is it not fairly simple to just raise the first error encountered? Especially if we know/expect that error is only going to be one of isinstance(err, [pint.Error, TypeError, ValueError])
.
I would like to avoid special-casing a single failure vs multiple failures
True, but "Make a list, raise the first, display any remaining" is logic that is basically the same even for a single error.
Maybe it's easier to change the error message to also include the type of the exception?
We could also create a special exception that optionally accepts the failed dict so you can still have the raw exceptions and their tracebacks for debugging?
Its the actual exception type that I would like to be changed, not the message attached to the ValueError. Also I thought that message already did include the types anyway?
Note that exceptions raised by the magnitude will not be caught, so those exceptions can shadow a previously failed pint error.
I'm not quite sure I understand what you mean by this.
I just saw PEP-654, which seems like it would resolve this (and much better than a single exception class could, if I understand correctly): we'd define a PintExceptionGroup
(or a PintException
that is a ExceptionGoup
) and raise it instead of ValueError
. There's even a backport library so we don't have to wait until we can drop py3.10
.
Edit: maybe we don't even need the ExceptionGroup
, but rather a exception that works like one. The advantage of having it be a ExceptionGroup
is that there is a standardized way to access information, whereas having it be a custom exception would allow us to bypass the new except*
syntax (which I didn't fully understand, so that might not be an issue)
we'd define a PintExceptionGroup
That looks cool! If that allows us to raise both types of error then it seems like it could be a good solution.
There's also PEP 678 which allows attaching additional information to a exception. In our case, that would be the variable name and, in the case of quantify
, the source of the units (attribute or parameter).
I did investigate using that, but because they are scheduled for python=3.11
there are a quite a few projects that don't support those yet, e.g. pytest
, rich
, or ipython
, and thus jupyter
. The result is that tracebacks include less information than right now (even though the information is there). I suppose we could add some special code that detects and mitigates that, but this seems like a lot of work so I'd rather wait for a bit.
What do you think about postponing this feature until there's a bit more support? That would most likely be after the release of python=3.11
, but I guess we don't release that often, anyways.
What do you think about postponing this feature until there's a bit more support?
Totally fine with postponing this!
It seems like adding notes to exceptions is now pretty well supported:
However it doesn't yet seem to be supported by rich
(https://github.com/Textualize/rich/issues/1859).
Perhaps we should we try using them now? Or wait for rich
to support them?
(I realised after trying to use them in datatree https://github.com/xarray-contrib/datatree/pull/264)
yep, and there's even a backport package (we just can't use except*
since that's new syntax). The only reason this is not implemented yet is that I couldn't find the time to figure out how to best add this without making the code extremely messy.
If you try to convert a
pint.Quantity
to an incompatible unit you get apint.DimensionalityError
, whereas if you try to convert a quantifiedDataArray
to an incompatible unit you get aValueError
. These should give the same type of error.