max-sixty opened this issue 1 year ago
I completely agree. I think part of the problem is that, because we generally have good internal abstractions for operations, an error gets thrown from deep within e.g. align when the user asked to do something much more high-level (like open_mfdataset). We have a common issue where little or no context is given about which high-level variable or operation caused the low-level routine to raise.
Yes, I definitely agree; this can be an issue. #2078 is a great example there.
(That said, I still think there are areas where we can do better without big changes: even giving the index of the object that failed would be quite helpful...)
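A minimal sketch of that cheap fix, in plain Python rather than xarray internals. align_pair and combine are hypothetical stand-ins (not xarray functions) for a low-level routine that only sees two indexes and a high-level loop that knows each object's position. The high-level loop re-raises with the position added and chains the original error with `from`, so the low-level traceback is not lost.

```python
from typing import Sequence


def align_pair(left: Sequence[int], right: Sequence[int]) -> Sequence[int]:
    """Hypothetical low-level routine: it sees two indexes and nothing else."""
    if list(left) != list(right):
        raise ValueError("cannot align objects: indexes are not equal")
    return left


def combine(objects: Sequence[Sequence[int]]) -> Sequence[int]:
    """Hypothetical high-level routine: it knows where each object sits in the input."""
    reference = objects[0]
    for i, obj in enumerate(objects[1:], start=1):
        try:
            align_pair(reference, obj)
        except ValueError as err:
            # Add the context the low-level routine lacked; `from err` keeps the original traceback.
            raise ValueError(
                f"object at index {i} (of {len(objects)}) could not be aligned "
                f"with object 0: {err}"
            ) from err
    return reference


try:
    combine([[0, 1, 2], [0, 1, 2], [1, 2, 3]])
except ValueError as e:
    print(e)
```

Here the printed message names object 2 as the culprit, which is exactly the information that is missing when the error surfaces from deep inside a routine like align.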
I've been using pandas a bit recently, and it has even worse error messages (though its objects are also less complex, so maybe that's less impactful). Here's an example:
We're a bit better at the comparable case:
As one example of doing this well, there's useful information we could provide when there's an index error:
@TomNicholas makes a good point that many of the errors are raised deeper in the call stack, where the code doesn't have the broader context, such as a view of the full object. One option is for higher-level code to catch the exception and pass it, along with the full context, to a function designed to write a good error message. This also means it's OK to have lower performance requirements: we only pay the cost of constructing the message (including, for example, searching for similar values in an index) when we hit an error. There could also be an option to skip this in environments where performance matters more than friendliness.
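A minimal sketch of the catch-and-explain pattern, assuming a plain Python list as the "index". The names _lookup, _explain_missing_label, sel and the friendly_errors flag are hypothetical and are not xarray's API. The "did you mean" suggestions come from the standard library's difflib.get_close_matches, and that work only happens on the error path.

```python
import difflib
from typing import Hashable, Sequence


def _lookup(index: Sequence[Hashable], label: Hashable) -> int:
    """Low-level fast path: no context, minimal message."""
    try:
        return list(index).index(label)
    except ValueError:
        raise KeyError(label) from None


def _explain_missing_label(index: Sequence[Hashable], label: Hashable) -> str:
    """Runs only on the error path, so a linear scan for similar labels is affordable."""
    candidates = [str(x) for x in index]
    close = difflib.get_close_matches(str(label), candidates, n=3)
    message = f"{label!r} not found in index of length {len(index)}"
    if close:
        message += f"; did you mean one of {close}?"
    return message


def sel(index: Sequence[Hashable], label: Hashable, friendly_errors: bool = True) -> int:
    """Hypothetical high-level entry point: catches the low-level error and rewrites it."""
    try:
        return _lookup(index, label)
    except KeyError as err:
        if not friendly_errors:
            raise  # skip the extra work and re-raise the bare low-level KeyError
        raise KeyError(_explain_missing_label(index, label)) from err


try:
    sel(["temperature", "pressure"], "temprature")
except KeyError as e:
    print(e.args[0])
```

Because the successful path never touches difflib, a performance-sensitive caller only pays for friendliness when something has already gone wrong, and friendly_errors=False lets them opt out entirely.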
I would love for xarray to be the equivalent of the Rust compiler here: known for being surprisingly helpful, such that going back to another tool feels like working without a teammate!
This might not seem like a traditional funding proposal, but I do think it could make a good candidate for one:
I agree with everything you just wrote @max-sixty :pray:
Another thing to keep in mind is the support for exception groups in Python 3.11. I imagine there could be a lot of use cases for these in xarray; @keewis and I have already been discussing using them in pint-xarray (https://github.com/xarray-contrib/pint-xarray/issues/144#issuecomment-1776029618) and in datatree (https://github.com/xarray-contrib/datatree/pull/264).
Is your feature request related to a problem?
Coming back to xarray and using it based on what I remember from a year or so ago means I make lots of mistakes. I've also been using it outside of a REPL, where error messages matter more, since I can't explore a dataset inline.
Some of the error messages could be much more helpful. Take one example:
The second sentence is nice. But the first could give us much more information:
join=...?
Are they off by 1, or are they completely different types?
testing.assert_equal produces pretty nice errors, as a comparison.
Having good error messages is really useful: it lets folks stay in the flow while they're working, and it signals that we're a well-built, refined library.
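As a rough illustration of the extra information an alignment error could carry, here is a hypothetical helper (describe_index_mismatch is not an xarray API) that compares two indexes given as plain Python sequences. It checks whether the element types differ, whether every label is shifted by a constant (such as an off-by-one), which labels are unmatched, and how many labels join='inner' would keep, using the intersection semantics of xarray.align's inner join. It assumes hashable labels and treats only int and float as numeric.

```python
from typing import Hashable, Sequence


def describe_index_mismatch(
    dim: str, left: Sequence[Hashable], right: Sequence[Hashable], max_shown: int = 5
) -> str:
    """Hypothetical helper: explain how two indexes along `dim` differ.

    It would only run once alignment has already failed, so it can afford extra work.
    """
    lines = [f"indexes along {dim!r} are not equal (lengths {len(left)} and {len(right)})"]

    # 1. Completely different types, e.g. ints on one side and strings on the other.
    left_types = sorted({type(x).__name__ for x in left})
    right_types = sorted({type(x).__name__ for x in right})
    if left_types != right_types:
        lines.append(f"  element types differ: {left_types} vs {right_types}")
        return "\n".join(lines)

    # 2. Same length, numeric, and a constant non-zero offset, e.g. off by 1.
    numeric = all(isinstance(x, (int, float)) for x in [*left, *right])
    if numeric and len(left) == len(right) and len(left) > 0:
        offsets = {r - l for l, r in zip(left, right)}
        if len(offsets) == 1 and 0 not in offsets:
            lines.append(f"  every label in right is shifted by {offsets.pop()} relative to left")

    # 3. Which labels are unmatched, and what an inner join would keep.
    left_set, right_set = set(left), set(right)
    only_left = [x for x in left if x not in right_set]
    only_right = [x for x in right if x not in left_set]
    lines.append(f"  only in left: {only_left[:max_shown]}")
    lines.append(f"  only in right: {only_right[:max_shown]}")
    lines.append(f"  join='inner' would keep {len(left_set & right_set)} shared label(s)")
    return "\n".join(lines)


print(describe_index_mismatch("time", [0, 1, 2], [1, 2, 3]))
```

For the example call, the message reports a constant shift of 1, the unmatched labels 0 and 3, and that an inner join would keep two labels. That answers the "off by 1 or completely different types?" question directly in the error.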
Describe the solution you'd like
I'm not sure of the best way to surface these issues. Error messages make for less legible contributions than features or bug fixes, and the primary audience for good error messages is often the opposite of the people actively developing the library. They're also harder to manage as GH issues: there could be scores of marginal issues, many of which would go out of date.
One thing we do in PRQL is keep a file that snapshots bad error messages, test_bad_error_messages.rs, so that changing one of them from bad to good makes a nice, self-contained contribution. I'm not sure whether that would work here: Python doesn't seem to have a great snapshotter (pytest-regtest is the best I've found; I wrote pytest-accept, but it requires doctests). Any other ideas?
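A minimal sketch of the PRQL-style idea in plain Python, assuming snapshots live in a single JSON file. It does not use the pytest-regtest or pytest-accept APIs, and capture_error, check_snapshots, CASES, and bad_error_messages.json are hypothetical names. Each case is a zero-argument callable that should raise, and its "Type: message" string is compared with the stored snapshot. Running with accept=True rewrites the file, so improving a message shows up as a reviewable diff of that file.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def capture_error(func: Callable[[], object]) -> str:
    """Run `func` and render the exception it raises as 'Type: message'."""
    try:
        func()
    except Exception as err:
        return f"{type(err).__name__}: {err}"
    return "<no error raised>"


def check_snapshots(
    cases: dict[str, Callable[[], object]], path: Path, accept: bool = False
) -> list[str]:
    """Compare current error messages with the snapshots stored in `path` (JSON).

    Returns the names of cases whose message differs from the snapshot.
    With accept=True the file is rewritten with the current messages.
    """
    stored = json.loads(path.read_text()) if path.exists() else {}
    current = {name: capture_error(func) for name, func in cases.items()}
    changed = [name for name, message in current.items() if stored.get(name) != message]
    if accept:
        path.write_text(json.dumps(current, indent=2, sort_keys=True) + "\n")
    return changed


# Hypothetical cases; for xarray these would be calls that currently give unhelpful errors.
CASES: dict[str, Callable[[], object]] = {
    "missing_key": lambda: {}["temperature"],
    "bad_int": lambda: int("ten"),
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "bad_error_messages.json"
        check_snapshots(CASES, snapshot, accept=True)  # first run records the messages
        print(snapshot.read_text())
```

In a test suite, a plain assertion that check_snapshots(CASES, path) returns an empty list would flag any message that changed without its snapshot being re-accepted.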
Describe alternatives you've considered
No response
Additional context
A couple of specific error-message issues: