larray-project / larray

N-dimensional labelled arrays in Python
https://larray.readthedocs.io/
GNU General Public License v3.0
8 stars 6 forks source link

Improve getitem ValueError message when trying to retrieve multiple labels #1101

Closed gdementen closed 2 months ago

gdementen commented 6 months ago

When trying to retrieve several labels and it fails, it is sometimes hard to know exactly which label(s) are missing from the array, especially when the axes are long and not all labels are displayed in the axes summary.

I think that when it fails, displaying axes where it matches partially and the missing keys could be a large quality of life improvement:

>>> arr = la.ndtest('a=a0..a4,a6..a9;b=b0..b2')
>>> arr['a0,a7,a2,a5,a3,a4,a8']
ValueError: 'a0,a7,a2,a5,a3,a4,a8' is not a valid label for any axis:
 a [9]: 'a0' 'a1' 'a2' ... 'a7' 'a8' 'a9'
 b [3]: 'b0' 'b1' 'b2'

FWIW, as a quick workaround, I had to resort to this to know which key was missing:

>>> arr.a.union('a0,a7,a2,a5,a3,a4,a8').difference(arr.a)
Axis(['a5'], 'a')
gdementen commented 3 months ago

I have code in my local git repository to change the error to:

ValueError: 'a0,a7,a2,a5,a3,a4,a8' is not a valid label for any axis:
 a [9]: 'a0' 'a1' 'a2' ... 'a7' 'a8' 'a9'
 b [3]: 'b0' 'b1' 'b2'
but there are axes with partial matches:
 * axis 'a' contains 6 out of 7 key labels (missing labels: a5)

but I wonder if directly mentioning a5 wouldn't be better. Something like:

ValueError: 'a0,a7,a2,a5,a3,a4,a8' is not a valid subset because some labels are not found on any axis: 'a5'
Array axes:
 a [9]: 'a0' 'a1' 'a2' ... 'a7' 'a8' 'a9'
 b [3]: 'b0' 'b1' 'b2'
The subset key partially matches some axes though:
 * axis 'a' contains 6 out of 7 key labels (missing labels: 'a5')

@alixdamman : what do you think?

gdementen commented 3 months ago

pushing to version 0.35 because I am unsure what information we should provide.

We should make sure the error in the following case is clear too:

current

>>> arr['a1,b1']
ValueError: 'a1,b1' is not a valid label for any axis:
 a [9]: 'a0' 'a1' 'a2' ... 'a7' 'a8' 'a9'
 b [3]: 'b0' 'b1' 'b2'

git branch

ValueError: 'a1,b1' is not a valid label for any axis:
 a [9]: 'a0' 'a1' 'a2' ... 'a7' 'a8' 'a9'
 b [3]: 'b0' 'b1' 'b2'
but there are axes with partial matches:
 * axis 'a' contains 1 out of 2 key labels (missing labels: b1)
 * axis 'b' contains 1 out of 2 key labels (missing labels: a1)

The proposition above does NOT work in that case. I guess that in that case I could omit the "because some labels are not found on any axis:" part.

ValueError: 'a1,b1' is not a valid subset for any axis
Array axes:
 a [9]: 'a0' 'a1' 'a2' ... 'a7' 'a8' 'a9'
 b [3]: 'b0' 'b1' 'b2'
The subset key partially matches some axes though:
 * axis 'a' contains 1 out of 2 key labels (missing labels: 'b1')
 * axis 'b' contains 1 out of 2 key labels (missing labels: 'a1')
alixdamman commented 2 months ago

I think there is no need to repeat information. IMO, you can drop because some labels are not found on any axis in any case.

So you would have:

ValueError: 'a0,a7,a2,a5,a3,a4,a8' is not a valid subset because some labels are not found on any axis: 'a5'
Array axes:
 a [9]: 'a0' 'a1' 'a2' ... 'a7' 'a8' 'a9'
 b [3]: 'b0' 'b1' 'b2'
The subset key partially matches some axes though:
 * axis 'a' contains 6 out of 7 key labels (missing labels: 'a5')

and

ValueError: 'a1,b1' is not a valid subset for any axis
Array axes:
 a [9]: 'a0' 'a1' 'a2' ... 'a7' 'a8' 'a9'
 b [3]: 'b0' 'b1' 'b2'
The subset key partially matches some axes though:
 * axis 'a' contains 1 out of 2 key labels (missing labels: 'b1')
 * axis 'b' contains 1 out of 2 key labels (missing labels: 'a1')
gdementen commented 2 months ago

I don't understand what you mean. It seems to me like the examples you give are the opposite of the description you gave. In any case, I implemented my first version (and included in 0.34.3 -- otherwise it would have been one more improvement done in my local copy users would not have benefited from), only changing the wording but not adding the extra bit. This can come later if we deem it necessary.