larray-project / larray

N-dimensional labelled arrays in Python
https://larray.readthedocs.io/
GNU General Public License v3.0

Boolean filters whose axis labels are a subset of the array's axis labels are silently broken #1085

Closed gdementen closed 2 months ago

gdementen commented 12 months ago
>>> from larray import ndtest
>>> arr = ndtest((2, 4))
>>> f = arr > 2
>>> f
a\b     b0     b1     b2    b3
 a0  False  False  False  True
 a1   True   True   True  True
>>> arr[f]
a_b  a0_b3  a1_b0  a1_b1  a1_b2  a1_b3
         3      4      5      6      7
>>> f2 = f['b0,b2,b3']
>>> f2
a\b     b0     b2    b3
 a0  False  False  True
 a1   True   True  True
>>> arr[f2]
a_b\b  b3  b0  b2  b3
a0_b3   3   0   2   3
a1_b0   7   4   6   7
a1_b2   7   4   6   7
a1_b3   7   4   6   7

This is complete junk: the result is 2D, with an extra b axis (on which b3 appears twice) next to the combined a_b axis.

I see three options going forward:

  1. raise an error when common axes between filter.axes and array.axes are not equal
  2. behave as if the filter was False where not present (possibly checking that filter.axes are subsets of array.axes). I think numpy previously had the equivalent of this behavior but no longer supports it.
  3. align filter.axes with array.axes, so that if filter.axes has more axes (unsure what happens in this case currently) or more labels on common axes, the result has more labels too.
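
To make option 1 concrete, here is a minimal sketch of such a check. It does not use larray's API: axes are modelled as plain dicts mapping an axis name to its list of labels, and `check_filter_axes` is a hypothetical helper name used only for illustration.

```python
def check_filter_axes(filter_axes, array_axes):
    """Raise ValueError if an axis shared by the filter and the array has different labels.

    Both arguments are plain dicts mapping an axis name to its list of labels
    (a stand-in for real axis objects, not larray's API).
    """
    for name, filter_labels in filter_axes.items():
        if name not in array_axes:
            continue  # axes present only in the filter are out of scope for this check
        array_labels = array_axes[name]
        if list(filter_labels) != list(array_labels):
            raise ValueError(
                f"boolean filter axis {name!r} has labels {list(filter_labels)} "
                f"but the array axis has labels {list(array_labels)}"
            )


array_axes = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2", "b3"]}
full_filter_axes = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2", "b3"]}
subset_filter_axes = {"a": ["a0", "a1"], "b": ["b0", "b2", "b3"]}

check_filter_axes(full_filter_axes, array_axes)  # identical common axes: no error

try:
    check_filter_axes(subset_filter_axes, array_axes)
    subset_rejected = False
except ValueError:
    subset_rejected = True  # the 'b' axis of the filter lacks b1
```

With this check in front of the indexing code, the `arr[f2]` case above would fail loudly instead of returning a 2D array.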

For now, I think it would be best to implement option 1, until we implement align by default for all operations, at which point option 3 would make more sense. I might revise my judgment on option 2 if it turns out to be absolutely necessary to solve #1084.

FWIW, I don't think this is worth blocker priority, even though it is a "silent" failure, because users would spot the extra "combined" axis very quickly.

gdementen commented 2 weeks ago

FWIW, for __setitem__, the picture was a bit different. It somewhat worked: keys missing from the filter were considered False, and extra keys were ignored as long as the filter was False for them:

>>> arr[f2] = 99
>>> arr
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5  99  99
>>> f3 = f2.append('b', True, label='b42')
>>> f3
a\b     b0     b2     b3   b42
 a0  False  False  False  True
 a1  False   True   True  True
>>> arr[f3]
ValueError: b['b42', 'b2', 'b3', 'b42'] is not a valid label for any axis
>>> f3 = f2.append('b', False, label='b42')
>>> f3
a\b     b0     b2     b3    b42
 a0  False  False  False  False
 a1  False   True   True  False
>>> arr[f3] = 42
>>> arr
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5  42  42

But the included fix (raising on incompatible axes) is still a good thing IMO.
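
For reference, the old __setitem__ semantics described above (missing labels count as False, extra labels are tolerated only where the filter is False) can be sketched in pure Python. This is an illustration under assumptions, not larray's implementation: the filter is a list of boolean rows, `expand_filter` is a hypothetical helper, and raising ValueError for a True value on an unknown label is my own modelling choice.

```python
def expand_filter(filter_rows, filter_labels, array_labels):
    """Re-express a 2D boolean filter on the array's labels for its second axis.

    Labels missing from the filter become False. Labels unknown to the array
    are dropped, but only if the filter is False for them in every row.
    """
    for col, label in enumerate(filter_labels):
        if label not in array_labels and any(row[col] for row in filter_rows):
            raise ValueError(f"filter is True for unknown label {label!r}")
    expanded = []
    for row in filter_rows:
        by_label = dict(zip(filter_labels, row))
        expanded.append([by_label.get(label, False) for label in array_labels])
    return expanded


array_labels = ["b0", "b1", "b2", "b3"]
filter_labels = ["b0", "b2", "b3", "b42"]
filter_rows = [[False, False, False, False],
               [False, True, True, False]]
mask = expand_filter(filter_rows, filter_labels, array_labels)

# Assign through the expanded mask, mirroring `arr[f3] = 42` above.
data = [[0, 1, 2, 3], [4, 5, 6, 7]]
for i, mask_row in enumerate(mask):
    for j, selected in enumerate(mask_row):
        if selected:
            data[i][j] = 42
```

Here `data` ends up with 42 at a1/b2 and a1/b3 only, matching the last session above, while a filter that is True for b42 is rejected.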