larray-project / larray

N-dimensional labelled arrays in Python
https://larray.readthedocs.io/
GNU General Public License v3.0

Combining label and boolean/value filter is super annoying #1084

Open issue, opened by gdementen 1 year ago.

>>> from larray import ndtest
>>> arr = ndtest((2, 4))
>>> arr
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5   6   7
>>> arr[arr > 2, 'b0,b2,b3']
ValueError: key has several values for axis: b
key: (a.i[a_b  a0_b3  a1_b0  a1_b1  a1_b2  a1_b3
         0      1      1      1      1], b.i[a_b  a0_b3  a1_b0  a1_b1  a1_b2  a1_b3
         3      0      1      2      3], 'b0,b2,b3')

The problem is that, contrary to #246, the boolean filter also has the axis of the label filter (b), so the key ends up with several values for axis b.
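To see where the `a.i[...]` and `b.i[...]` parts of the error come from, here is a rough NumPy analogue. NumPy stands in for larray's internals here, which is an assumption made for illustration only: converting a boolean filter over both axes to positions already yields one index array per axis, including b, so `'b0,b2,b3'` becomes a second key for the same axis.

```python
import numpy as np

# Same data as ndtest((2, 4)): rows are a0..a1, columns are b0..b3
values = np.arange(8).reshape(2, 4)

# A boolean filter over both axes becomes one index array per axis
a_pos, b_pos = np.nonzero(values > 2)
print(a_pos)  # [0 1 1 1 1], like a.i[...] in the error message
print(b_pos)  # [3 0 1 2 3], like b.i[...] in the error message
```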

The workaround is to use boolean filters all the way:

>>> arr[(arr > 2) & ((arr.b == 'b0') | (arr.b == 'b2') | (arr.b == 'b3'))] = 42
>>> arr
a\b  b0  b1  b2  b3
 a0   0   1   2  42
 a1  42   5  42  42
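With many labels, the chain of `==` and `|` gets long. A NumPy sketch of the same all-boolean idea builds the label mask with `np.isin` and lets broadcasting combine it with the value filter. The `b_labels` array is an assumption standing in for the labels of axis b:

```python
import numpy as np

values = np.arange(8).reshape(2, 4)
b_labels = np.array(["b0", "b1", "b2", "b3"])  # labels of axis b (assumed)

# Boolean mask along axis b, shape (4,): True for the selected labels
label_mask = np.isin(b_labels, ["b0", "b2", "b3"])

# Broadcasting (2, 4) & (4,) gives the combined (2, 4) filter
values[(values > 2) & label_mask] = 42
print(values)
# [[ 0  1  2 42]
#  [42  5 42 42]]
```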

The question is whether we want to make the following work out of the box:

>>> arr[arr > 2, 'b0,b2,b3'] = 42
>>> arr
a\b  b0  b1  b2  b3
 a0   0   1   2  42
 a1  42   5  42  42
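One possible way to support this, sketched in NumPy: when a key contains a boolean filter, turn each label key into a boolean mask along its axis and AND it into the filter, instead of treating it as a second key for that axis. `resolve_combined_key` and its arguments are hypothetical names, not larray API, and the sketch assumes labels can be matched by simple string comparison:

```python
import numpy as np

def resolve_combined_key(bool_filter, axis_labels, label_keys):
    """Hypothetical: fold per-axis label keys into a boolean filter.

    bool_filter: boolean array over all axes
    axis_labels: one sequence of labels per axis
    label_keys: dict mapping axis index -> comma-separated label string
    """
    mask = bool_filter.copy()  # do not mutate the caller's filter
    for axis, key in label_keys.items():
        axis_mask = np.isin(axis_labels[axis], key.split(","))
        # Reshape the 1D mask so it broadcasts along the right axis
        shape = [1] * mask.ndim
        shape[axis] = -1
        mask &= axis_mask.reshape(shape)
    return mask

values = np.arange(8).reshape(2, 4)
labels = [["a0", "a1"], ["b0", "b1", "b2", "b3"]]
values[resolve_combined_key(values > 2, labels, {1: "b0,b2,b3"})] = 42
```

ANDing the masks gives the "value filter restricted to these labels" meaning, which is what the boolean workaround computes by hand.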

FWIW, for a single label, the combined key fails the same way, but I find the workarounds much more bearable:

>>> arr = ndtest((2, 4))
>>> arr[arr > 2, 'b0'] = 42
ValueError: key has several values for axis: b
key: (a.i[a_b  a0_b3  a1_b0  a1_b1  a1_b2  a1_b3
         0      1      1      1      1], b.i[a_b  a0_b3  a1_b0  a1_b1  a1_b2  a1_b3
         3      0      1      2      3], 'b0')
>>> arr[(arr > 2) & (arr.b == 'b0')] = 42
>>> arr
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1  42   5   6   7
>>> arr = ndtest((2, 4))
>>> sub = arr['b0']
>>> sub[sub > 2] = 42
>>> arr
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1  42   5   6   7
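The second single-label workaround works because selecting one label gives something that writes through to `arr`. Assuming larray behaves like NumPy basic indexing here (the output above suggests it does), the analogue is a column view:

```python
import numpy as np

values = np.arange(8).reshape(2, 4)

# Basic indexing with a single position returns a view, not a copy
col = values[:, 0]
col[col > 2] = 42  # writes through to values

print(values)
# [[ 0  1  2  3]
#  [42  5  6  7]]
```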

FWIW, the other workaround, taking a subset of the filter, does not work either (and it is broken in another way too, see issue #1085):

>>> arr = ndtest((2, 4))
>>> f = arr > 2
>>> f
a\b     b0     b1     b2    b3
 a0  False  False  False  True
 a1   True   True   True  True
>>> f['b0,b2,b3']
a\b     b0     b2    b3
 a0  False  False  True
 a1   True   True  True
>>> arr['b0,b2,b3', f['b0,b2,b3']]
ValueError: key has several values for axis: b
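Even if the key were accepted, `f['b0,b2,b3']` only covers three columns, so its positions along b are relative to the subset. A NumPy sketch of translating them back to positions in the full array (the `selected` positions for b0, b2, b3 are written out by hand, an assumption of the sketch):

```python
import numpy as np

values = np.arange(8).reshape(2, 4)
selected = np.array([0, 2, 3])  # positions of b0, b2, b3 along axis b

# Filter computed on the subset: its columns are 0..2, not 0..3
sub_filter = (values > 2)[:, selected]
rows, sub_cols = np.nonzero(sub_filter)

# Map subset column positions back to positions in the full array
cols = selected[sub_cols]
print(rows.tolist(), cols.tolist())  # [0, 1, 1, 1] [3, 0, 2, 3]
```

With the positions mapped back, `values[rows, cols] = 42` sets exactly the intended cells.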

For getting values, we can successfully work around it by doing it in two steps AND taking a subset of the filter:

>>> arr['b0,b2,b3'][f['b0,b2,b3']]
a_b  a0_b3  a1_b0  a1_b2  a1_b3
         3      4      6      7
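In NumPy terms, the two-step version first selects the columns (fancy indexing, which makes a copy) and then applies the subset filter to that copy. Reading from a copy is harmless, which is presumably why getting values works; the column positions below are an assumption of the sketch:

```python
import numpy as np

values = np.arange(8).reshape(2, 4)
selected = [0, 2, 3]  # positions of b0, b2, b3 (assumed)

subset = values[:, selected]  # fancy indexing: a copy
result = subset[subset > 2]   # filter applied to the copy
print(result)  # [3 4 6 7]
```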

But this does not work for setting values, because we are setting them on a temporary copy:

>>> arr = ndtest((2, 4))
>>> arr['b0,b2,b3'][f['b0,b2,b3']] = 42
>>> arr
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5   6   7
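The same NumPy analogue shows why setting fails: the assignment lands in the temporary copy. One way around it is to modify the subset and then assign it back explicitly. In larray this would presumably read `sub = arr['b0,b2,b3']; sub[sub > 2] = 42; arr['b0,b2,b3'] = sub`, but treat that as an untested assumption; the sketch below only covers NumPy:

```python
import numpy as np

values = np.arange(8).reshape(2, 4)
selected = [0, 2, 3]  # positions of b0, b2, b3 (assumed)

# Chained assignment modifies a temporary copy: values is unchanged
values[:, selected][values[:, selected] > 2] = 42
assert values[1, 0] == 4

# Workaround: modify the subset, then assign it back explicitly
subset = values[:, selected]
subset[subset > 2] = 42
values[:, selected] = subset
print(values)
# [[ 0  1  2 42]
#  [42  5 42 42]]
```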