larray-project / larray

N-dimensional labelled arrays in Python
https://larray.readthedocs.io/
GNU General Public License v3.0
8 stars 6 forks source link

Group aggregate has wrong (and different than getitem!) behavior for groups from other axes #1117

Open gdementen opened 1 month ago

gdementen commented 1 month ago
>>> arr = ndtest("a=a0,a2,a1")
>>> arr
a  a0  a2  a1
    0   1   2
>>> g = arr.a.i[1]
>>> g
a.i[1]
>>> arr[g]
1
>>> arr.sum(g)
1

Now let us define another array with the same axis name, and reuse the g group:

>>> arr2 = ndtest("a=a0..a2")
>>> arr2
a  a0  a1  a2
    0   1   2

For getitem, the group is retargeted to the "local array" axis (i.e. arr2.a) by using the labels corresponding to the group on the original axis:

>>> arr2[g]
2

But for group aggregates, this is the position, irrespective of the label:

>>> arr2.sum(g)
1

This is because of the:

axis_key = axis_key.with_axis(real_axis)

line in AxisCollection._guess_axis which should probably use retarget_to instead.

gdementen commented 1 month ago

In fact, the problem caused by that line is present for LGroup too (but there only for slice groups):

>>> arr = ndtest(3)
>>> arr
a  a0  a1  a2
    0   1   2
>>> a2 = Axis('a=a0,a2,a1')
>>> arr.sum(arr.a['a0:a1']) == 1
>>> arr.sum(a2['a0:a1']) # should be 3
1

And in fact, I just noticed I had already found and fixed this bug in one of my local branches (slice_groups_incompatible_axis) almost two years ago. Pffff... 😭

I am not 100% sure we should have this retarget behavior in the first place. Both can be confusing. But I am 100% sure getitem and group aggregate should behave the same.