scikit-hep / awkward-0.x

Manipulate arrays of complex data structures as easily as Numpy.
BSD 3-Clause "New" or "Revised" License
215 stars 39 forks source link

JaggedArray.argmax() returns multiple values for some events #176

Closed benkrikler closed 5 years ago

benkrikler commented 5 years ago

I'm having an issue with argmax for version 0.12.5 (current version on pypi).

The following snippet demonstrates the problem:

masses = [[983, 715, 796, 213, 702, 789, 404, 319, 290, 346, 693, 771, 308, 337,
            236, 274, 180, 322, 361, 157, 687, 769, 317, 321, 231, 282, 148, 319,
            367, 144, 189, 275, 303, 137, 62, 928, 983, 171, 726, 177, 206, 715,
            380, 403, 312, 702, 295, 297, 254, 338, 698, 299, 312, 255, 339, 290],
           [705, 884, 645, 377, 783, 476, 321, 738, 126, 316]]
masses = JaggedArray.fromiter(masses)
print("argmax", masses.argmax())
print("max", masses.max())
print("val of argmax", masses[selected])

which will print out the following and then die on the last line:

argmax [[0] [36 1]]
max [983 884]

While the max method works as expected, argmax seems to be confused, seemingly using the wrong starts and stops to define each event.

This might be related to issue #158 but looks distinct in that the result is greater than 1 entry in the resulting argmax array.

jpivarski commented 5 years ago

It's different from #158, and I just put in a correction in PR #177.

benkrikler commented 5 years ago

Speedy as ever :) Thank you very much!

lukasheinrich commented 5 years ago

Hi, I ran into the same issue. At first I thought this might help

idx = vector_sum.pt.argmax().pad(1,clip=1) #pick up the first of possible multiple values
vector_sum.pt[idx]

but it seems the indices are shifted so that an the superfluous indices move over into the next events. Not sure if this is covered by the fix

the fix in master works, but I wonder if that reveals some other inconsistency (would you expect this not to work, even thought I explicitly clip to 1)?

jpivarski commented 5 years ago

It probably is covered by the fix: the error was one of shifting superfluous events, rather than hiding them as unreachable data. Your example uses an argmax, so unless you're using the latest version, it's suspect.