Open jbrewster7 opened 12 months ago
I'm assuming that the error happens in eg[mask], not in fastjet.ClusterSequence or exclusive_jets_constituent_index, right?
Where does ak.num(eg, axis=1) not equal ak.num(mask, axis=1)?
Or if str(mask.type) has two "vars" just like str(eg.type), where does ak.num(eg, axis=2) not equal ak.num(mask, axis=2)?
That would be the first step. The next step is figuring out why they're not equal, because computing eg[mask] presupposes that they fit together.
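A minimal sketch of that first step, written against plain nested lists (for example, what ak.to_list(eg) and ak.to_list(mask) return) so it runs without Awkward installed. With Awkward arrays, the same comparison would be made directly on ak.num(eg, axis=1) and ak.num(mask, axis=1) as suggested above. The helper name mismatched_rows and the sample data are hypothetical.

```python
def mismatched_rows(data, mask):
    """Return the outer indices where the inner list lengths of data and mask differ.

    Both arguments are plain nested lists, e.g. from ak.to_list(...).
    """
    if len(data) != len(mask):
        raise ValueError(
            f"outer lengths differ: {len(data)} (data) vs {len(mask)} (mask)"
        )
    return [
        i
        for i, (row, mask_row) in enumerate(zip(data, mask))
        if len(row) != len(mask_row)
    ]


# Made-up example: the mask for row 1 has one entry too many.
data = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
mask = [[True, False], [True, True], [False, True, True]]

bad_rows = mismatched_rows(data, mask)
print(bad_rows)
```

Any index reported here is a row where the data and the mask do not fit together, which is the place to start looking.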
I made the wrong assumption. You showed that the error definitely happens in exclusive_jets_constituents, but it looks like an Awkward slicing error, somewhere in its Python implementation.
Hi @jpivarski, OK! So .... what is best for us to do here? Would you like us to open an issue in Awkward, or should we leave that to you? Since the working example we have relies on Fastjet, it's not entirely clear to us how to provide useful feedback to Awkward on this. Thoughts welcome!!
The crossed-out part was assuming that it's a bug in Awkward (in eg[mask]). It might still be, but it's also possible that exclusive_jets_constituents is using it wrong.
It's implemented here:
You've verified that exclusive_jets_constituent_index returns a value, so it should be possible to walk through each one of those steps. Do the results of the intermediate steps look right?
Hi @jpivarski, the error is thrown on line 307 when slicing prepared. The real issue seems to be happening before that in exclusive_jets_constituent_index, since it is returning indices out of range. I tried to look into this a bit, but all I could tell was that line 208 already had these out-of-range indices returned from to_numpy_exclusive_njet_with_constituents. I couldn't find a way to look any further into this.
So exclusive_jets_constituent_index returned a value, but it was the wrong value because there are indexes out of bounds. (It also means that there is no pure Python work-around, since you can't switch to using exclusive_jets_constituent_index instead of exclusive_jets_constituents.)
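One way to pin down which events are affected is to compare every returned constituent index against the number of particles in that event. The sketch below does this on plain nested lists (events, then jets, then indices). The helper name out_of_range_events and the numbers are made up for illustration and are not output from fastjet.

```python
def out_of_range_events(indices_per_event, n_particles_per_event):
    """Return the events whose constituent indices fall outside [0, n_particles)."""
    bad = []
    for event, (jets, n_particles) in enumerate(
        zip(indices_per_event, n_particles_per_event)
    ):
        # Flatten the per-jet index lists of this event.
        flat = [index for jet in jets for index in jet]
        if any(index < 0 or index >= n_particles for index in flat):
            bad.append(event)
    return bad


# Made-up values: every event has 2 particles, but event 2 refers to
# indices 2 and 3, which do not exist in that event.
indices = [
    [[0], [1]],
    [[1], [0]],
    [[1], [0, 2, 3]],
    [[0], [1]],
]
n_particles = [2, 2, 2, 2]

bad_events = out_of_range_events(indices, n_particles)
print(bad_events)
```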
I followed exclusive_jets_constituent_index back to the C++ routine, which is to_numpy_exclusive_njet_with_constituents. The error must be in here:
which is unpacking the NumPy array inputs, constructing FastJet objects, running FastJet algorithms, and then packing the results into NumPy array outputs. Since the issue is that an index is off, it's probably not the translation between NumPy arrays and FastJet objects, but in handling the FastJet objects. Maybe an off-by-one error somewhere?
Hi, after looking into this a bit more I've realized that this is a problem with more than just exclusive_jets_constituent_index, though that is the function where the differences are apparent enough to cause noticeable problems. I believe the problem is most likely occurring in ClusterSequence. If I take the same example as I originally used, with the array eg, there are problems happening in the same place for the arrays returned from the functions exclusive_jets and inclusive_jets as well. This is noticeable when mask is applied before clustering vs. not applied until afterwards.
>>> fastjet.ClusterSequence(eg[mask], jetdef).exclusive_jets(n_jets=2)
[[{px: 0.377, py: 0.116, pz: -0.0749, E: 0.425}, {px: -0.181, ...}],
[{px: 0.54, py: -0.65, pz: -0.00527, E: 0.857}, {px: 0.253, ...}],
[{px: 4.65, py: -1.88, pz: -3.29, E: 6}, {px: 14.5, py: -1.73, ...}],
[{px: -3.55, py: -1.64, pz: -0.0941, E: 3.91}, {px: -1.33, py: ..., ...}]]
---------------------------------------------------------------------------
type: 4 * var * Momentum4D[
px: float64,
py: float64,
pz: float64,
E: float64
]
>>> fastjet.ClusterSequence(eg, jetdef).exclusive_jets(n_jets=2)
[[{px: 0.377, py: 0.116, pz: -0.0749, E: 0.425}, {px: -0.181, ...}],
[{px: 0.54, py: -0.65, pz: -0.00527, E: 0.857}, {px: 0.253, ...}],
[{px: 0.294, py: 0.254, pz: -0.259, E: 0.467}, {px: 4.65, py: ..., ...}],
[{px: 1.45, py: -0.179, pz: -0.876, E: 1.71}, {px: 12.8, py: -1.8, ...}],
[{px: -3.55, py: -1.64, pz: -0.0941, E: 3.91}, {px: -1.33, py: ..., ...}]]
---------------------------------------------------------------------------
type: 5 * var * Momentum4D[
px: float64,
py: float64,
pz: float64,
E: float64
]
>>> mask
[True,
True,
False,
True,
True]
--------------
type: 5 * bool
Specifically I noticed that fastjet.ClusterSequence(eg[mask], jetdef).exclusive_jets(n_jets=2)[2] should be equal to fastjet.ClusterSequence(eg, jetdef).exclusive_jets(n_jets=2)[mask][2] (which is equivalent to fastjet.ClusterSequence(eg, jetdef).exclusive_jets(n_jets=2)[3]). However, we have
>>> fastjet.ClusterSequence(eg[mask], jetdef).exclusive_jets(n_jets=2)[2]
[{px: 4.65, py: -1.88, pz: -3.29, E: 6},
{px: 14.5, py: -1.73, pz: -8.31, E: 17}]
-----------------------------------------
type: 2 * Momentum4D[
px: float64,
py: float64,
pz: float64,
E: float64
]
>>> fastjet.ClusterSequence(eg, jetdef).exclusive_jets(n_jets=2)[mask][2]
[{px: 1.45, py: -0.179, pz: -0.876, E: 1.71},
{px: 12.8, py: -1.8, pz: -7.18, E: 14.8}]
---------------------------------------------
type: 2 * Momentum4D[
px: float64,
py: float64,
pz: float64,
E: float64
]
Instead, fastjet.ClusterSequence(eg[mask], jetdef).exclusive_jets(n_jets=2)[2][0] is equal to fastjet.ClusterSequence(eg, jetdef).exclusive_jets(n_jets=2)[2][1].
Also, fastjet.ClusterSequence(eg[mask], jetdef).exclusive_jets(n_jets=2)[2][1] is equal (within a decimal place) to the sum of fastjet.ClusterSequence(eg, jetdef).exclusive_jets(n_jets=2)[2][0], fastjet.ClusterSequence(eg, jetdef).exclusive_jets(n_jets=2)[3][0], and fastjet.ClusterSequence(eg, jetdef).exclusive_jets(n_jets=2)[3][1].
Please let me know if any of this is unclear and thank you for all the help with this!
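The comparison above amounts to a consistency check: for any operation that treats each event independently, masking before the operation and masking after it should give the same result. A rough sketch of that check on plain lists follows. The stand-in batch function (a per-event sum) is purely hypothetical and takes the place of the real clustering call, and the helper names are assumptions.

```python
def apply_mask(rows, mask):
    """Keep only the rows whose mask entry is True."""
    return [row for row, keep in zip(rows, mask) if keep]


def mask_mismatches(batch_fn, events, mask):
    """Return positions (in the masked output) where the two orders disagree.

    batch_fn takes a list of events and returns one result per event.
    """
    masked_first = batch_fn(apply_mask(events, mask))
    masked_last = apply_mask(batch_fn(events), mask)
    return [
        i
        for i, (a, b) in enumerate(zip(masked_first, masked_last))
        if a != b
    ]


def sum_per_event(events):
    """Hypothetical stand-in for a per-event operation such as clustering."""
    return [sum(event) for event in events]


events = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
mask = [True, True, False, True, True]

mismatches = mask_mismatches(sum_per_event, events, mask)
print(mismatches)
```

A well-behaved per-event operation yields an empty list here. In the output shown above, the same comparison disagrees at position 2.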
You are absolutely right. The way we handle masked arrays is wrong. Running
px, py, pz, E, offsets = self.extract_cons(self.data)
print(offsets)
gives
array([ 0, 2, 4, 8, 10])
The correct layout is
>>> eg[mask].layout
<ListArray len='4'>
<starts><Index dtype='int64' len='4'>
[0 2 6 8]
</Index></starts>
<stops><Index dtype='int64' len='4'>
[ 2 4 8 10]
</Index></stops>
...
This explains why you see all these extra elements between offsets 4 and 8. I think we should implement this function with start & stop offsets instead of using only the stop as the offset. @jpivarski any suggestions would be very welcome. Do you think my proposal makes any sense?
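To make the difference concrete, here is a small pure-Python sketch using the starts and stops printed above. It assumes, for illustration only, a flat content buffer of ten entries. It does not reproduce the actual extract_cons code; the helper names are hypothetical. Treating the stops as a single offsets array pulls the masked-out event's entries into event 2, while slicing with both starts and stops (or repacking into contiguous offsets) does not.

```python
def slice_with_offsets(content, offsets):
    """Interpret offsets as contiguous boundaries: event i is content[offsets[i]:offsets[i+1]]."""
    return [content[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


def slice_with_starts_stops(content, starts, stops):
    """Event i is content[starts[i]:stops[i]]; gaps between events are allowed."""
    return [content[start:stop] for start, stop in zip(starts, stops)]


def pack(content, starts, stops):
    """Copy the selected entries into a new contiguous buffer with plain offsets."""
    packed, offsets = [], [0]
    for start, stop in zip(starts, stops):
        packed.extend(content[start:stop])
        offsets.append(len(packed))
    return packed, offsets


content = list(range(10))   # assumed flat buffer; entries 4 and 5 belong to the masked-out event
starts = [0, 2, 6, 8]       # from eg[mask].layout above
stops = [2, 4, 8, 10]

naive_offsets = [starts[0]] + stops                 # [0, 2, 4, 8, 10], as printed above
naive = slice_with_offsets(content, naive_offsets)  # event 2 wrongly has 4 entries
correct = slice_with_starts_stops(content, starts, stops)
packed, packed_offsets = pack(content, starts, stops)

print(naive[2], correct[2], packed_offsets)
```

Either carrying both starts and stops through to the clustering step, or repacking into a contiguous buffer first, would avoid the extra entries between offsets 4 and 8.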
It sounds like a good idea. (I haven't looked deeply into the details.)
@chrispap95 / @jpivarski any movement on this? It's blocking progress for future collider analysis, and it would be nice to have it sorted out.
Hello, I am working with @kpachal, @mswiatlo, and @lgray on the development of Coffea and some analysis that involves the use of fastjet. We've been very happy with how smoothly everything using awkward arrays integrates with fastjet.
However, we have noticed a problem when running exclusive_jets_constituent_index() with a masked awkward array. If an awkward array has a boolean mask applied to it on axis 0 before clustering, exclusive_jets_constituent_index() returns indices that are out of range. An error is also thrown when trying to call exclusive_jets_constituents() since the out of range indices are being applied to the array.
Here is an example. Running this
returns
which contains out of range indices at index 2 on axis 0. If we then run
it throws the error
due to trying to use the out of range indices.
This is not a problem if we just run the unmasked array
which gives
Thank you for the help on this issue, it will be greatly appreciated!