Open ikrommyd opened 3 months ago
I don't think this is a bug, but rather a feature request.
@jpivarski @Saransh-cpp do you think it would be possible to specify behavior functions or properties as sources of four-momentum data in addition to only fields?
Yeah, I honestly didn't think about the label. Sorry. Someone with permissions please change it.
So, for instance, in lines like
use array.x, array.y
instead of array["x"], array["y"]
(consistently, everywhere).
The problem I see with that is that properties take precedence over fields, and therefore an implementation would infinite-loop between, say, the property x
that is implemented in terms of the property phi
that is implemented in terms of the property x
. It might be possible to ensure that one class has properties x
and y
defined and another class has rho
and phi
defined, but we'd have to be very careful to be sure that no class has both.
(This sounds like a big-ish project to make sure that all the ducks are in a row. There are a lot of ducks.)
At least a hotfix somewhere soon is necessary for the time being. Diphotons are broken.
If I am not wrong, a hot fix can be overriding the method in coffea and passing it to the behavior dict (instead of copying the four-vector behavior).
Just a gentle ping on this as it's really a problem.
@Saransh-cpp could you please make an example of the fix you have suggested?
Hi, I thought that the issue was performing addition on the charge field, which is why I suggested overriding the add
method in behavior
dict, but I was wrong. I looked at the issue wrong, everything is in fact handled well by Candidate.add
.
I tried swapping the main pain point ["..."]
occurrence with .field
and though it works for the example above, it leads to infinite recursion in several other cases (as pointed out by @jpivarski above) -
I am not sure if there is a hotfix for this. (Given that this is not a small fix, I will not be able to work on it in the next couple of months. I'll definitely pick it up if it stays unresolved till then (or if I get spare time in between) ☹️)
So from the physicist's perspective, one fix I can do is to replace the photon collection. I'm not sure if there are any caveats here I'm not seeing
In [31]: from coffea.nanoevents import NanoEventsFactory
In [32]: events = NanoEventsFactory.from_root({"tests/samples/DYto2E.root": "Events"}).events()
In [33]: events["Photon"] = ak.zip({"pt": events.Photon.pt, "eta": events.Photon.eta, "phi": events.Photon.phi, "mass": ak.zeros_like(events.Photon.pt), "charge": ak.zeros_like(events.Photon.pt)}, with_name="PtEtaPhiMCandidate")
In [34]: events.Photon + events.Photon
Out[34]: dask.awkward<add, npartitions=1>
In [35]: (events.Photon + events.Photon).mass.compute()
Out[35]: <Array [[0.0442, 0], ..., [-0.0442, -0.0625]] type='1000 * var * float32'>
And of course, I should add all the other photon fields in the ak.zip
dictionary.
@Saransh-cpp Should I also do from coffea.nanoevents.methods import candidate
and behavior=candidate.behavior
within the ak.zip
or it doesn't do anything extra?
One more thing I'd like to comment is that if there isn't gonna be a fix quickly, I don't know if we can expect people to know this and replace the photon collection. This issue breaks basically every analysis that wants to calculate an invariant mass between photons and some other object (or themselves). If it is to stay like this for some time I'm wondering if it makes sense to have this photon replacement happen during nanoevents building in coffea as a temporary fix. This would be adding extra nodes to the events
graph though from the beginning but the necessary columns optimization should take care of that during optimization. It should at least be announced though so that people know how to bypass this.
So from the physicist's perspective, one fix I can do is to replace the photon collection. I'm not sure if there are any caveats here I'm not seeing
In [31]: from coffea.nanoevents import NanoEventsFactory In [32]: events = NanoEventsFactory.from_root({"tests/samples/DYto2E.root": "Events"}).events() In [33]: events["Photon"] = ak.zip({"pt": events.Photon.pt, "eta": events.Photon.eta, "phi": events.Photon.phi, "mass": ak.zeros_like(events.Photon.pt), "charge": ak.zeros_like(events.Photon.pt)}, with_name="PtEtaPhiMCandidate") In [34]: events.Photon + events.Photon Out[34]: dask.awkward<add, npartitions=1> In [35]: (events.Photon + events.Photon).mass.compute() Out[35]: <Array [[0.0442, 0], ..., [-0.0442, -0.0625]] type='1000 * var * float32'>
And of course, I should add all the other photon fields in the
ak.zip
dictionary.
Update 1: there is an issue with that fix. Photons are losing attributes like
FAILED tests/test_tag_and_probe_nanoaod.py::test_tag_and_probe_photons_trigger[True] - AttributeError: no field named 'matched_electron'
FAILED tests/test_tag_and_probe_nanoaod.py::test_tag_and_probe_photons_id[True] - AttributeError: no field named 'matched_electron'
Update 2: Oh I can do with_name="Photon"
looks like.
Copying comment from the PR linked above -
Hi! It should be behavior=nanoaod.behavior
. candidate.behavior
is included within nanoaod.behavior
. coffea does not update the global awkward behavior
(which is a good thing), which is why you need that extra behavior=
arg.
Alternatively, to skip the behavior
keyword, you could do something like -
>>> ak.behavior.update(nanoaod.behavior)
>>> a = ak.zip({"x": [1], "y": [2], "z": [3], "m": [4]}, with_name="Photon")
>>> a
<PhotonArray [{x: 1, y: 2, z: 3, m: 4}] type='1 * Photon[x: int64, y: int64...'>
Makes me wonder if coffea should have something like vector.register_awkward
to make it easy for the users.
No, it was a conscious choice to make the behaviors contained within coffea objects, it's less confusing when you start using distributed environments to make sure everything is synced.
As for a fix that doesn't break everything else, we can hack NanoEvents to just make all-zero fields for specified columns without adding extra nodes. We already do a variant of this for trigger names where we produce chunks of None
s when the trigger table changes in a single dataset. Just have to do the same thing with zeros for Photon or other objects.
Can add in a _default_columns
private variable or something like that. If the column doesn't exist on disk, generate it on the fly with a given value.
Vector Version
1.4.1
Python Version
3.11
OS / Environment
macOS but doesn't matter
Describe the bug
Currently in coffea if you do
you will get an error
If my understanding is correct, it's looking for mass field rather than an accessor. The photons don't have
mass
and acharge
field right now in nanoaod. They used to at some point and it was set to zero. That's why you will not get this failure if you try this with thenano_dy.root
file from coffea because it's old and thats why this wasn't caught by tests I assume. Lindsey commented that this should be solved with behaviors that supply the necessary inputs and and not keep hard-requiring fields to be present.cc @lgray
Any additional but relevant log output
No response