scikit-hep / vector

Vector classes and utilities
https://vector.readthedocs.io
BSD 3-Clause "New" or "Revised" License
77 stars 24 forks source link

Addition of photons in `coffea` fails because they don't have `mass` and `charge` fields. #498

Open ikrommyd opened 3 weeks ago

ikrommyd commented 3 weeks ago

Vector Version

1.4.1

Python Version

3.11

OS / Environment

macOS but doesn't matter

Describe the bug

Currently in coffea if you do

from coffea.nanoevents import NanoEventsFactory
events = NanoEventsFactory.from_root({"tests/samples/DYto2E.root": "Events"}).events()
events.Photon + events.Photon

you will get an error

ValueError: array does not have temporal coordinates (t/E/e/energy or tau/M/m/mass): seediEtaOriX, cutBased, cutBased_Fall17V2, electronVeto, isScEtaEB, isScEtaEE, mvaID_Fall17V2_WP80, mvaID_Fall17V2_WP90, mvaID_WP80, mvaID_WP90, pixelSeed, seedGain, electronIdx, jetIdx, seediPhiOriY, vidNestedWPBitmap, vidNestedWPBitmap_Fall17V2, energyErr, energyRaw, esEffSigmaRR, esEnergyOverRawE, eta, etaWidth, haloTaggerMVAVal, hoe, hoe_PUcorr, mvaID, mvaID_Fall17V2, pfChargedIsoPFPV, pfChargedIsoWorstVtx, pfPhoIso03, pfRelIso03_all_Fall17V2, pfRelIso03_all_quadratic, pfRelIso03_chg_Fall17V2, pfRelIso03_chg_quadratic, phi, phiWidth, pt, r9, s4, sieie, sieip, sipip, trkSumPtHollowConeDR03, x_calo, y_calo, z_calo, electronIdxG, jetIdxG

This error occurred while calling

    numpy.add.__call__(
        <PhotonArray-typetracer [...] type='## * var * Photon[seediEtaOriX:...'>
        <PhotonArray-typetracer [...] type='## * var * Photon[seediEtaOriX:...'>
    )

If my understanding is correct, it's looking for mass field rather than an accessor. The photons don't have mass and a charge field right now in nanoaod. They used to at some point and it was set to zero. That's why you will not get this failure if you try this with the nano_dy.root file from coffea because it's old and thats why this wasn't caught by tests I assume. Lindsey commented that this should be solved with behaviors that supply the necessary inputs and and not keep hard-requiring fields to be present.

cc @lgray

Any additional but relevant log output

No response

lgray commented 2 weeks ago

I don't think this is a bug, but rather a feature request.

@jpivarski @Saransh-cpp do you think it would be possible to specify behavior functions or properties as sources of four-momentum data in addition to only fields?

ikrommyd commented 2 weeks ago

Yeah, I honestly didn't think about the label. Sorry. Someone with permissions please change it.

jpivarski commented 2 weeks ago

So, for instance, in lines like

https://github.com/scikit-hep/vector/blob/13a63706a7b716de85714c142e92ceff5b9022ff/src/vector/backends/awkward.py#L129

use array.x, array.y instead of array["x"], array["y"] (consistently, everywhere).

The problem I see with that is that properties take precedence over fields, and therefore an implementation would infinite-loop between, say, the property x that is implemented in terms of the property phi that is implemented in terms of the property x. It might be possible to ensure that one class has properties x and y defined and another class has rho and phi defined, but we'd have to be very careful to be sure that no class has both.

(This sounds like a big-ish project to make sure that all the ducks are in a row. There are a lot of ducks.)

ikrommyd commented 2 weeks ago

At least a hotfix somewhere soon is necessary for the time being. Diphotons are broken.

Saransh-cpp commented 2 weeks ago

If I am not wrong, a hot fix can be overriding the method in coffea and passing it to the behavior dict (instead of copying the four-vector behavior).

ikrommyd commented 1 week ago

Just a gentle ping on this as it's really a problem.

lgray commented 1 week ago

@Saransh-cpp could you please make an example of the fix you have suggested?

Saransh-cpp commented 6 days ago

Hi, I thought that the issue was performing addition on the charge field, which is why I suggested overriding the add method in behavior dict, but I was wrong. I looked at the issue wrong, everything is in fact handled well by Candidate.add.

I tried swapping the main pain point ["..."] occurrence with .field and though it works for the example above, it leads to infinite recursion in several other cases (as pointed out by @jpivarski above) -

Quick fix diff: ```diff diff --git a/src/vector/backends/awkward.py b/src/vector/backends/awkward.py index 5ca6804..6a581bb 100644 --- a/src/vector/backends/awkward.py +++ b/src/vector/backends/awkward.py @@ -325,8 +325,8 @@ class TemporalAwkward(CoordinatesAwkward, Temporal): return TemporalAwkwardTau(array["M"]) elif "m" in fields: return TemporalAwkwardTau(array["m"]) - elif "mass" in fields: - return TemporalAwkwardTau(array["mass"]) + elif "mass" in fields or hasattr(array, "mass"): + return TemporalAwkwardTau(array.mass) else: raise ValueError( "array does not have temporal coordinates (t/E/e/energy or tau/M/m/mass): " ```
Working MWE: ```py In [1]: from coffea.nanoevents import NanoEventsFactory ...: events = NanoEventsFactory.from_root({"DYto2E.root": "Events"}).events() In [2]: events.Photon + events.Photon Out[2]: dask.awkward ```
Infinite recursion trace: ```py In [1]: import vector; import awkward as ak In [2]: vector.register_awkward() In [3]: ak.zip({"mass": [1], "px": [1], "py": [1], "pz": [1]}, with_name="Momentum4D") ** 2 ------------------------------------------------------------------------ RecursionError Traceback (most recent call last) Cell In[3], line 1 ----> 1 ak.zip( 2 {"mass": [1], "px": [1], "py": [1], "pz": [1]}, with_name="Momentum4D" 3 ) ** 2 File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_operators.py:53, in _binary_method..func(self, other) 51 if _disables_array_ufunc(other): 52 return NotImplemented ---> 53 return ufunc(self, other) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/highlevel.py:1511, in Array.__array_ufunc__(self, ufunc, method, *inputs, **kwargs) 1509 name = f"{type(ufunc).__module__}.{ufunc.__name__}.{method!s}" 1510 with ak._errors.OperationErrorContext(name, inputs, kwargs): -> 1511 return ak._connect.numpy.array_ufunc(ufunc, method, inputs, kwargs) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_connect/numpy.py:466, in array_ufunc(ufunc, method, inputs, kwargs) 458 raise TypeError( 459 "no {}.{} overloads for custom types: {}".format( 460 type(ufunc).__module__, ufunc.__name__, ", ".join(error_message) 461 ) 462 ) 464 return None --> 466 out = ak._broadcasting.broadcast_and_apply( 467 inputs, action, allow_records=False, function_name=ufunc.__name__ 468 ) 470 if len(out) == 1: 471 return wrap_layout(out[0], behavior=behavior, attrs=attrs) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_broadcasting.py:1108, in broadcast_and_apply(inputs, action, depth_context, lateral_context, allow_records, left_broadcast, right_broadcast, numpy_to_regular, regular_to_jagged, function_name, broadcast_parameters_rule) 1106 backend = backend_of(*inputs, coerce_to_common=False) 1107 isscalar = [] -> 1108 out = apply_step( 1109 backend, 1110 broadcast_pack(inputs, isscalar), 1111 action, 1112 0, 1113 depth_context, 1114 lateral_context, 1115 { 1116 "allow_records": allow_records, 1117 "left_broadcast": left_broadcast, 1118 "right_broadcast": right_broadcast, 1119 "numpy_to_regular": numpy_to_regular, 1120 "regular_to_jagged": regular_to_jagged, 1121 "function_name": function_name, 1122 "broadcast_parameters_rule": broadcast_parameters_rule, 1123 }, 1124 ) 1125 assert isinstance(out, tuple) 1126 return tuple(broadcast_unpack(x, isscalar) for x in out) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_broadcasting.py:1086, in apply_step(backend, inputs, action, depth, depth_context, lateral_context, options) 1084 return result 1085 elif result is None: -> 1086 return continuation() 1087 else: 1088 raise AssertionError(result) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_broadcasting.py:1055, in apply_step..continuation() 1053 # Any non-string list-types? 1054 elif any(x.is_list and not is_string_like(x) for x in contents): -> 1055 return broadcast_any_list() 1057 # Any RecordArrays? 1058 elif any(x.is_record for x in contents): File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_broadcasting.py:623, in apply_step..broadcast_any_list() 620 nextinputs.append(x) 621 nextparameters.append(NO_PARAMETERS) --> 623 outcontent = apply_step( 624 backend, 625 nextinputs, 626 action, 627 depth + 1, 628 copy.copy(depth_context), 629 lateral_context, 630 options, 631 ) 632 assert isinstance(outcontent, tuple) 633 parameters = parameters_factory(nextparameters, len(outcontent)) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_broadcasting.py:1068, in apply_step(backend, inputs, action, depth, depth_context, lateral_context, options) 1061 else: 1062 raise ValueError( 1063 "cannot broadcast: {}{}".format( 1064 ", ".join(repr(type(x)) for x in inputs), in_function(options) 1065 ) 1066 ) -> 1068 result = action( 1069 inputs, 1070 depth=depth, 1071 depth_context=depth_context, 1072 lateral_context=lateral_context, 1073 continuation=continuation, 1074 backend=backend, 1075 options=options, 1076 ) 1078 if isinstance(result, tuple) and all(isinstance(x, Content) for x in result): 1079 if any(content.backend is not backend for content in result): File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_connect/numpy.py:378, in array_ufunc..action(inputs, **ignore) 376 custom = find_ufunc(behavior, signature) 377 if custom is not None: --> 378 return _array_ufunc_adjust(custom, inputs, kwargs, behavior) 380 # Do we have any categoricals? 381 if any( 382 x.is_indexed and x.parameter("__array__") == "categorical" for x in contents 383 ): File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_connect/numpy.py:183, in _array_ufunc_adjust(custom, inputs, kwargs, behavior) 174 def _array_ufunc_adjust( 175 custom, inputs, kwargs: dict[str, Any], behavior: Mapping | None 176 ): 177 args = [ 178 wrap_layout(x, behavior) 179 if isinstance(x, (ak.contents.Content, ak.record.Record)) 180 else x 181 for x in inputs 182 ] --> 183 out = custom(*args, **kwargs) 184 if not isinstance(out, tuple): 185 out = (out,) File ~/Code/HEP/vector/src/vector/backends/awkward.py:1512, in (v, expo) 1505 behavior[numpy.power, "Momentum2D", numbers.Real] = ( 1506 lambda v, expo: v.rho2 if expo == 2 else v.rho**expo 1507 ) 1508 behavior[numpy.power, "Momentum3D", numbers.Real] = ( 1509 lambda v, expo: v.mag2 if expo == 2 else v.mag**expo 1510 ) 1511 behavior[numpy.power, "Momentum4D", numbers.Real] = ( -> 1512 lambda v, expo: v.tau2 if expo == 2 else v.tau**expo 1513 ) 1515 behavior["__cast__", VectorNumpy2D] = lambda v: vector.Array(v) 1516 behavior["__cast__", VectorNumpy3D] = lambda v: vector.Array(v) File ~/Code/HEP/vector/src/vector/_methods.py:3814, in Lorentz.tau2(self) 3810 @property 3811 def tau2(self) -> ScalarCollection: 3812 from vector._compute.lorentz import tau2 -> 3814 return tau2.dispatch(self) File ~/Code/HEP/vector/src/vector/_compute/lorentz/tau2.py:107, in dispatch(v) 100 def dispatch(v: typing.Any) -> typing.Any: 101 function, *returns = _from_signature( 102 __name__, 103 dispatch_map, 104 ( 105 _aztype(v), 106 _ltype(v), --> 107 _ttype(v), 108 ), 109 ) 110 with numpy.errstate(all="ignore"): 111 return v._wrap_result( 112 _flavor_of(v), 113 function( (...) 120 1, 121 ) File ~/Code/HEP/vector/src/vector/_methods.py:4381, in _ttype(obj) 4376 def _ttype(obj: VectorProtocolLorentz) -> type[Coordinates]: 4377 """ 4378 Determines the Temporal type of a vector for use in looking up a 4379 dispatched function. 4380 """ -> 4381 if hasattr(obj, "temporal"): 4382 for t in type(obj.temporal).__mro__: 4383 if t in (TemporalT, TemporalTau): File ~/Code/HEP/vector/src/vector/backends/awkward.py:1259, in MomentumAwkward4D.temporal(self) 1242 @property 1243 def temporal(self) -> TemporalAwkward: 1244 """ 1245 Returns a Temporal type object containing the temporal coordinates. 1246 (...) 1257 (,) 1258 """ -> 1259 return TemporalAwkward.from_momentum_fields(self) File ~/Code/HEP/vector/src/vector/backends/awkward.py:329, in TemporalAwkward.from_momentum_fields(cls, array) 327 return TemporalAwkwardTau(array["m"]) 328 elif "mass" in fields or hasattr(array, "mass"): --> 329 return TemporalAwkwardTau(array.mass) 330 else: 331 raise ValueError( 332 "array does not have temporal coordinates (t/E/e/energy or tau/M/m/mass): " 333 f"{', '.join(fields)}" 334 ) File ~/Code/HEP/vector/src/vector/_methods.py:4137, in LorentzMomentum.mass(self) 4135 @property 4136 def mass(self) -> ScalarCollection: -> 4137 return self.tau File ~/Code/HEP/vector/src/vector/_methods.py:3808, in Lorentz.tau(self) 3804 @property 3805 def tau(self) -> ScalarCollection: 3806 from vector._compute.lorentz import tau -> 3808 return tau.dispatch(self) File ~/Code/HEP/vector/src/vector/_compute/lorentz/tau.py:113, in dispatch(v) 106 def dispatch(v: typing.Any) -> typing.Any: 107 function, *returns = _from_signature( 108 __name__, 109 dispatch_map, 110 ( 111 _aztype(v), 112 _ltype(v), --> 113 _ttype(v), 114 ), 115 ) 116 with numpy.errstate(all="ignore"): 117 return v._wrap_result( 118 _flavor_of(v), 119 function( (...) 126 1, 127 ) File ~/Code/HEP/vector/src/vector/_methods.py:4381, in _ttype(obj) 4376 def _ttype(obj: VectorProtocolLorentz) -> type[Coordinates]: 4377 """ 4378 Determines the Temporal type of a vector for use in looking up a 4379 dispatched function. 4380 """ -> 4381 if hasattr(obj, "temporal"): 4382 for t in type(obj.temporal).__mro__: 4383 if t in (TemporalT, TemporalTau): File ~/Code/HEP/vector/src/vector/backends/awkward.py:1259, in MomentumAwkward4D.temporal(self) 1242 @property 1243 def temporal(self) -> TemporalAwkward: 1244 """ 1245 Returns a Temporal type object containing the temporal coordinates. 1246 (...) 1257 (,) 1258 """ -> 1259 return TemporalAwkward.from_momentum_fields(self) File ~/Code/HEP/vector/src/vector/backends/awkward.py:329, in TemporalAwkward.from_momentum_fields(cls, array) 327 return TemporalAwkwardTau(array["m"]) 328 elif "mass" in fields or hasattr(array, "mass"): --> 329 return TemporalAwkwardTau(array.mass) 330 else: 331 raise ValueError( 332 "array does not have temporal coordinates (t/E/e/energy or tau/M/m/mass): " 333 f"{', '.join(fields)}" 334 ) File ~/Code/HEP/vector/src/vector/_methods.py:4137, in LorentzMomentum.mass(self) 4135 @property 4136 def mass(self) -> ScalarCollection: -> 4137 return self.tau File ~/Code/HEP/vector/src/vector/_methods.py:3808, in Lorentz.tau(self) 3804 @property 3805 def tau(self) -> ScalarCollection: 3806 from vector._compute.lorentz import tau -> 3808 return tau.dispatch(self) File ~/Code/HEP/vector/src/vector/_compute/lorentz/tau.py:113, in dispatch(v) 106 def dispatch(v: typing.Any) -> typing.Any: 107 function, *returns = _from_signature( 108 __name__, 109 dispatch_map, 110 ( 111 _aztype(v), 112 _ltype(v), --> 113 _ttype(v), 114 ), 115 ) 116 with numpy.errstate(all="ignore"): 117 return v._wrap_result( 118 _flavor_of(v), 119 function( (...) 126 1, 127 ) [... skipping similar frames: _ttype at line 4381 (489 times), TemporalAwkward.from_momentum_fields at line 329 (489 times), LorentzMomentum.mass at line 4137 (489 times), Lorentz.tau at line 3808 (489 times), MomentumAwkward4D.temporal at line 1259 (489 times), dispatch at line 113 (488 times)] File ~/Code/HEP/vector/src/vector/_compute/lorentz/tau.py:113, in dispatch(v) 106 def dispatch(v: typing.Any) -> typing.Any: 107 function, *returns = _from_signature( 108 __name__, 109 dispatch_map, 110 ( 111 _aztype(v), 112 _ltype(v), --> 113 _ttype(v), 114 ), 115 ) 116 with numpy.errstate(all="ignore"): 117 return v._wrap_result( 118 _flavor_of(v), 119 function( (...) 126 1, 127 ) File ~/Code/HEP/vector/src/vector/_methods.py:4381, in _ttype(obj) 4376 def _ttype(obj: VectorProtocolLorentz) -> type[Coordinates]: 4377 """ 4378 Determines the Temporal type of a vector for use in looking up a 4379 dispatched function. 4380 """ -> 4381 if hasattr(obj, "temporal"): 4382 for t in type(obj.temporal).__mro__: 4383 if t in (TemporalT, TemporalTau): File ~/Code/HEP/vector/src/vector/backends/awkward.py:1259, in MomentumAwkward4D.temporal(self) 1242 @property 1243 def temporal(self) -> TemporalAwkward: 1244 """ 1245 Returns a Temporal type object containing the temporal coordinates. 1246 (...) 1257 (,) 1258 """ -> 1259 return TemporalAwkward.from_momentum_fields(self) File ~/Code/HEP/vector/src/vector/backends/awkward.py:329, in TemporalAwkward.from_momentum_fields(cls, array) 327 return TemporalAwkwardTau(array["m"]) 328 elif "mass" in fields or hasattr(array, "mass"): --> 329 return TemporalAwkwardTau(array.mass) 330 else: 331 raise ValueError( 332 "array does not have temporal coordinates (t/E/e/energy or tau/M/m/mass): " 333 f"{', '.join(fields)}" 334 ) File ~/Code/HEP/vector/src/vector/_methods.py:4137, in LorentzMomentum.mass(self) 4135 @property 4136 def mass(self) -> ScalarCollection: -> 4137 return self.tau File ~/Code/HEP/vector/src/vector/_methods.py:3808, in Lorentz.tau(self) 3804 @property 3805 def tau(self) -> ScalarCollection: 3806 from vector._compute.lorentz import tau -> 3808 return tau.dispatch(self) File ~/Code/HEP/vector/src/vector/_compute/lorentz/tau.py:111, in dispatch(v) 106 def dispatch(v: typing.Any) -> typing.Any: 107 function, *returns = _from_signature( 108 __name__, 109 dispatch_map, 110 ( --> 111 _aztype(v), 112 _ltype(v), 113 _ttype(v), 114 ), 115 ) 116 with numpy.errstate(all="ignore"): 117 return v._wrap_result( 118 _flavor_of(v), 119 function( (...) 126 1, 127 ) File ~/Code/HEP/vector/src/vector/_methods.py:4357, in _aztype(obj) 4352 def _aztype(obj: VectorProtocolPlanar) -> type[Coordinates]: 4353 """ 4354 Determines the Azimuthal type of a vector for use in looking up a 4355 dispatched function. 4356 """ -> 4357 if hasattr(obj, "azimuthal"): 4358 for t in type(obj.azimuthal).__mro__: 4359 if t in (AzimuthalXY, AzimuthalRhoPhi): File ~/Code/HEP/vector/src/vector/backends/awkward.py:1220, in MomentumAwkward4D.azimuthal(self) 1202 @property 1203 def azimuthal(self) -> AzimuthalAwkward: 1204 """ 1205 Returns an Azimuthal type object containing the azimuthal coordinates. 1206 (...) 1218 (, ) 1219 """ -> 1220 return AzimuthalAwkward.from_momentum_fields(self) File ~/Code/HEP/vector/src/vector/backends/awkward.py:163, in AzimuthalAwkward.from_momentum_fields(cls, array) 161 return AzimuthalAwkwardXY(array["px"], array["y"]) 162 elif "px" in fields and "py" in fields: --> 163 return AzimuthalAwkwardXY(array["px"], array["py"]) 164 elif "rho" in fields and "phi" in fields: 165 return AzimuthalAwkwardRhoPhi(array["rho"], array["phi"]) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/highlevel.py:1066, in Array.__getitem__(self, where) 636 """ 637 Args: 638 where (many types supported; see below): Index of positions to (...) 1062 have the same dimension as the array being indexed. 1063 """ 1064 with ak._errors.SlicingErrorContext(self, where): 1065 return wrap_layout( -> 1066 prepare_layout(self._layout[where]), 1067 self._behavior, 1068 allow_other=True, 1069 attrs=self._attrs, 1070 ) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/contents/content.py:512, in Content.__getitem__(self, where) 511 def __getitem__(self, where): --> 512 return self._getitem(where) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/contents/content.py:529, in Content._getitem(self, where) 526 return self._getitem((where,)) 528 elif isinstance(where, str): --> 529 return self._getitem_field(where) 531 elif where is np.newaxis: 532 return self._getitem((where,)) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/contents/recordarray.py:458, in RecordArray._getitem_field(self, where, only_fields) 454 def _getitem_field( 455 self, where: str | SupportsIndex, only_fields: tuple[str, ...] = () 456 ) -> Content: 457 if len(only_fields) == 0: --> 458 return self.content(where) 460 else: 461 nexthead, nexttail = ak._slicing.head_tail(only_fields) File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/contents/recordarray.py:394, in RecordArray.content(self, index_or_field) 393 def content(self, index_or_field: str | SupportsIndex) -> Content: --> 394 out = super().content(index_or_field) 395 if ( 396 self._length is unknown_length 397 or out.length is unknown_length 398 or out.length == self._length 399 ): 400 return out File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_meta/recordmeta.py:133, in RecordMeta.content(self, index_or_field) 132 def content(self, index_or_field: int | str) -> T: --> 133 if is_integer(index_or_field): 134 index = index_or_field 135 elif isinstance(index_or_field, str): File ~/Code/HEP/vector/.env/lib/python3.11/site-packages/awkward/_regularize.py:27, in is_integer(x) 26 def is_integer(x) -> bool: ---> 27 return isinstance(x, numbers.Integral) and not isinstance(x, bool) File :119, in __instancecheck__(cls, instance) RecursionError: maximum recursion depth exceeded in comparison This error occurred while calling numpy.power.__call__( 2 ) ```

I am not sure if there is a hotfix for this. (Given that this is not a small fix, I will not be able to work on it in the next couple of months. I'll definitely pick it up if it stays unresolved till then (or if I get spare time in between) ☹️)

ikrommyd commented 6 days ago

So from the physicist's perspective, one fix I can do is to replace the photon collection. I'm not sure if there are any caveats here I'm not seeing

In [31]: from coffea.nanoevents import NanoEventsFactory

In [32]: events = NanoEventsFactory.from_root({"tests/samples/DYto2E.root": "Events"}).events()

In [33]: events["Photon"] = ak.zip({"pt": events.Photon.pt, "eta": events.Photon.eta, "phi": events.Photon.phi, "mass": ak.zeros_like(events.Photon.pt), "charge": ak.zeros_like(events.Photon.pt)}, with_name="PtEtaPhiMCandidate")

In [34]: events.Photon + events.Photon
Out[34]: dask.awkward<add, npartitions=1>

In [35]: (events.Photon + events.Photon).mass.compute()
Out[35]: <Array [[0.0442, 0], ..., [-0.0442, -0.0625]] type='1000 * var * float32'>

And of course, I should add all the other photon fields in the ak.zip dictionary.

ikrommyd commented 6 days ago

@Saransh-cpp Should I also do from coffea.nanoevents.methods import candidate and behavior=candidate.behavior within the ak.zip or it doesn't do anything extra?

One more thing I'd like to comment is that if there isn't gonna be a fix quickly, I don't know if we can expect people to know this and replace the photon collection. This issue breaks basically every analysis that wants to calculate an invariant mass between photons and some other object (or themselves). If it is to stay like this for some time I'm wondering if it makes sense to have this photon replacement happen during nanoevents building in coffea as a temporary fix. This would be adding extra nodes to the events graph though from the beginning but the necessary columns optimization should take care of that during optimization. It should at least be announced though so that people know how to bypass this.

ikrommyd commented 6 days ago

So from the physicist's perspective, one fix I can do is to replace the photon collection. I'm not sure if there are any caveats here I'm not seeing

In [31]: from coffea.nanoevents import NanoEventsFactory

In [32]: events = NanoEventsFactory.from_root({"tests/samples/DYto2E.root": "Events"}).events()

In [33]: events["Photon"] = ak.zip({"pt": events.Photon.pt, "eta": events.Photon.eta, "phi": events.Photon.phi, "mass": ak.zeros_like(events.Photon.pt), "charge": ak.zeros_like(events.Photon.pt)}, with_name="PtEtaPhiMCandidate")

In [34]: events.Photon + events.Photon
Out[34]: dask.awkward<add, npartitions=1>

In [35]: (events.Photon + events.Photon).mass.compute()
Out[35]: <Array [[0.0442, 0], ..., [-0.0442, -0.0625]] type='1000 * var * float32'>

And of course, I should add all the other photon fields in the ak.zip dictionary.

Update 1: there is an issue with that fix. Photons are losing attributes like

FAILED tests/test_tag_and_probe_nanoaod.py::test_tag_and_probe_photons_trigger[True] - AttributeError: no field named 'matched_electron'
FAILED tests/test_tag_and_probe_nanoaod.py::test_tag_and_probe_photons_id[True] - AttributeError: no field named 'matched_electron'

Update 2: Oh I can do with_name="Photon" looks like.

Saransh-cpp commented 4 days ago

Copying comment from the PR linked above -


Hi! It should be behavior=nanoaod.behavior. candidate.behavior is included within nanoaod.behavior. coffea does not update the global awkward behavior (which is a good thing), which is why you need that extra behavior= arg.

Alternatively, to skip the behavior keyword, you could do something like -

>>> ak.behavior.update(nanoaod.behavior)
>>> a = ak.zip({"x": [1], "y": [2], "z": [3], "m": [4]}, with_name="Photon")
>>> a
<PhotonArray [{x: 1, y: 2, z: 3, m: 4}] type='1 * Photon[x: int64, y: int64...'>

Makes me wonder if coffea should have something like vector.register_awkward to make it easy for the users.

lgray commented 4 days ago

No, it was a conscious choice to make the behaviors contained within coffea objects, it's less confusing when you start using distributed environments to make sure everything is synced.

As for a fix that doesn't break everything else, we can hack NanoEvents to just make all-zero fields for specified columns without adding extra nodes. We already do a variant of this for trigger names where we produce chunks of Nones when the trigger table changes in a single dataset. Just have to do the same thing with zeros for Photon or other objects.

Can add in a _default_columns private variable or something like that. If the column doesn't exist on disk, generate it on the fly with a given value.