iris-hep / analysis-grand-challenge

Repository dedicated to AGC preparations & execution
https://agc.readthedocs.io
MIT License

fix: avoid column overtouching in ML input feature calculation #204

Closed alexander-held closed 1 year ago

alexander-held commented 1 year ago

The previous calculation of ML input features accidentally caused the full lepton information to be materialized, which results in a significant increase in I/O and a corresponding slowdown. This update avoids the overtouching by performing the relevant piece of the calculation with a minimal, rebuilt set of four-vectors from which all other information has been stripped.

This is expected to have a significant performance impact on I/O-bound setups.

See https://github.com/CoffeaTeam/coffea/issues/892 for details.
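
The idea of overtouching described above can be sketched with a toy model. This is not the code from this PR and does not use coffea or awkward: `LazyRecord`, the column names (`charge`, `iso`, `dxy`), and the ΔR feature are all hypothetical stand-ins chosen for illustration. The sketch only shows how consuming a whole lazily loaded record reads every column, while rebuilding a minimal four-vector reads just the kinematic ones.

```python
import math


class LazyRecord:
    """Toy stand-in for a lazily read collection (hypothetical, not a coffea API).

    Columns are only "read" when requested, and every read is recorded.
    """

    def __init__(self, storage):
        self._storage = storage  # column name -> list of values ("on disk")
        self.touched = set()     # names of columns read so far

    def column(self, name):
        self.touched.add(name)
        return self._storage[name]

    def materialize(self):
        # Consuming the whole record reads every column it contains.
        return {name: self.column(name) for name in self._storage}


def delta_r(eta1, phi1, eta2, phi2):
    # Angular distance, with the phi difference wrapped into [-pi, pi).
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def feature_overtouching(leptons, ref_eta, ref_phi):
    # Passing the full record along touches all columns, needed or not.
    full = leptons.materialize()
    return [delta_r(e, p, ref_eta, ref_phi)
            for e, p in zip(full["eta"], full["phi"])]


def feature_minimal(leptons, ref_eta, ref_phi):
    # Rebuild a minimal four-vector: only the kinematic columns are read.
    p4 = {name: leptons.column(name) for name in ("pt", "eta", "phi", "mass")}
    return [delta_r(e, p, ref_eta, ref_phi)
            for e, p in zip(p4["eta"], p4["phi"])]


# Hypothetical lepton columns; the last three are irrelevant to the feature.
storage = {
    "pt": [35.0, 52.1], "eta": [0.4, -1.2], "phi": [1.0, -2.5],
    "mass": [0.106, 0.106], "charge": [1, -1],
    "iso": [0.02, 0.10], "dxy": [0.001, 0.003],
}

full_read = LazyRecord(storage)
minimal_read = LazyRecord(storage)
assert feature_overtouching(full_read, 0.0, 0.0) == feature_minimal(minimal_read, 0.0, 0.0)
print(len(full_read.touched), "columns touched via the full record")
print(len(minimal_read.touched), "columns touched via the minimal four-vector")
```

Both functions return the same feature values; only the set of columns read differs, which is where the I/O saving comes from in this toy picture.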