generator-to-pandas via HepMC-ROOTio

matplo commented 5 years ago

Something to think about: simply go through the most recent HepMC 3.1.x and its ROOT I/O (ROOTIO), then to pandas via uproot. This makes a lot of sense for a generate-write-then-read scenario, though not necessarily "on the fly"; that said, there may be an uncomplicated way to set this up "in memory" on a per-event basis.
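A minimal sketch of the "in memory, per event" variant. It assumes a toy event format (an event id plus a list of (px, py, pz, pdg) tuples) instead of real HepMC or PYTHIA objects, and `toy_generator` and `events_to_columns` are hypothetical names. The only point is that columns can be filled event by event without writing an intermediate file:

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy event: (ev_id, [(px, py, pz, pdg), ...]); not a HepMC object.
Event = Tuple[int, List[Tuple[float, float, float, int]]]


def events_to_columns(events: Iterable[Event]) -> Dict[str, list]:
    """Accumulate per-particle rows event by event, entirely in memory.

    The result is a dict of equal-length lists, i.e. the column layout
    that pandas.DataFrame accepts directly.
    """
    columns: Dict[str, list] = {"ev_id": [], "ParticlePt": [], "pdg": []}
    for ev_id, particles in events:
        for px, py, _pz, pdg in particles:
            columns["ev_id"].append(ev_id)
            columns["ParticlePt"].append(math.hypot(px, py))  # transverse momentum
            columns["pdg"].append(pdg)
    return columns


def toy_generator(n_events: int):
    # Stand-in for a generator loop (e.g. PYTHIA) that yields one event at a time.
    for i in range(n_events):
        yield i, [(3.0, 4.0, 1.0, 211), (0.0, 1.0, 2.0, -211)]


cols = events_to_columns(toy_generator(2))
print(cols["ParticlePt"])  # [5.0, 1.0, 5.0, 1.0]
```

With pandas installed, `pandas.DataFrame(cols)` would turn this into a one-row-per-particle table.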

jdmulligan commented 4 years ago

@matplo @ezradlesser

Ideally we would implement generator-to-ROOT in the same TTree format as we use when analyzing the data. That would allow the fastsim to work seamlessly.

The desired format can be seen e.g. on hiccup at /rstorage/alice/data/LHC18b8/146/child_1/TrainOutput/282008/10/AnalysisResultsPtHard10.root

We currently use two trees in this file:

tree_Particle_gen -- one entry per track for run_number, ev_id, ParticlePt, ParticleEta, ParticlePhi
tree_event_char -- one entry per event for run_number, ev_id, z_vtx_reco, is_ev_rej

Here, the expectation is that the combination of run_number and ev_id provides a unique event id.
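A rough sketch of how generator output could be flattened into rows mirroring these two trees. The input format, the choice of `is_ev_rej = 0` for every generated event, and passing a generator-level z vertex as `z_vtx_reco` are all assumptions for illustration; the phi convention of the existing trees is not stated here, so the (-pi, pi] range from `atan2` may need shifting:

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (ev_id, z_vtx, [(px, py, pz), ...]) per event.
ToyEvent = Tuple[int, float, List[Tuple[float, float, float]]]


def pt_eta_phi(px: float, py: float, pz: float) -> Tuple[float, float, float]:
    """Convert a momentum 3-vector to (pT, pseudorapidity, azimuth)."""
    pt = math.hypot(px, py)
    eta = math.asinh(pz / pt)  # eta = asinh(pz / pT); undefined for pT == 0
    phi = math.atan2(py, px)   # range (-pi, pi]; shift if [0, 2pi) is expected
    return pt, eta, phi


def build_trees(run_number: int, events: Iterable[ToyEvent]):
    """Fill row lists mirroring tree_Particle_gen and tree_event_char."""
    particle_rows: List[Dict] = []
    event_rows: List[Dict] = []
    seen = set()
    for ev_id, z_vtx, particles in events:
        key = (run_number, ev_id)
        if key in seen:  # (run_number, ev_id) must identify an event uniquely
            raise ValueError(f"duplicate event key {key}")
        seen.add(key)
        # Assumption: generator-level events are never rejected.
        event_rows.append({"run_number": run_number, "ev_id": ev_id,
                           "z_vtx_reco": z_vtx, "is_ev_rej": 0})
        for px, py, pz in particles:
            pt, eta, phi = pt_eta_phi(px, py, pz)
            particle_rows.append({"run_number": run_number, "ev_id": ev_id,
                                  "ParticlePt": pt, "ParticleEta": eta,
                                  "ParticlePhi": phi})
    return particle_rows, event_rows
```

The shared (run_number, ev_id) columns are what would let the two trees be joined per event later.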

ezradlesser commented 4 years ago

I agree with James. It seems to me that it would make sense to integrate a conversion from HepMC to the existing TTree format as its own "process" class, and perhaps also to use the existing code for submitting jobs via sbatch.

Perhaps one could also integrate the generation of new events with PYTHIA and tie it directly into the fastsim, creating TTrees in one step.

jdmulligan commented 4 years ago

Yes, agreed -- although I would suggest we keep this in a directory separate from "process"; for example, we can call it "generation".

Then we would have three distinct steps:

generation: produce ROOT trees
process: ROOT trees --> histograms
analysis: histograms --> results
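A toy sketch of the three-step split, with hypothetical stand-ins for each stage (plain Python rows instead of ROOT trees, a `Counter` instead of a histogram). What it illustrates is that each step consumes only the previous step's output, so "process" does not need to know whether its trees came from data or from "generation":

```python
from collections import Counter
from typing import Dict, List


def generation(n_events: int) -> List[Dict]:
    # Stand-in for "produce ROOT trees": flat per-particle rows.
    return [{"ev_id": i, "ParticlePt": pt}
            for i in range(n_events) for pt in (0.5, 1.5, 2.5)]


def process(rows: List[Dict], bin_width: float = 1.0) -> Counter:
    # "ROOT trees --> histograms": fixed-width pT bins (bin index -> count).
    return Counter(int(r["ParticlePt"] // bin_width) for r in rows)


def analysis(hist: Counter, n_events: int) -> Dict[int, float]:
    # "histograms --> results": per-event yield in each bin.
    return {b: c / n_events for b, c in sorted(hist.items())}


n = 4
result = analysis(process(generation(n)), n)
print(result)  # {0: 1.0, 1: 1.0, 2: 1.0}
```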

matplo commented 4 years ago

A converter is implemented at: https://github.com/matplo/pyjetty/blob/master/pyjetty/sandbox/hepmc2antuple.py
Use case: https://github.com/matplo/pyjetty/blob/master/pyjetty/sandbox/test_convert_hepmc.sh

Note 1: the --as-data flag to the converter saves the particles into tree_Particle instead of tree_Particle_gen (the name expected for MC).
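A sketch of how such a flag can select the output tree name with `argparse`. Only the flag name and the two tree names come from this thread; this is not the actual argument parser of hepmc2antuple.py, and `tree_name` is a hypothetical helper:

```python
import argparse


def tree_name(argv=None) -> str:
    """Return the particle tree name implied by the command line."""
    parser = argparse.ArgumentParser(description="HepMC -> ROOT tree converter (sketch)")
    parser.add_argument("--as-data", action="store_true",
                        help="write particles to tree_Particle instead of tree_Particle_gen")
    args = parser.parse_args(argv)
    # argparse exposes --as-data as the attribute as_data
    return "tree_Particle" if args.as_data else "tree_Particle_gen"


print(tree_name(["--as-data"]))  # tree_Particle
print(tree_name([]))             # tree_Particle_gen
```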

Note 2: I added a PDG id branch to the particle tree.

This was tested on HepMC2 files generated with the heppy setup (as in the example). It should also work for HepMC3 files; however, in the past we have had to use different HepMC readers (see the hepmc_*_jetreco.py test analyses in https://github.com/matplo/heppy/tree/master/heppy/examples). Still, we are closer to what we need.
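For illustration, a minimal stdlib reader for the HepMC2 ASCII (IO_GenEvent) layout that keeps final-state particles together with their PDG id. This is a hedged sketch, not the reader used by the converter: it only looks at `E` lines (event number) and `P` lines (barcode, PDG id, px, py, pz, E, m, status, ...), treats status 1 as final state, and uses an abbreviated toy input. The HepMC3 ASCII format lays out its records differently, so this reader would not handle HepMC3 files as written, which fits the need for separate readers mentioned above:

```python
import io
from typing import Iterator, List, Tuple

Particle = Tuple[int, float, float, float]  # (pdg, px, py, pz)


def read_hepmc2_final_state(stream) -> Iterator[Tuple[int, List[Particle]]]:
    """Yield (event_number, [(pdg, px, py, pz), ...]) for status-1 particles.

    Assumed HepMC2 IO_GenEvent line layouts:
      E <event_number> ...
      P <barcode> <pdg_id> <px> <py> <pz> <e> <m> <status> ...
    All other lines (header, V, U, C, ...) are skipped.
    """
    event_number = None
    particles: List[Particle] = []
    for line in stream:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "E":
            if event_number is not None:
                yield event_number, particles
            event_number, particles = int(fields[1]), []
        elif fields[0] == "P" and event_number is not None:
            pdg = int(fields[2])
            px, py, pz = (float(x) for x in fields[3:6])
            if int(fields[8]) == 1:  # keep final-state particles only
                particles.append((pdg, px, py, pz))
    if event_number is not None:
        yield event_number, particles


# Abbreviated toy input (not real generator output).
sample = """HepMC::IO_GenEvent-START_EVENT_LISTING
E 0 -1 -1 -1 -1 0 0 1 0 0 0 0
V -1 0 0 0 0 0 0 2 0
P 1 2212 0 0 6500 6500 0.938 4 0 0 -1 0
P 2 211 3 4 1 5.2 0.1396 1 0 0 0 0
E 1 -1 -1 -1 -1 0 0 1 0 0 0 0
P 3 22 0 1 0 1 0 1 0 0 0 0
HepMC::IO_GenEvent-END_EVENT_LISTING
"""
events = list(read_hepmc2_final_state(io.StringIO(sample)))
print(events)  # [(0, [(211, 3.0, 4.0, 1.0)]), (1, [(22, 0.0, 1.0, 0.0)])]
```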

Additional note: on the same computer, the conversion is about 2x slower than PYTHIA generation of the HepMC file; on the other hand, analysis of the ROOT file is fast (in the test, the analysis consisted of constituent subtraction, jet finding, and soft drop).

matplo commented 4 years ago

I realized something trivial: we can use TNtuple from ROOT directly rather than the tree writer (the tree writer is mainly useful for structures more complex than basic types). https://github.com/matplo/pyjetty/blob/master/pyjetty/sandbox/hepmc2antuple_tn.py The speed-up is "only" about 14%, but the file size is halved (2x smaller), which is somewhat surprising (compression? floats instead of doubles? other reasons within the ROOT streamers / IO?).
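One candidate explanation can be checked with simple arithmetic: TNtuple stores every column as a 4-byte float, so if the tree-writer version wrote 8-byte doubles (an assumption, not confirmed here), the uncompressed payload alone differs by exactly 2x. The sketch below only compares raw sizes with Python's `array` module; the on-disk ratio after ROOT compression can differ:

```python
from array import array

values = [0.1 * i for i in range(1000)]  # e.g. 1000 pT values

as_double = array("d", values)  # 8-byte IEEE doubles
as_float = array("f", values)   # 4-byte IEEE floats, as in a TNtuple column

raw_double = len(as_double) * as_double.itemsize
raw_float = len(as_float) * as_float.itemsize
print(raw_double, raw_float, raw_double / raw_float)  # 8000 4000 2.0
```

Storing as float also rounds values to about 7 significant digits, which is usually harmless for kinematic variables but worth keeping in mind.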

jdmulligan commented 4 years ago

Moved a copy of the converter to alice_analysis/generation: #11

And added generator-specific options: #13

To do:

The HepMC parser only seems to work for pythia/herwig/jetscape, but not for jewel/martini/hybrid (at least for the HepMC files I have).
Treating recoil needs some thought: the acceptance functions for each generator are subject to change for jewel/martini/hybrid, but should be trustworthy for pythia/herwig/jetscape.
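One way to keep these per-generator differences explicit is a small lookup table, sketched below. The two flags simply encode the status described in the to-do list above; `GeneratorConfig` and `generator_config` are hypothetical names and not the options added in #13:

```python
from typing import NamedTuple


class GeneratorConfig(NamedTuple):
    generic_reader: bool     # does the generic HepMC parser handle these files?
    acceptance_stable: bool  # is the acceptance/recoil treatment considered settled?


GENERATORS = {
    "pythia": GeneratorConfig(True, True),
    "herwig": GeneratorConfig(True, True),
    "jetscape": GeneratorConfig(True, True),
    "jewel": GeneratorConfig(False, False),
    "martini": GeneratorConfig(False, False),
    "hybrid": GeneratorConfig(False, False),
}


def generator_config(name: str) -> GeneratorConfig:
    """Look up a generator case-insensitively; fail loudly if unknown."""
    try:
        return GENERATORS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown generator: {name}") from None


print(generator_config("JEWEL"))  # GeneratorConfig(generic_reader=False, acceptance_stable=False)
```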

jdmulligan commented 4 years ago

Implemented a separate version of the converter for JEWEL in #14

ezradlesser commented 3 years ago

https://github.com/matplo/pyjetty/pull/48 adds a formal mechanism for obtaining partons from the HepMC files, though it is not yet implemented for most generators.
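As a hedged illustration of the basic ingredient (not the mechanism in that PR), partons can be identified by PDG id: quarks are 1 to 6 (with negative ids for antiquarks) and the gluon is 21. Deciding *which* partons to keep usually also depends on status codes, which beyond 1 and 2 are generator-specific in HepMC; that is one reason such a mechanism has to be implemented per generator. The helper names below are hypothetical:

```python
from typing import List, Tuple

QUARK_IDS = set(range(1, 7))  # d, u, s, c, b, t (antiquarks have negative ids)
GLUON_ID = 21

Particle = Tuple[int, float, float, float]  # (pdg, px, py, pz)


def is_parton(pdg: int) -> bool:
    """True for quarks, antiquarks, and gluons by PDG id."""
    return abs(pdg) in QUARK_IDS or pdg == GLUON_ID


def select_partons(particles: List[Particle]) -> List[Particle]:
    # Status-code selection is intentionally left out: it is generator-specific.
    return [p for p in particles if is_parton(p[0])]


sample = [(21, 1.0, 0.0, 0.0), (211, 0.0, 1.0, 0.0), (-5, 2.0, 0.0, 1.0)]
print(select_partons(sample))  # [(21, 1.0, 0.0, 0.0), (-5, 2.0, 0.0, 1.0)]
```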

matplo / pyjetty, issue #1