Open matplo opened 5 years ago
@matplo @ezradlesser
Ideally we would implement generator-to-ROOT in the same TTree format as we use when analyzing the data. That would allow the fastsim to work seamlessly.
The desired format can be seen e.g. on hiccup at /rstorage/alice/data/LHC18b8/146/child_1/TrainOutput/282008/10/AnalysisResultsPtHard10.root
We currently use two trees in this file:
tree_Particle_gen -- one entry per track, with branches run_number, ev_id, ParticlePt, ParticleEta, ParticlePhi
tree_event_char -- one entry per event, with branches run_number, ev_id, z_vtx_reco, is_ev_rej
Here, the expectation is that the combination of run_number and ev_id provides a unique event id.
I agree with James, and it seems to me that it would make sense to integrate a conversion from HepMC to the existing TTree format as its own "process" class, and perhaps also to reuse the existing code for submitting jobs via sbatch.
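The two-tree layout and the uniqueness expectation can be illustrated with a minimal, ROOT-free sketch in plain Python. The branch names are the ones listed in this thread; all numeric values and the helper names (event_key, check_unique_events, tracks_by_event) are invented for illustration and are not part of the analysis code.

```python
from collections import defaultdict

# Toy stand-in for tree_event_char: one row per event (values are made up).
event_rows = [
    {"run_number": 1, "ev_id": 0, "z_vtx_reco": 1.2, "is_ev_rej": 0},
    {"run_number": 1, "ev_id": 1, "z_vtx_reco": -3.4, "is_ev_rej": 0},
    {"run_number": 2, "ev_id": 0, "z_vtx_reco": 0.5, "is_ev_rej": 1},
]

# Toy stand-in for tree_Particle_gen: one row per track (values are made up).
particle_rows = [
    {"run_number": 1, "ev_id": 0, "ParticlePt": 0.8, "ParticleEta": 0.1, "ParticlePhi": 1.0},
    {"run_number": 1, "ev_id": 0, "ParticlePt": 2.3, "ParticleEta": -0.4, "ParticlePhi": 4.2},
    {"run_number": 1, "ev_id": 1, "ParticlePt": 1.1, "ParticleEta": 0.7, "ParticlePhi": 2.9},
    {"run_number": 2, "ev_id": 0, "ParticlePt": 5.6, "ParticleEta": 0.0, "ParticlePhi": 0.3},
]


def event_key(row):
    """Hypothetical helper: the (run_number, ev_id) pair used as the event id."""
    return (row["run_number"], row["ev_id"])


def check_unique_events(rows):
    """Return True if no (run_number, ev_id) pair appears twice in the event table."""
    keys = [event_key(r) for r in rows]
    return len(keys) == len(set(keys))


def tracks_by_event(rows):
    """Group per-track rows by their (run_number, ev_id) key."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[event_key(r)].append(r)
    return dict(grouped)


events_are_unique = check_unique_events(event_rows)
grouped_tracks = tracks_by_event(particle_rows)
```

Note that ev_id alone repeats across runs in this toy data (ev_id 0 appears in both run 1 and run 2), so only the pair works as a key.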
One could also integrate the generation of new events with PYTHIA and tie it directly into the fastsim, creating the TTrees in one step.
Yes, agreed -- although I would suggest we keep this in a directory separate from "process"; for example, we could call it "generation".
Then we would have three distinct steps:
A converter is implemented at: https://github.com/matplo/pyjetty/blob/master/pyjetty/sandbox/hepmc2antuple.py
Use case: https://github.com/matplo/pyjetty/blob/master/pyjetty/sandbox/test_convert_hepmc.sh
Note-1: the --as-data flag of the converter saves the particles into tree_Particle instead of the tree_Particle_gen expected for MC.
Note-2: I added a PDG id branch to the particle tree.
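The effect of the --as-data flag described in Note-1 can be sketched as below. This is not the code from hepmc2antuple.py, only an illustration of the behaviour stated in the note; the helper names and the PDG id branch name "ParticlePID" are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a tiny parser with only the --as-data flag (illustrative, not the real CLI)."""
    parser = argparse.ArgumentParser(description="HepMC to ntuple converter (sketch)")
    parser.add_argument(
        "--as-data",
        action="store_true",
        help="write particles to tree_Particle instead of tree_Particle_gen",
    )
    return parser


def particle_tree_name(as_data):
    """Pick the output tree name: data-like or MC-like."""
    return "tree_Particle" if as_data else "tree_Particle_gen"


def particle_branches():
    """Branches of the particle tree; 'ParticlePID' is a hypothetical name for the PDG id."""
    return ["run_number", "ev_id", "ParticlePt", "ParticleEta", "ParticlePhi", "ParticlePID"]


# Parse explicit argument lists so the sketch does not depend on sys.argv.
mc_args = build_parser().parse_args([])
data_args = build_parser().parse_args(["--as-data"])

mc_tree = particle_tree_name(mc_args.as_data)
data_tree = particle_tree_name(data_args.as_data)
```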
This was tested for HepMC2 files generated with the heppy setup (as per the example). It should also work for HepMC3 files; however, in the past we have had to use different HepMC readers (see the hepmc_*_jetreco.py test analyses in https://github.com/matplo/heppy/tree/master/heppy/examples). Still, we are closer to what we need.
Additional note: on the same computer, the conversion is about 2x slower than the PYTHIA generation of the HepMC file; on the other hand, analysis of the ROOT file is fast (the test analysis was constituent subtraction, jet finding, and soft drop).
I realized something trivial: we can use TNtuple from ROOT directly rather than the tree writer (which is mainly useful for structures more complex than basic types). https://github.com/matplo/pyjetty/blob/master/pyjetty/sandbox/hepmc2antuple_tn.py
The speed-up is 'only' about 14%, but the file size is 50% of the original (2x smaller), which is somewhat surprising (compression? floats instead of doubles? other reasons within the ROOT streamers / I/O?).
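One of the candidate explanations, floats instead of doubles, can be checked on the back of an envelope. TNtuple stores every column as a 32-bit float, so if the tree writer was writing 64-bit doubles (an assumption; this thread does not confirm it), the uncompressed payload alone would halve. The sketch below uses only Python's array module, not ROOT, and the pT values are invented.

```python
from array import array

# Invented pT-like values, purely for illustration.
n_values = 1000
pts = [0.15 + 0.001 * i for i in range(n_values)]

# Store the same values as 32-bit floats and as 64-bit doubles.
as_float = array("f", pts)
as_double = array("d", pts)

# Raw (uncompressed) payload in bytes for each representation.
float_bytes = as_float.itemsize * len(as_float)
double_bytes = as_double.itemsize * len(as_double)
size_ratio = float_bytes / double_bytes

# Precision lost by narrowing to 32-bit floats, relative to the double value.
max_rel_error = max(abs(f - d) / d for f, d in zip(as_float, as_double))
```

The ratio of the raw payloads is 0.5, which would account for the observed factor of two before any compression effects. One caveat: a 32-bit float represents integers exactly only up to 2^24, which matters if ev_id or run_number are stored as columns of the same ntuple.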
Moved a copy of the converter to alice_analysis/generation: #11
And added generator-specific options: #13
To do:
Implemented a separate version of the converter for JEWEL in #14
https://github.com/matplo/pyjetty/pull/48 adds a formal mechanism for obtaining partons from the HepMC files -- though it is not yet implemented for most generators.
Something to think about: simply go via the most recent HepMC3.1.x and its ROOTIO, then to pandas via uproot. This makes a lot of sense for a generate-and-write-then-read scenario, but not necessarily "on the fly"; that said, perhaps there is an uncomplicated way to set this up "in memory" on an event-by-event basis.