Andy Buckley's comments on EDM4hep MC structures

gganis commented 1 year ago

Received these comments from Andy Buckley as a follow-up to his presentation on HepMCv3 at the 2nd ECFA workshop on Generators.

The colour flow is the obvious defunct quantity, which has no obvious meaning at detector-level anyway -- I am assuming that EDM4HEP is not attempting to replicate the generator-level event graph, as it doesn't seem to have all the required extra for that. Are you aware of anything actually trying to use colour flows at the EDM level? It would only make sense for afterburner generators, and even then relies on specific internal meanings assigned by the original generator: I know of no generator codes, and certainly no post-generation ones, that attempt to implement afterburner corrections for non-colour-singlet states.

The other points that I had in mind were more comments or initial reactions from the MC perspective on:

- spin: similar to the color flow, what is the use case for this? I believe passing spins for afterburner decays properly requires a spinor basis to be propagated as well as the vector: this has caused many problems, as existing standards don't propagate enough information and guesses need to be made. Is there detector physics that can use spin information, even without this extra basis info?
- float charge: I'm accustomed to the integer 3-charge representation in HepPID and MCUtils. Certainly it makes sense at MC-gen level, to handle quark states. But more generally, why store the charge when there is a standard scheme and decoding software to extract the charge from the PDG ID code? (True fractional-charge exotic particles are very niche.)
- double mass vs float-valued 3-momentum vs double-valued positions: is there a rationale for this? I note that float in the Cartesian representation turned out not to be precise enough for LHC forward physics; I'm not sure whether it would be sufficient for the same at e.g. a future ep machine. Also worth thinking about whether an eta-phi fundamental representation could make more sense... was this already done?
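
To illustrate the point about decoding the charge from the PDG ID, here is a minimal sketch of a 3-charge decoder. It is not the HepPID or MCUtils implementation: the function name `three_charge` is hypothetical, and it only covers quarks, leptons, a few gauge bosons, and ordinary mesons and baryons with codes below 10000. Everything else raises an error.

```python
# Three times the electric charge of the quarks d, u, s, c, b, t (PDG IDs 1..6).
QUARK_3CHARGE = {1: -1, 2: 2, 3: -1, 4: 2, 5: -1, 6: 2}


def three_charge(pdg_id: int) -> int:
    """Return 3 x electric charge for a restricted set of PDG IDs (illustrative only)."""
    sign = -1 if pdg_id < 0 else 1
    aid = abs(pdg_id)

    if aid in QUARK_3CHARGE:               # quarks
        return sign * QUARK_3CHARGE[aid]
    if aid in (11, 13, 15):                # charged leptons (e-, mu-, tau-)
        return sign * -3
    if aid in (12, 14, 16, 21, 22, 23, 25):  # neutrinos, g, gamma, Z, H
        return 0
    if aid == 24:                          # W+
        return sign * 3

    if not 100 < aid < 10000:
        raise ValueError(f"PDG ID {pdg_id} is outside the scope of this sketch")

    # Quark-content digits of the PDG numbering scheme.
    q3 = (aid // 10) % 10
    q2 = (aid // 100) % 10
    q1 = (aid // 1000) % 10
    if q2 not in QUARK_3CHARGE or q3 not in QUARK_3CHARGE:
        raise ValueError(f"PDG ID {pdg_id} is not an ordinary hadron (e.g. a diquark)")

    if q1 == 0:
        # Meson: quark/antiquark pair. The sign convention flips when the
        # heavier quark (q2) is down-type, e.g. K+ (321) is u s-bar.
        charge = QUARK_3CHARGE[q2] - QUARK_3CHARGE[q3]
        if q2 % 2 == 1:
            charge = -charge
    else:
        if q1 not in QUARK_3CHARGE:
            raise ValueError(f"PDG ID {pdg_id} is not an ordinary hadron")
        # Baryon: sum of the three quark charges.
        charge = QUARK_3CHARGE[q1] + QUARK_3CHARGE[q2] + QUARK_3CHARGE[q3]

    return sign * charge
```

For example, `three_charge(211)` gives 3 for the pi+ and `three_charge(-2212)` gives -3 for the antiproton. Exotic or fractionally charged states would need the full decoding logic of a dedicated library.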

Zehvogel commented 1 year ago

From a quick search, it looks like we need the colour flow if we want to remain able to run the MarlinReco/TrueJet processor.

Maybe @MikaelBerggren can comment.

gganis commented 1 year ago

Ok, it would not apply to data but it would be used in the interpretation of them. We need to understand how these needs are handled in other cases, e.g. HepMCv3.

gaede commented 1 year ago

Concerning the use of float for the momentum: this really should be double (for all kinematic quantities). Probably lost in translation from LCIO (https://github.com/iLCSoft/LCIO/blob/master/src/cpp/include/pre-generated/EVENT/MCParticle.h#L145-L173) somewhere. Concerning spin and color flow: these are written by Whizard - so they must have some idea on how this is to be interpreted ...

tmadlener commented 1 year ago

I suppose (part of) the rationale for having float momenta in the MCParticle was that LCIO only has double in memory and uses float for I/O. This also seems to be the case for other types.

gaede commented 1 year ago

Ah, yes. Had forgotten about this peculiarity. I believe the argument was something like: we don't necessarily need double precision in the file, but we do need it in memory for (potentially repeated) computations of 4-vectors and masses when combining particles.
We should clarify, but my feeling is that having p as double is more consistent here...
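
As a rough illustration of what is lost when a momentum component is written as a float and read back as a double, the sketch below round-trips a value through IEEE-754 single precision using only the standard library. The momentum value and the helper name `as_float32` are made up for the example; nothing here reflects the actual LCIO or EDM4hep I/O code.

```python
import struct


def as_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical forward particle: large longitudinal momentum in GeV.
pz = 6500.0001
pz_stored = as_float32(pz)

# Adjacent float32 values in [4096, 8192) are 2**-11 GeV apart (about 0.5 MeV),
# so the 0.1 MeV offset above is below the resolution of the stored value.
spacing = 2.0 ** -11

print(f"stored value   : {pz_stored!r}")
print(f"absolute error : {abs(pz_stored - pz):.3e} GeV")
print(f"float32 spacing: {spacing:.3e} GeV")
```

The stored value comes back as exactly 6500.0, so any later computation in double precision starts from an input that has already been rounded.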

MikaelBerggren commented 1 year ago

On the colour information: Yes, it is used in TrueJet, to figure out the di-jet grouping in 4-or-more quark events. Often this is straightforward (two quarks with a common boson mother, only one grouping possible that doesn't imply FCNC, ...), but e.g. for a uudd event without explicit Z or W parents, the colour connection is used. I think that this is only done exactly at the junction between the M.E. generator and the P.S. code part of the event record, i.e. exactly at the point where the Les Houches event record transfers such information between the components of the event generation. The colour flow within the P.S. is not needed, I'm pretty sure.
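
The colour-connected pairing described above can be sketched as follows. This is not the TrueJet code: the `Parton` type, the function name, and the tag values are hypothetical, and the sketch assumes a final state of quarks and antiquarks only (no gluons), with Les Houches style tags where 0 means "no colour line".

```python
from typing import List, NamedTuple, Tuple


class Parton(NamedTuple):
    name: str
    colour: int      # colour tag, 0 if the parton carries no colour line
    anticolour: int  # anticolour tag, 0 if the parton carries no anticolour line


def colour_singlet_pairs(partons: List[Parton]) -> List[Tuple[str, str]]:
    """Pair each quark with the antiquark whose anticolour tag matches its colour tag."""
    by_anticolour = {p.anticolour: p for p in partons if p.anticolour != 0}
    pairs = []
    for p in partons:
        if p.colour != 0 and p.colour in by_anticolour:
            pairs.append((p.name, by_anticolour[p.colour].name))
    return pairs


# A uudd-type event with no explicit boson parents: flavour alone allows
# (u, ubar)(d, dbar) or (u, dbar)(d, ubar); the colour tags pick one.
event = [
    Parton("u", 501, 0),
    Parton("ubar", 0, 502),
    Parton("d", 502, 0),
    Parton("dbar", 0, 501),
]
pairs = colour_singlet_pairs(event)
print(pairs)
```

With these tags the u is grouped with the dbar and the d with the ubar, which is the kind of assignment that cannot be made from flavour alone.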

dirkzerwas commented 5 months ago

Hi,

we looked at the mapping between HEPMC3 and EDM4HEP. Attached as EDM Comparison.pdf is a rough comparison of the two structures with some comments and thoughts.

One point is that we like the idea of EDM4HEP providing "const" variables for string inputs, because it allows typos in retrieval and writing to be detected at the compilation stage.

Our "main" suggestion is to define the structures (see the pdf) and impose them in order to ensure compatibility for the users.

The HepMC authors seem open to the possibility of providing writing to EDM4HEP via their plugin mechanism. This could be an interesting way of giving all Monte Carlo generators a direct way of writing to EDM4HEP.

In the issue we saw some comments:

Color-Flow

HepMC can contain color-flow information as particle attributes. However, its usage in Monte-Carlo tools is not very consistent between Hepmc3 & Hepmc2 (VectorInt in v3 and two separate ints in v2). Color-flow has "no obvious meaning at detector-level anyway", but implementing it in EDM4hep may have advantages. For example, the color-flow information would be needed for hadronization. I could imagine a scenario where Sherpa is used for the hard process and parton showering but the hadronization is handled by Pythia. Here it would be sensible for Sherpa's HepMC file to contain the color-flow. -> In the k4GeneratorsConfig convertHepMC2EDM4HEP we implemented the filling of the colorflow in EDM4HEP by first trying the HepMC3 structure and falling back to HepMC2 if that is not available.
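
The try-then-fall-back logic can be sketched in a few lines. This is not the convertHepMC2EDM4HEP code and does not use the HepMC3 API: particle attributes are modelled as a plain dictionary, and the attribute names `flows`, `flow1` and `flow2` are assumptions made for the illustration.

```python
from typing import Mapping, Tuple


def read_colour_flow(attributes: Mapping[str, object]) -> Tuple[int, int]:
    """Return (colour, anticolour) from a particle's attribute mapping.

    Tries a v3-style vector attribute first, then a v2-style pair of ints.
    Missing information is returned as zeros.
    """
    try:
        flows = attributes["flows"]  # assumed name of the vector attribute
        return (int(flows[0]), int(flows[1]))
    except (KeyError, IndexError, TypeError):
        # Assumed names of the two separate integer attributes.
        return (int(attributes.get("flow1", 0)), int(attributes.get("flow2", 0)))


print(read_colour_flow({"flows": [501, 502]}))         # v3-style input
print(read_colour_flow({"flow1": 503, "flow2": 504}))  # v2-style input
print(read_colour_flow({}))                            # no colour information
```

Returning zeros when nothing is available keeps the target field filled with a default value.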

Polarisation Information

HepMC does provide some attributes for theta and phi as particle attributes, but I suspect this is for a polarization vector of final states. Looking at a polarized beams file (LCIO) thanks to @gaede, we saw that only the final state taus had their spin information filled. For polarized beams I would suggest that HEPMC have a dedicated pol_info with at least the % of polarization in each beam. It would be great if a similar approach were used by EDM4HEP and HEPMC.

Beamstrahlung

It would be useful to include the model name (hopefully a naming scheme can be decided). It would be given to hepmc as run-info in ToolInfo (either extended to include the model in addition to name and version, or using the description field for this purpose) and passed on to EDM4HEP. A common approach between HEPMC and EDM4HEP would be beneficial (e.g. EDM4HEP providing an equivalent version of the ToolInfo object).

Alan and Dirk

tmadlener commented 5 months ago

At yesterday's EDM4hep meeting (see the attached notes there for more details) we decided to:

- Introduce a (or a set of) dedicated datatype(s) for containing as much of the generator meta data as possible -> #309
- Keep the spin and color flow data in the MCParticle. This will likely be empty / initialized with some default values for most cases, but the compression of the I/O backend should be able to deal with that quite well.
- Keep the information about position / vertices and kinematics together in the MCParticle to have one entry point into generator studies. This will again lead to some potential duplication of information, but we can avoid some cross referencing that would otherwise be necessary.
- (We have already switched the vector quantities for positions and kinematics to use double precision in #237)
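
As a rough check of the claim that default-initialized spin and color flow fields compress well, the sketch below packs zero-filled records and compresses them with zlib. zlib is only a stand-in for whatever compression the I/O backend applies, and the record layout (three 32-bit floats for the spin, two 32-bit ints for the color flow) is an assumption made for the example.

```python
import struct
import zlib

n_particles = 10_000

# One assumed record: spin (3 x float32) and color flow (2 x int32), all zero.
record = struct.pack("<3f2i", 0.0, 0.0, 0.0, 0, 0)
raw = record * n_particles
compressed = zlib.compress(raw)

print(f"raw size       : {len(raw)} bytes")
print(f"compressed size: {len(compressed)} bytes")
```

The compressed buffer is a small fraction of the raw one, which is consistent with the expectation that unused fields cost little on disk.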

andresailer commented 5 months ago

Just a thought about the MCParticle Vertex/Endpoint information, which was mentioned in the EDM4hep meeting.

After the Geant4 simulation, the vertex and endpoint information of particles is updated to align with what happens during the simulation, for example deflection due to the magnetic field. It also happens that we don't update endpoints because we don't simulate certain particles for various reasons (see e.g. some tickets in DD4hep about this). Thus fiddling the MC tree back into a Particle/Vertex separation would probably be some effort, and maybe more confusing than what we see in the MC record now.

tmadlener commented 5 months ago

Closing this, since we are happy with our current implementation of the MCParticle.

key4hep / EDM4hep

Andy Buckley's comments on EDM4hep MC structures #208