iLCSoft / MarlinReco

Make the TrueJet processor use the PIDHandler to set the ParticleIDs #132

Closed tmadlener closed 4 months ago

tmadlener commented 6 months ago

BEGINRELEASENOTES

ENDRELEASENOTES

Fixes key4hep/k4EDM4hep2LcioConv#62

@Zehvogel can you check whether this fixes things on your end, at least from a technical level? The algorithm names might still need some tuning, but I would also be interested to see which ones are actually hit / converted.

Zehvogel commented 6 months ago

Yes, I will test it. It would probably also be good if someone who uses the MarlinUtil TrueJet parser could test whether that still works with this change. Maybe someone at DESY? :)

tmadlener commented 6 months ago

Thanks. In principle nothing should have changed other than that we are now writing some additional metadata as well. I will check if that is actually true.

Zehvogel commented 6 months ago

This fixes running TrueJet for me. I still need to adapt my RDataFrame code to work with the new direction, though. However, I compared the podio-dump output after this change with the output from before the reversal of the PID collection, and it looks like everything is correct. Depending on what kind of events one uses, a lot of the PIDs will stay empty.

As far as I can tell, `*_PID_TrueJet_*_jet_n` is filled for the n-th jet belonging to the initial/final colour neutral. I have no idea how this will ever reach values much greater than two... maybe @MikaelBerggren can comment if we really need 25? Also the 0'th one is never filled? The files become really crowded at the moment with all of them in there...

I attached the above-mentioned podio-dump outputs if anyone wants to take a look :)

truejet_oldpid.txt truejet_newpid.txt

tmadlener commented 6 months ago

> Also the 0'th one is never filled? The files become really crowded at the moment with all of them in there...

I am not sure if this can even be checked easily, but were these filled before?

Zehvogel commented 6 months ago

> > Also the 0'th one is never filled? The files become really crowded at the moment with all of them in there...
>
> I am not sure if this can even be checked easily, but were these filled before?

I think it is expected that the 0'th one is not filled if you start counting the jets per ICN at 1. There is no ICN/FCN without a truejet belonging to it. So that one seems to be simply left over.

For the values > 2 I am not so sure. I think 3 and 4 are used relatively often (as soon as you have gluon-splitting etc.).

Leaving my area of expertise even further, I would guess that a final colour neutral would by definition always result in two (true)jets, while an initial colour neutral can have $2n$ (true)jets, where $n$ is the number of final colour neutrals it hadronises into.

So maybe we can at least reduce the amount of fafpf PIDs to two (three including the main one)?

tmadlener commented 5 months ago

After merging #134, we have almost halved the number of ParticleID objects that will be visible once this has been converted to EDM4hep. For the initial state side there is no generally applicable way to do this. The number 25 has been chosen to work at ILD up to 1 TeV. At lower energies this might be smaller. I think the easiest way of dealing with this might be to simply drop the "superfluous" collections via the KeepDropSwitch, with a configuration (untested) along the lines of

    # ...
    "drop *_PID_TrueJet_fafpi_jet1?",
    "drop *_PID_TrueJet_fafpi_jet2?",
    # and then maybe some single digits as well if you still want to get rid of more
    # or alternatively drop all of them and keep only the ones you want, might be the same number of lines

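
To get a feeling for which collections such patterns would hit, here is a small standalone sketch. It assumes that the KeepDropSwitch matches names with shell-style globs (`*` for any run of characters, `?` for exactly one character), which Python's `fnmatch` approximates. The `Coll_` prefix and the range 0 to 25 are hypothetical and only serve as illustration; they are not taken from an actual output file.

```python
from fnmatch import fnmatchcase

# Hypothetical collection names following the naming used in the drop
# commands above; the "Coll_" prefix and the 0..25 range are assumptions.
names = [f"Coll_PID_TrueJet_fafpi_jet{n}" for n in range(26)]

# The two glob patterns from the (untested) configuration above.
drop_patterns = [
    "*_PID_TrueJet_fafpi_jet1?",
    "*_PID_TrueJet_fafpi_jet2?",
]


def surviving(collection_names, patterns):
    """Return the names that match none of the given drop patterns."""
    return [
        name
        for name in collection_names
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


kept = surviving(names, drop_patterns)
for name in kept:
    print(name)
```

Under these assumptions the two patterns only remove the two-digit jets (10 and above), because `?` requires exactly one trailing character. The single-digit ones, including `jet1` and `jet2`, survive and would need patterns of their own.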
Zehvogel commented 5 months ago

While you are in there anyway, do you mind setting https://github.com/iLCSoft/MarlinReco/blob/8c3f3b614395d31dbaf85b09491d1307226e5d26/Analysis/TrueJet/src/TrueJet.cc#L157 to something like MESSAGE?