Pandora sometimes misses obvious-looking track cluster associations

Zehvogel commented 3 months ago

PandoraPFOs (clusters + associated tracks)

PandoraPFOs + all SiTracks

TODO:

[ ] Check if this looks the same for SiTracks_Refitted

json and podio-dump output for the event: pi+.10000.REC.edm4hep.json, pi+.10000.REC.dump.txt

Zehvogel commented 3 months ago

Further links for Pandora calibration/performance analysis:

maybe there exists an already developed tool that can be used to debug this...

Otherwise I would suggest creating an algorithm that takes the tracks and clusters that belong together according to truth information and checks the same conditions as https://github.com/PandoraPFA/LCContent/blob/master/src/LCTrackClusterAssociation/TrackClusterAssociationAlgorithm.cc to see where this goes wrong.

Another thing to check would then be whether the tracks are even considered in the track cluster association, or whether another cut is applied to them beforehand: https://github.com/iLCSoft/DDMarlinPandora/blob/master/src/DDTrackCreatorCLIC.cc#L379

saracreates commented 3 months ago

Percentages in the plot show how many MC muons (charged hadrons) are reconstructed as a PFO with a track, as a PFO without a track (neutral), or not reconstructed as a PFO at all (loss). Muons have a higher PFO track efficiency than charged hadrons.

Comparison of muons (don’t use PFO) vs. charged hadrons (use PFO) suggests that tracks might be reconstructed but not assigned to a PFO.

The overall low PFO track efficiency, which doesn't match the expected track efficiency, suggests the same: Hbb: 74.22 %, Hgg: 77.34 %, H𝜏𝜏: 91.46 %

Hbb/Hgg/H𝜏𝜏 charged particles which are reconstructed as neutrals (= unassigned track): 9.13% / 6.60% / 6.16%

andresailer commented 3 months ago

Hi @Zehvogel

json and podio-dump output for the event

Do you have the podio input file for the reco and the event number for it as well?

maybe there exists an already developed tool that can be used to debug this...

There are Pandora algorithms to dump clusters / tracks / hits that just need to be added or enabled in the PandoraXML file. And of course there is PandoraMonitoring to look at things as Pandora reconstructs them.

Zehvogel commented 3 months ago

@andresailer I shared the file with you on eos, you should have received an email :)

It should be event 769.

In this case the other particles are secondaries created by a single pion in the tracker and I only chose that one because I noticed it while looking at event displays for another reason. For proper debugging we should find some events in Z->qq or Z->tautau where this happens I think...

andresailer commented 3 months ago

Do you have the input file (the SIM file), as well?

Zehvogel commented 3 months ago

I think so. You should have gotten another mail now.

andresailer commented 3 months ago

I ran the reconstruction myself from the SIM file, and I think the PFO dump from Pandora with some MC comparison is suffering from https://github.com/key4hep/CLDConfig/issues/48, so the output is not as useful as it could be.

@Zehvogel How did you run the simulation? Is this a particle gun with pions at fixed energy, or at variable energy?

Zehvogel commented 3 months ago

COMPACT_FILE=$K4GEO/FCCee/CLD/compact/CLD_o2_v06/CLD_o2_v06.xml
PARTICLE=pi+
NEVT=10000

ddsim --compactFile $COMPACT_FILE \
      --outputFile $PARTICLE.$NEVT.SIM.edm4hep.root \
      --steeringFile $CLDCONFIG/share/CLDConfig/cld_steer.py \
      --numberOfEvents $NEVT \
      --enableGun \
      --gun.particle=$PARTICLE \
      --gun.distribution=uniform  \
      --gun.momentumMin=1*GeV \
      --gun.momentumMax=50*GeV \
      --crossingAngleBoost=0

andresailer commented 3 months ago

Hi @saracreates

Percentages in the plot show how many MC muons (charged hadrons) are reconstructed as a PFO with a track, as a PFO without a track (neutral), or not reconstructed as a PFO at all (loss). Muons have a higher PFO track efficiency than charged hadrons.

Is there some lower boundary on the charged hadron lifetime / number of hits created? I.e. short-lived particles that don't leave a reconstructable track are charged hadrons, but are harder to reconstruct as such.

Comparison of muons (don’t use PFO) vs. charged hadrons (use PFO) suggests that tracks might be reconstructed but not assigned to a PFO.

How do you select the entries? What criteria are applied? What does "don't use PFO" mean? Both muons and charged hadrons should have a track?

The overall low PFO track efficiency, which doesn't match the expected track efficiency, suggests the same: Hbb: 74.22 %, Hgg: 77.34 %, H𝜏𝜏: 91.46 %

What does "PFO track efficiency" mean?

saracreates commented 1 month ago

Hi @andresailer

Sorry for the late reply, I haven't seen your comment!

1) For these plots, I didn't apply any lower boundary, except for checking that the MC particles are stable (generatorStatus = 1). I redid the plots in a slightly different way. Two things have changed:

I apply a cut of at least 4 tracker hits associated to the MC particles, otherwise I discard the event (that is why there are nearly no losses anymore).
I lowered the threshold for tracks being associated to a PFO from 50% to 30%. At least this fraction of the reco PFO needs to be associated to the MC particle, otherwise I mark it as lost. Then I check that at least the same fraction of the track weight is associated with this PFO (so at least 30% of the MC track hits must be associated to the PFO). If not, I mark it as neutral; if yes, it has a track.
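
As a rough illustration of the two-step selection described above, here is a minimal Python sketch. It is not the actual analysis code: the function name, the argument names (`n_tracker_hits`, `pfo_weight`, `track_weight`) and the returned labels are hypothetical, and the weights are assumed to be fractions between 0 and 1 that have already been computed from the MC-to-reco associations.

```python
from typing import Optional

MIN_TRACKER_HITS = 4   # cut on tracker hits associated to the MC particle
THRESHOLD = 0.30       # lowered from 0.50, as described in the text


def classify_mc_particle(n_tracker_hits: int,
                         pfo_weight: float,
                         track_weight: float,
                         threshold: float = THRESHOLD) -> Optional[str]:
    """Classify one stable charged MC particle (hypothetical helper).

    n_tracker_hits: tracker hits associated to the MC particle
    pfo_weight:     fraction of the reco PFO associated to the MC particle
    track_weight:   fraction of the MC track hits associated to that PFO
    Returns None if the particle is discarded, else "lost", "neutral" or "track".
    """
    if n_tracker_hits < MIN_TRACKER_HITS:
        return None          # discarded, not counted at all
    if pfo_weight < threshold:
        return "lost"        # no PFO sufficiently matched to the MC particle
    if track_weight < threshold:
        return "neutral"     # matched PFO, but without the particle's track
    return "track"           # matched PFO that carries the track
```

With these assumptions, `classify_mc_particle(12, 0.8, 0.1)` gives `"neutral"`: the PFO is matched to the MC particle, but too little of the track weight is attached to it.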

I hope this helps.

2) "don't use PFO" means that when I access muons, basically all muon tracks are just converted to PFOs (to my understanding, please correct me if I'm wrong). For charged hadrons it's different: certain criteria are applied before tracks and clusters are selected and matched to form a PFO, which is then my "charged hadron". E.g. above 5 GeV, a track alone is not enough to reconstruct a charged particle. It requires a cluster in the calorimeter, otherwise the track is rejected and not used to form a PFO. This then leads to neutrals which are actually charged particles. Look at the dip in the performance here: (charged hadrons from H->bb)

3) "PFO track efficiency" means "how many tracks/MC charged particles end up being used as charged pfos". I explained how I check this in 1).
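
To make the definition concrete, the efficiency can be sketched as the fraction of selected MC charged particles that end up in the "track" category. The snippet below is only an illustration under that assumption; the function name and the category labels ("track", "neutral", "lost") are hypothetical and not taken from the analysis code.

```python
from collections import Counter
from typing import Dict, Iterable

CATEGORIES = ("track", "neutral", "lost")


def pfo_fractions(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of MC charged particles in each category.

    The "track" entry corresponds to what is called "PFO track efficiency"
    in the text. Labels outside CATEGORIES are ignored.
    """
    counts = Counter(label for label in labels if label in CATEGORIES)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing selected, no meaningful fractions
    return {category: counts[category] / total for category in CATEGORIES}
```

For example, `pfo_fractions(["track", "track", "neutral", "lost"])` has a `"track"` entry of 0.5, i.e. a PFO track efficiency of 50% for that toy sample.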

Let me know if anything else is unclear!

key4hep / CLDConfig

Pandora sometimes misses obvious-looking track cluster associations #43