Long unmatched mkFit tracks with unfindable CMSSW tracks

kmcdermo commented 7 years ago

As discussed on Friday, I pointed out that there was a strange class of mkFit tracks that, even when using "pure CMSSW seeds", showed the following properties:

nFoundHits > 20 (including the seed)
fracHitsMatched < 5% (only counting hits after the seed)
CMSSW track is labeled as "unfindable" (which at the time meant that the CMSSW track had either nUniqueLayers < 8 OR pT < 0.5)

As a reminder, "pure CMSSW seeds" means that I am only using the CMSSW seeds that produced a CMSSW track. A plot of these weirdo tracks is here (just a copy from the slides from 25/08/17, slide 15, bottom right, ttbar+noPU CE): ce_ttbar_nopu_badtracks

I updated the unfindability criteria based on the discussion on Friday to be:

CMSSW track has nUniqueLayers < 8
CMSSW track has its last hit position in the transition region (i.e. 0.9 < |eta| < 1.7)

I then reran the text file dump, which is attached here: cmssw2mkfitdump.txt

The selection for entering the dumper is listed at the top of the text file:

fracHitsMatched < 0.1
nFoundHits > 20
mcTrackID >= 0 [ensures we can dump the mcTrack info and that seed is based on a real sim track]
cmsswmask_build < 0 [with the selection already, this ensures the underlying CMSSW track is unfindable, i.e. cmsswTrackID == -7]
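
For readers who want to apply the same selection to their own track summaries, here is a minimal sketch of the four cuts as a single predicate. It assumes the cuts are combined with a logical AND, and the `TrackSummary` container and its field names are hypothetical stand-ins for the labels in the dump header, not the actual mkFit data structures.

```python
from dataclasses import dataclass


@dataclass
class TrackSummary:
    # Hypothetical container; field names mirror the labels in the dump header.
    frac_hits_matched: float  # fraction of post-seed hits matched to the CMSSW track
    n_found_hits: int         # number of found hits, including the seed
    mc_track_id: int          # >= 0 when the seed is based on a real sim track
    cmsswmask_build: int      # < 0 when the underlying CMSSW track is unfindable


def enters_dumper(trk: TrackSummary) -> bool:
    """Return True if the track passes all four dumper cuts (assumed to be ANDed)."""
    return (
        trk.frac_hits_matched < 0.1
        and trk.n_found_hits > 20
        and trk.mc_track_id >= 0
        and trk.cmsswmask_build < 0
    )
```

A long, poorly matched track with a real sim seed and an unfindable CMSSW track, such as `TrackSummary(0.04, 25, 768, -1)`, passes; changing any one field to fail its cut rejects it.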

Upon inspecting the file, the first mkFit track actually finds all of the sim track hits, while the CMSSW track dies just one layer after the seed. However, the mkFit track continues plowing through, and is picking up hits which have an mcTrackID = -1 (which, if I understand how the binary file does the mcTrackID assignment for hits, means that there was no mcTrack saved for this hit and is likely a pileup track).

However, looking at the rest of the ten mkFit tracks in this dump, seven of them end up getting >= 19 hits matched to the correct sim track, while the CMSSW track dies early. Another two mkFit tracks end up tracing another single mcTrack after their seed.

So, this is good news. Now, I personally believe we should still leave these tracks out of the numerator and denominator of the fake rate when we are comparing to CMSSW tracks. When comparing straight to sim tracks, we should (and already do) add them back into the efficiency and fake rate.
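
To make "out of the numerator and denominator" concrete, here is a small sketch of a fake-rate calculation that drops excluded tracks before counting. The status labels `"matched"`, `"fake"`, and `"excluded"` are hypothetical and only illustrate the bookkeeping; the actual validation code defines these categories in its own way.

```python
def fake_rate(statuses):
    """Fraction of fakes among tracks that are not excluded.

    `statuses` is a list of hypothetical labels: "matched", "fake", or "excluded".
    Excluded tracks count in neither the numerator nor the denominator.
    """
    kept = [s for s in statuses if s != "excluded"]
    if not kept:
        return 0.0  # convention chosen here for an empty denominator
    n_fake = sum(1 for s in kept if s == "fake")
    return n_fake / len(kept)
```

With one matched track, one fake, and two excluded tracks, the rate is computed from the two kept tracks only.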

kmcdermo commented 7 years ago

Oh, I meant to comment: if you are interested in reproducing this, check out my branch:

kmcdermo/fdt-cmsswtruth

Then compile: make -j 12 WITH_ROOT=yes

and run: ./mkFit/mkFit --cmssw-seeds --geom CMS-2017 --cmssw-val --ext-rec-tracks --read --file-name /data/nfsmic/slava77/samples/2017/pass-4874f28/initialStep/10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017_GenSimFullINPUT+DigiFull_2017+RecoFull_2017+ALCAFull_2017+HARVESTFull_2017/memoryFile.fv3.recT.072617.bin --build-ce --num-thr 24

slava77 commented 7 years ago

Hi Kevin

On 8/27/17 6:43 AM, Kevin McDermott wrote:

Upon inspecting the file, the first mkFit track actually finds all of the sim track hits, while the CMSSW track dies just one layer after the seed. However, the mkFit track continues plowing through, and is picking up hits which have an mcTrackID = -1 (which, if I understand how the binary file does the mcTrackID assignment for hits, means that there was no mcTrack saved for this hit and is likely a pileup track).

Based on the text dump, I see that for mkFit

the first hit is in BPIX1 and belongs to mc track 768 (no details are printed)
the next 3 hits are in FPIX1,2,3 and belong to mc track 769 (a 7-hit sim track). The track continues through the endcap, and from the hit positions it looks like it is just about scraping the edge of TEC.

In the printout the mkfit, seed, CMSSW, and sim track states are all the same. Perhaps a typo? The px,py,pz printed has pz 0.3 and px 2.9, which seem like 1/p values to still make sense as a track in the endcap. Please check and fix and also please print eta.

The mcTrackID of a hit is -1 if there is no associated sim track at the tracking ntuple level, and also (added as cleaning during writing) if the sim track has fewer than 3 sim hits or fewer than 2 associated rec hits. For in-time tracks, the lack of association is not correlated with whether a track is from pileup or not.
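
The assignment rule described above can be sketched as follows. This is only an illustration of the stated logic; the function name and arguments are hypothetical and do not correspond to the code that writes the binary file.

```python
def hit_mc_track_id(sim_track_id, n_sim_hits, n_assoc_rec_hits):
    """Return the mcTrackID stored for a hit, or -1 if it is left unassociated.

    sim_track_id:      ID of the associated sim track, or None if the tracking
                       ntuple has no association for this hit
    n_sim_hits:        number of sim hits on that sim track
    n_assoc_rec_hits:  number of rec hits associated with that sim track
    """
    if sim_track_id is None:
        return -1  # no sim track at the tracking ntuple level
    if n_sim_hits < 3 or n_assoc_rec_hits < 2:
        return -1  # cleaning applied when writing the file
    return sim_track_id
```

Note that pileup does not appear anywhere in the rule: a hit with mcTrackID = -1 is simply unassociated.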

kmcdermo commented 7 years ago

Based on the text dump, I see that for mkFit

the first hit is in BPIX1 and belongs to mc track 768 (no details are printed)

I could print out all the sim tracks from which the mkFit track picks up hits, but I thought this might be sensory overload.

In the printout the mkfit, seed, CMSSW, and sim track states are all the same. Perhaps a typo?

Gah. Indeed, it was dumping the mkFit track state only. Fixed.

The px,py,pz printed has pz 0.3 and px 2.9, which seem like 1/p values to still make sense as a track in the endcap. Please check and fix and also please print eta.

Ah, this was a holdover from our global cartesian days, so px -> 1/pT, py -> phi, pz -> eta. In any case, I made the text changes to the function in my local branch.

For in-time tracks, the lack of association is not correlated with whether a track is from pileup or not.

Ah okay, I somehow was remembering that if a track came from some premixing vertex, it did not have the full sim information.

I made the changes as suggested, and here is the result: cmssw2mkfitdump.txt

A few things to note. The z-position variance is completely zeroed in all of these mkFit tracks. @cerati Is this intended/an artifact of moving the local 2d plane in the update?

Also, looking at the covariances of the sim and cmssw tracks, the sim tracks have their hit covariances all along the diagonal, while the block of momentum covariance is full. The opposite is true for the cmssw tracks (position block full, momentum block diagonal). Is this intended? This would affect any sort of helix chi2.

slava77 commented 7 years ago

On 8/27/17 9:57 AM, Kevin McDermott wrote:

The px,py,pz printed has pz 0.3 and px 2.9, which seem like 1/p values to still make sense as a track in the endcap. Please check and fix and also please print eta.

Ah, this was a holdover from our global cartesian days, so px -> 1/pT, py -> phi, pz -> eta. In any case, I made the text changes to the function in my local branch.

Looking at the first track

mkFit Track (Final) x: 25.4879 y: -46.4372 z: 266.37 1/pT: 2.92161 phi : -2.41247 eta : 0.296351

The momentum values still do not add up: an eta of 0.3 cannot be correct for a track that starts in BPIX1 and goes through FPIX. Is there some reshuffle left in the printout, or is eta actually theta?

Please still print the inner sim track if it differs from the rest, at least for this debugging session.

kmcdermo commented 7 years ago

Hi Slava,

Ah, you are right. It is theta... I forgot we store theta as parameters[5], not eta. I will add the inner sim track as well.
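
As a quick cross-check of the theta/eta mix-up, the standard pseudorapidity relation eta = -ln(tan(theta/2)) can be applied to the value printed for the first track. The helper names below are illustrative and are not functions from the mkFit code.

```python
import math


def eta_from_theta(theta):
    """Pseudorapidity from polar angle theta (radians, 0 < theta < pi)."""
    return -math.log(math.tan(0.5 * theta))


def theta_from_eta(eta):
    """Inverse relation: polar angle (radians) from pseudorapidity."""
    return 2.0 * math.atan(math.exp(-eta))


# Value that was printed under the label "eta" for the first track in the dump.
theta_printed = 0.296351
eta = eta_from_theta(theta_printed)
```

Interpreting 0.296351 as theta gives eta of roughly 1.9, which is a forward track and fits a trajectory going through FPIX into the endcap.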

kmcdermo commented 7 years ago

@slava77

I updated the text file dump to actually print eta in the track state dump (although the covariance matrix is still in its original form, so cov(i,5) is for theta and not eta).
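
For anyone who wants the covariance in terms of eta as well, a first-order (linear) error propagation sketch follows. It uses only the standard relation between eta and theta; the index 5 for theta follows the comment above, and nothing here is taken from the mkFit code itself.

```latex
\eta = -\ln\tan\frac{\theta}{2}
\quad\Longrightarrow\quad
\frac{\partial\eta}{\partial\theta} = -\frac{1}{\sin\theta}

% Propagating the theta row/column (index 5) of the covariance matrix:
\mathrm{cov}(i,\eta) \approx -\frac{1}{\sin\theta}\,\mathrm{cov}(i,\theta) \quad (i \neq 5),
\qquad
\mathrm{var}(\eta) \approx \frac{\mathrm{var}(\theta)}{\sin^{2}\theta}
```

For forward tracks, where sin(theta) is small, the eta variance is therefore much larger than the theta variance.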

The text file now includes the track states and hit info for the leading and subleading matched mcTracks if they differ from the sim track the seed comes from.

please find the text file here: cmssw2mkfitdump.txt

cerati commented 7 years ago

Hi Kevin,

yes, the variance is zero for the z coordinate since the state you print is on a surface perpendicular to z (it should be the same in the barrel for r).

The covariance for the sim track is dummy and corresponds to 100% uncertainty for cartesian coordinates (it is not diagonal because you are using ccs).

I am not sure why the CMSSW seed covariance looks like that, I would have expected it was not structured in blocks (it should be the result of a KF fit over the seed hits). Slava, are we filling also the off-diagonal terms in the ntuple?

Thanks, Giuseppe

slava77 commented 7 years ago

[looks like my response from email client started failing on Monday]

On 8/28/17 6:57 AM, cerati wrote:

I am not sure why the CMSSW seed covariance looks like that, I would have expected it was not structured in blocks (it should be the result of a KF fit over the seed hits). Slava, are we filling also the off-diagonal terms in the ntuple?

only the diagonal elements are filled now.

cerati commented 7 years ago

I think we should also fill the off-diagonal terms (the seed state comes from a KF fit so they should be meaningful).

slava77 commented 7 years ago

The off-diagonal elements are not available in the trackingNtuple now. I derived the diagonal elements from the track parameter errors. It seemed to be enough at that point.

I can update the ntuple to get the rest. Do we have the right jacobians to convert from what CMSSW knows to what our code here knows?

cerati commented 7 years ago

right... it is not as easy as I thought. I think there should be a way to convert from CMSSW to global cartesian and from global cartesian to our system. Let me know if you need help digging.

slava77 commented 7 years ago

Hi Kevin,

I looked at the first case of the long track in https://github.com/cerati/mictest/files/1256207/cmssw2mkfitdump.txt

The behavior of matching appears roughly as expected. Note that the simtrack that matches most of the hits is an electron with pt of 0.55 GeV and p of 2.13 GeV.

The CMSSW track stops in FPIX3, while the mkFit track continues through the gap and has the first hit in TID2. The sim momentum between FPIX3 and TID2 is consistent, but there is a missing layer, and even on TID2 there is only one hit. So, this looks like a large enough penalty to kill a track in CMSSW.

Using the tracking ntuple I was able to navigate to hit 24 on this track (which is in TEC8). Here every available sim/rec hit for this simtrack was found on the mkFit track.
- Sim hit matching is missing in the MIC validation for hits from TEC1 to TEC8 because only hits with more than 50% of the initial (simtrack) momentum are saved. The first TEC1 hit for this track has p of 0.94 GeV, which is below the threshold.
- The last hit matched to sim on TEC8 has momentum of 0.52 GeV (pt of 0.13 GeV).

I could not find the provenance of the last two TEC rec hits found on the MIC track. They may still be from the same particle.
- Sim hit provenance is not saved below an energy of 0.2 GeV. It could be that this 0.5 GeV electron lost some more energy, leading to the missing provenance.
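
The 50% momentum cut can be checked against the numbers quoted for this track with a few lines of arithmetic. The function below is a hypothetical sketch of the threshold as described in this comment, not the code that writes the validation file.

```python
def hit_keeps_sim_match(p_hit, p_initial, min_fraction=0.5):
    """True if a sim hit carries more than `min_fraction` of the initial
    sim track momentum and is therefore kept for matching."""
    return p_hit > min_fraction * p_initial


P_INITIAL = 2.13  # GeV, initial momentum of the matched electron
P_TEC1 = 0.94     # GeV, momentum at the first TEC1 hit

kept_at_50 = hit_keeps_sim_match(P_TEC1, P_INITIAL)
kept_at_25 = hit_keeps_sim_match(P_TEC1, P_INITIAL, min_fraction=0.25)
```

At 50% the threshold is 0.5 × 2.13 = 1.065 GeV, so the 0.94 GeV TEC1 hit is dropped. At a 25% threshold (0.5325 GeV) the same hit would be kept, while a hit at 0.52 GeV would still fall just below.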

This is also a good example of why we cannot match some low-pt tracks based on the inner sim momentum and the outer mkFit momentum: the KF follows the track even though it may have lost half of its momentum.

kmcdermo commented 7 years ago

Hi Slava,

Thanks for following up on this. This is a nice example of our tracking working as expected. In reality, some tracks are just lost to matching because of what is available to us from CMSSW: from the reco building, from what is stored in the tracking ntuple, and from what CMSSW itself internally saves for sim hits.

I have a few follow-up questions: what is the penalty for lowering the sim hit momentum threshold for saving in the tracking ntuple to, say, 25%? I know we went through this whole ordeal with the garbage hits being stored, but was there some optimization for 50% versus something lower?

What does CMSSW do in the case of a CMSSW reco track finding sim hits which do not have their provenance saved in terms of counting for fake rates? Is there at least some flag from CMSSW that the rec hit associated to the sim hit is from an actual track and not from detector noise or ghost hits?

slava77 commented 7 years ago

Hi Kevin

On 9/13/17 4:26 AM, Kevin McDermott wrote:

Hi Slava,

I have a few follow-up questions: what is the penalty for lowering the sim hit momentum threshold for saving in the tracking ntuple to, say, 25%? I know we went through this whole ordeal with the garbage hits being stored, but was there some optimization for 50% versus something lower?

the penalty is that by adding these hits to sim-tracks we increase the phase space of hits on sim tracks that are expected to have rather poor performance in reconstruction if required to be a part of the sim-to-reco matching. Because of this, I wouldn't want to see hits from simtracks after a large energy loss in the efficiency definitions.

the gain by adding them is to be able to discern cases where some hits on a track are fake or are actually coming from an identifiable sim origin.

So, for one size fits all, I'd rather stay with the 50% in order to not pollute the "good matching" side.

Remind me please what will actually change for this mkFit track for the current situation, should more rechits get more matching simhit information. IIUC, it is going to affect only the debugging information related to its provenance, which is not a part of the definition of the matching procedure.

What does CMSSW do in the case of a CMSSW reco track finding sim hits which do not have their provenance saved in terms of counting for fake rates? Is there at least some flag from CMSSW that the rec hit associated to the sim hit is from an actual track and not from detector noise or ghost hits?

Just to clarify: the 50% selection on the hit momentum is from me, when the bin file for mkFit validation is written. There are other (lower) thresholds in sim provenance tracking set for geant and for sim-to-digi simlinks as well as for tracking particles themselves. Once this is lost, the rechits on the tracks can not be matched. In the tracking ntuple these hits are marked as noise, but perhaps better ways are available at full sim level.

kmcdermo commented 6 years ago

We can close this issue as we found that while CMSSW stops short, we go on and get the correct sim tracks.

These tracks are excluded from validation anyways.

Long unmatched mkFit tracks with unfindable CMSSW tracks #101