iLCSoft / MarlinReco

GNU General Public License v3.0
4 stars 40 forks source link

TOFEstimators seg. faults when fit of the first track chunck fails #102

Closed dudarboh closed 2 years ago

dudarboh commented 2 years ago

Issue

As spotted by Uli (@ueinhaus) TOFEstimators processor occasionally leads to seg. faults in ~0.001-0.01 of cases.

Reproducible

Minimal steering file reproducible to recreate the seg fault:

<marlin>
 <constants>
 </constants>

 <execute>
  <processor name="InitDD4hep" />
  <processor name="MyTOFEstimator_0ps"/>
 </execute>

 <global>
  <parameter name="LCIOInputFiles">
      /pnfs/desy.de/ilc/prod/ilc/mc-opt-3/ild/rec/500-TDR_ws/2f_Z_hadronic/ILD_l5_o1_v02/v02-00-01/00010410/000/rv02-00-01.sv02-00-01.mILD_l5_o1_v02.E500-TDR_ws.I250114.P2f_z_h.eL.pR.n027_149.d_rec_00010410_937.slcio
  </parameter>
  <parameter name="MaxRecordNumber" value="0" />  
  <parameter name="SkipNEvents" value="15" />  
  <parameter name="SupressCheck" value="false" />  
  <parameter name="AllowToModifyEvent" value="true" />
 </global>

  <processor name="InitDD4hep" type="InitializeDD4hep">
    <parameter name="DD4hepXMLFile" type="string"> /cvmfs/ilc.desy.de/sw/x86_64_gcc49_sl6/v02-00-01/lcgeo/v00-16-01/ILD/compact/ILD_l5_o1_v02/ILD_l5_o1_v02.xml </parameter>
  </processor>

  <processor name="MyTOFEstimator_0ps" type="TOFEstimators">
  </processor>

</marlin>

Reason for seg. fault

We sort subTrack hits in the direction of the particle movement. To understand this direction (which hit is first/last) we rely on the last hit of the previous iteration prevHit here:

If the fit of the very first subTrack (i == 0) fails, then it is simply skipped with continue without pushing any hits into the array which creates a seg. fault on the next iteration as trackStatesPerHit is empty and we try to access prevHit as trackStatesPerHit.back()

https://github.com/iLCSoft/MarlinReco/blob/c7957ec0ab4633eddf545e106f3e03786300c068/TimeOfFlight/src/TOFUtils.cc#L198-L220

Event example

  1. On the left: A single PFO creating seg. fault, because first (blue) subtrack fails to fit with MarlinTrk::createFinalisedLCIOTrack().
  2. On the right: All tracker hits recorded in the event. Purple hand-drowing highlighting hits relevant the the given MCParticle

drawingdrawing

  1. PandoraPFA fails here to assign tracker hits properly, so the hits in the first curl coming from the IP are not attached to the PFO.
  2. For the 1st reconstructed subtrack we always assume fit direction decreasing in R, which should mean to the IP. Which is not the case here, due to the missed half curl by Pandora, thus fit fails.

Solution

First thing to do is probably to count number of succeeded fits, instead of using i:

int nPassedSubTracks = 0;

// . . .

int errorFit = MarlinTrk::createFinalisedLCIOTrack(marlinTrk.get(), hits, &refittedTrack, IMarlinTrack::backward, &preFit, bField, maxChi2PerHit); 
 if (errorFit != 0) continue; 

nPassedSubTracks++;

 vector< pair<TrackerHit*, double> > hitsInFit; 
 marlinTrk->getHitsInFit(hitsInFit); 

 int nHitsInFit = hitsInFit.size(); 

 // if first subTrack 
 if (nPassedSubTracks == 1){ 
     trackStatesPerHit.push_back(*(static_cast<const TrackStateImpl*> (refittedTrack.getTrackState(TrackState::AtIP)) )); 

     //add hits in increasing rho for the FIRST subTrack!!!!! 
     for( int j=nHitsInFit-1; j>=0; --j ){ 
         TrackStateImpl ts = getTrackStateAtHit(marlinTrk.get(), hitsInFit[j].first); 
         trackStatesPerHit.push_back(ts); 
     } 
 } 
 else{ 
     // check which hit is closer to the last hit of previous fit. 
     // and iterate starting from the closest 
     Vector3D innerHit ( hitsInFit.back().first->getPosition() ); 
     Vector3D outerHit ( hitsInFit.front().first->getPosition() ); 
     Vector3D prevHit ( trackStatesPerHit.back().getReferencePoint() ); 

then we are getting rid of seg. fault.

The other reasonable thing to do is to try to fit forward if backward fit fails..

I will create PR implementing these changes in the following days.

@tmadlener @gaede

tmadlener commented 2 years ago

Hi @dudarboh,

Just for clarification: The issue you describe does not happen inside the TOFUtils::getTrackStatesPerHit function, but in the function that calls it?

If I understood correctly, your PR will in the end do something like the following:

In the last point here, "handle this properly" means to take the first sub track for which the fit succeeds as the new first sub track?

dudarboh commented 2 years ago

@tmadlener No, the issue does happen inside TOFUtils::getTrackStatesPerHit().

Because effectively code looks like this:

vector<int> results;
for (int i=0; i<10; ++i){
    //assume first subTrack fit fails and we just continue
    if (i == 0) continue;

    // rest of the code
    if (i == 0){
        results.push_back(i);
    }
    else{
        int getSegFault = results.back();
    }

}

Your PR description is correct.

However, I am still not sure if adding "try fit in forward direction" does make sense and if it will make a fit work when it fails backwards, so I want to try and see the results first.

Now when I think of this, fit direction should not matter at all for the fit result of the helix.. (if there is no hidden assumption for the backward fit that it should go close to the IP..)

tmadlener commented 2 years ago

Ah yes, right. Saw the .back() just now. Thanks for pointing it out again a bit more clearly.

I am also not sure whether we should expect a different result from a forward fit compared to a backward fit. Maybe @gaede can comment.

gaede commented 2 years ago

The difference is the assumed direction of flight wrt. to the order of the hits and the corresponding energy loss. So yes, there is a (small) difference between the two...

dudarboh commented 2 years ago

Fit in the "forward" direction does result in the successful fit.

DEBUG information:

Conclusion

Fit backward -- fail:

[ DEBUG2 "MyTOFEstimator_0ps"] MarlinDDKalTestTrack::fit() called 

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 1546 for type DDCylinderMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 8.982098e-04 _maxDeltaChi2 = 1.000000e+02

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 1545 for type DDCylinderMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 9.297351e+01 _maxDeltaChi2 = 1.000000e+02

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 1544 for type DDCylinderMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 2.601180e+02 _maxDeltaChi2 = 1.000000e+02
[ DEBUG2 "MyTOFEstimator_0ps"] Kaltrack::addAndFit : site discarded! at index : 1544 for type DDCylinderMeasL chi2increment = 2.601180e+02 maxChi2Increment = 1.000000e+02 x = 1.899615e+02 y = -5.850322e+02 z = 2.354461e+02 with CellIDs: 

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 1543 for type DDCylinderMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 5.845538e+02 _maxDeltaChi2 = 1.000000e+02
[ DEBUG2 "MyTOFEstimator_0ps"] Kaltrack::addAndFit : site discarded! at index : 1543 for type DDCylinderMeasL chi2increment = 5.845538e+02 maxChi2Increment = 1.000000e+02 x = 1.997436e+02 y = -5.754175e+02 z = 2.388288e+02 with CellIDs: 

...

Fit forward -- success:

[ DEBUG2 "MyTOFEstimator_0ps"] MarlinDDKalTestTrack::fit() called 

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 640 for type DDPlanarMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 9.278441e-05 _maxDeltaChi2 = 1.000000e+02

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 841 for type DDPlanarMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 5.869883e-06 _maxDeltaChi2 = 1.000000e+02

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 1107 for type DDPlanarMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 2.271298e+00 _maxDeltaChi2 = 1.000000e+02

[ DEBUG2 "MyTOFEstimator_0ps"] >>>>>>>>>>>  Fit is now constrained at : subdet:2,side:0,layer:3,module:17,sensor:10 pos   ( 2.307775e+02, -1.959975e+02, 3.300815e+02 ) -  [ phi: -7.040831e-01 , rho: 3.027759e+02 ]   [ theta: 7.422784e-01 , r: 4.479141e+02 ]  trkhit = 0x170cd480 index of kalhit = 3 NDF = 1

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 1504 for type DDCylinderMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 8.870811e-01 _maxDeltaChi2 = 1.000000e+02

... just skipping intermediate hits

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 1545 for type DDCylinderMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 3.564592e+01 _maxDeltaChi2 = 1.000000e+02

[ DEBUG1 "MyTOFEstimator_0ps"] Kaltrack::addAndFit :  add site to track at index : 1546 for type DDCylinderMeasL
[ DEBUG1 "MyTOFEstimator_0ps"]  KalTrackFilter::IsAccepted called  !  deltaChi2 = 9.079321e+01 _maxDeltaChi2 = 1.000000e+02

bad_fit_example