umd-lhcb / lhcb-ntuples-gen

ntuples generation with DaVinci and in-house offline components
BSD 2-Clause "Simplified" License
1 stars 0 forks source link

ntuple generated with updated script doesn't agree with Phoebe's original ntuple #16

Closed yipengsun closed 4 years ago

yipengsun commented 5 years ago

We are comparing:

It seems that the muon momentum muplus_P and all its components have a ~5% deviation from Phoebe's. We need to investigate why.

yipengsun commented 5 years ago

@CoffeeIntoScience was able to adapt our ntuple_options.py to run on DaVinci/v36r1p2. An interesting thing is that her ntuple now contains only 34 entries, whereas ours contains 243 entries.

yipengsun commented 5 years ago

I've updated 2 more comparisons here: https://umd-lhcb.github.io/lhcb-ntuples-gen/data/comparison_run1_RDst/comparison/

I'd say the the ntuples generated by the same script with different DaVinci's look much more similar. This means that there's some bug in my DaVinci script in the sense that my script doesn't fully replicate @CoffeeIntoScience's functionality.

yipengsun commented 5 years ago

The main difference between ntuples generated by DaVinci/v36r1p2 and DaVinci/v42r8p1 seems to be in the isolation variables.

Phoebe suggested that this is likely due to the way the NoPID pions is set up. This can be checked by looking into Y_ISOLATION_Type variables.

yipengsun commented 5 years ago

@CoffeeIntoScience I've updated the comparison between the following variables generated by DaVinci/v36r1p2 and DaVinci/v42r8p1:

Y_ISOLATION_Type
Y_ISOLATION_Type2
Y_ISOLATION_Type3
Y_ISOLATION_Type4

The comparison can be found here.

CoffeeIntoScience commented 5 years ago

this is very curious. For your education:

track type 1 is VELO only, with no other compatible segments found, track type 3 is Forward tracks, which are VELO+(TT)+downstream segment (TT hits optional) track type 4 is Upstream, VELO+TT

what's not obvious to me at all is whether what we're looking at is your version picking up MORE velo tracks, or FEWER other types. Like are we looking at the VELO tracks filling up a deficit or are we looking at VELO tracks being shifted to higher BDT values by some other change in the tracking or fitting?

CoffeeIntoScience commented 5 years ago

I would suggest, as a next step in this thread, doing this comparison and the comparison of BDT scores again excluding VELO tracks and seeing if we get agreement then

yipengsun commented 5 years ago

By excluding VELO tracks, do you mean commenting out the following code: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/a2caa86e3cb196567f9d6738ff9915911528511f/2012-b2D0MuXB2DMuNuForTauMuLine/reco_Dst.py#L34-L43

CoffeeIntoScience commented 5 years ago

Easier would be commenting out the one line https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/a2caa86e3cb196567f9d6738ff9915911528511f/2012-b2D0MuXB2DMuNuForTauMuLine/reco_Dst.py#L59

yipengsun commented 5 years ago

OK then. I'm about to generate the new ntuple with that line commented. Could you re-run my script with DaVinci/v36r1p2? You probably need the following files:

2012-b2D0MuXB2DMuNuForTauMuLine/reco_Dst.py
2012-b2D0MuXB2DMuNuForTauMuLine/conds/cond-data-Dst.py

The raw data files should be placed in data folder at the same level of reco_Dst.py, the generated ntuples will be placed at the same folder of the reco_Dst.py so there's no need to create gen folder.

I'm invoking DaVinci in the following way:

lb-run DaVinci/v42r8p1 ./conds/cond-data-Dst.py reco_Dst.py
yipengsun commented 5 years ago

Now I only see track type 3 and 4, as expected.

yipengsun commented 5 years ago

@CoffeeIntoScience I've updated ntuples generated with your TupleToolApplyIsolation. There's no obvious improvement.

yipengsun commented 5 years ago

Before cutting on the Y_ISOLATION_BDT variables, for 2012 Mag Down data:

The difference (= ABS(Ours-Phoebe's)/Phoebe's) is about 10%.

After the Y_ISOLATION_BDT < 0.15 cut:

The difference is about 14%.

yipengsun commented 5 years ago

To disable DecayTreeFitter, I should replace the TopSelection option with sel_B0? https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/22b34a8e1851efd1d9cd9cade9bc6fb515f9325e/run1-b2D0MuXB2DMuNuForTauMuLine/reco_Dst.py#L419-L435

Also, I start to remember things, and noticed this line: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/22b34a8e1851efd1d9cd9cade9bc6fb515f9325e/run1-b2D0MuXB2DMuNuForTauMuLine/reco_Dst.py#L226-L227

I think I'll uncomment that for better backward compatibility.

yipengsun commented 5 years ago

Weird. The generated .root file cannot be opened by root.

Turns out my root installation is messed up.

yipengsun commented 5 years ago

@CoffeeIntoScience I've created a script for older version of DaVinci. You can download it from here. It should be able to run on DaVinci/v36r1p2 without any editing.

yipengsun commented 5 years ago

@CoffeeIntoScience @manuelfs I've updated ntuples generated with DaVinci/v36r1p2 and DaVinci/v42r8p1 without final refitting. The results are available here.

Now the momentum of the Kaons and Pions fully agree, but the momentum of D0 still differs. This makes sense---D0 is reconstructed by CombineParticle, which uses some fitting tool.

Actually, the isolation variables now have larger differences.

CoffeeIntoScience commented 5 years ago

Can you rerun your script (just over like 300 events is fine) with the output level set to DEBUG and send me the log? (This used to be done via from Gaudi.Configuration import * followed by MessageSvc().OutputLevel=DEBUG)

yipengsun commented 5 years ago

The debug option you provided still works. I reran my script over exact 300 events with debug option enabled, and the log file is ~60 MB. The log file can be found here.

yipengsun commented 5 years ago

DaVinci change logs regarding fit

Starting from DaVinci/v36r1p2, with Phys/v19r8. End at DaVinci/v42r8p1, with Phys/v23r9p1.

version log
v36r6 In Phys/Tesla v1r4: Support refitted PV persistence.
v37r2 Include as patch packages RelatedInfoTools, DaVinciTools, LoKiCore, LoKiPhys, LoKiAlgo and GaudiConf
v40r1 Added TrackSys to used configurables. It configures the tracking related tools used by LoKi::VertexFitter and DistanceCalculator
v42r4 Allow to add selections directly to the list of UserAlgorithms. No need in the explicit appearence of SelectionSequence

Phys change logs regarding fit

version log
v19r2 ParticleClassificator: Add merged (=="stable") pi0 into list of "di-gamma-like" particles.
v19r3 VertexFit, VertexFitWithTracks, FastVertex: replace all matrix inversions by faster method based on Cholesky's decomposition.
v19r4 Particle2State: use vectorized similarity transform from LHCbMath -- which is (in this case) 4.8x faster
v19r5 KalmanFilter: Use cholesky decompositon for inversion of positively definite covariance matrices.
v19r6
  • DecayTreeFitter
    • Bug fix in dealing with bremstralung in RecoTrack.cpp.
    • Added new class ConvertedPhoton that implements 'soft' constraint on conversion vertex.
  • KalmanFilter
    • Added new function that allows significant optimization of vertex fit.
  • LoKiFitters
    • VertexFitter: added option to allow CPU-performace measurement, "MeasureCPUPerformace". Implemented optimized algorithm (but not yet fully tested, further optimization is still possible and optimized). Tests with B&W WG_Selection show factor of 30-35% improvement in CPU for the vertex fit optimized algorithm is used when new option "UseOptimizedAlgorithm" is activated.
    • Activated optimized algorithm for default setting.
  • VertexFit: Fixed typo in forward reference of PVReFitterAlg.h
v19r7 KalmanFilter: Added IParticleClassifier interface needed for bypassing full vertex fit.
v20r0
  • LokiFitters
    • Make minimum number of tracks in PV configurable; set default to 4, compare tracks by ID by default (all in PVRefitter.cpp).
    • PVReFitter.cpp: added new option CheckAncestors (default value is true). If activated, the track is considered to be from PV, is any of its "ancestors" in PV. This could be very fragile and depends on ancestors are properly filled/stored...Tests and feedback from experts (both online and offline) is needed here.
    • VertexFitter: Removed unnesessary branches. Added treatment of jets according to Wouter's needs.
    • Distance calculator: make IP,chi2(IP), DOCA,chi2(DOCA) for gammas and digammas always zero.
    • Particle classificator: added notion of jet-like particle.
    • VertexFit: _load: added error message to debug misterious problem reported by Julian.
v20r1
  • LoKiFitters
    • ParticleClassifier: property added to switch of special treatment of jets.
    • PVReFitter: Silence warning about too few tracks after refit.
v20r3
  • KalmanFilter: Fix nDoF for "rho+"-like particles was set to zero...formally should be -1.
  • Phys/LoKiFitters: DistanceCalculator, "track" interface: for long-tracks in vicinity of origin point use ITrackStateProvider::stateFromTrajectory instead ofITrackStateProvider::state. It is expected to be more CPU efficient.
v21r0
  • KalmanFilter
    • Introduces new type of particles "rho+-like" 1) Added corresponding enum. 2) Updated IParticleClassifier interface.
    • VertexFit.cpp: Fixed nDof calculation in the presence of 'rho+-like' particle thanks to Jascha Grabovski.
  • LoKiFitters
    • Unification of code between LoKi::DistanceCalculator and LoKi::TrgDistanceCalculator. Essentially now the same code is running, modulo property setting (UseFastAlgorithms). In short - LoKi::TrgDistanceCalculator is LoKi::DistanceCalculator with no transportation and no iterations.
    • Implemented more correct treatment of rho+-like particles (thanks to Jascha Grabovski).
    • DistanceCalculator: Use !matrix.InvertChol() && !matrix.Invert() construction instead of !matrix.Invert(). It should be a bit faster.
v21r1 VertexFit: PropertimeFitter: Fixed bug (the tool was returning c.tau (mm) in case of negative covariance and tau (ns) when the fit is OK)
v21r3 LokiFitters: VertexFitter: Change default of transporter to DaVinci::ParticleTransporter:PUBLIC.
v21r4 KalmanFilter, LoKiFitters: Added protection against extrapolations to very large z values.
v21r5 LoKiFitters: Switch on warnings in LoKi::PVReFitter when a PV refit fails with a repetition rate of 1.
v23r3 DaVinciFilters, PhysSelPython: Add new algorithm FilterUnique and extend MergedSelection
CoffeeIntoScience commented 5 years ago

Obviously the log comparison is still in-progress, but the first possibly-suspicious thing I see is we are connecting to different locations to fetch the latest DB tags and ending up with different ones. Mine if fetching

dddb-20130929-1
cond-20141007

while yours finds

dddb-20171030-2
cond-20150409-1

I think it would be worth your time to attempt to run forcing the DB tags that mine thinks are the "latest" to see if anything changes. I should also double check if the momentum scaling map is the same, though it looked okay on the first pass

EDIT: formatting

yipengsun commented 5 years ago

Specifying tag doesn't help at all.

yipengsun commented 5 years ago

Phoebe found some duplicated tracks in our ntuples generated by both versions of DaVinci image

yipengsun commented 5 years ago

This is likely due to the fact that the ms_velo_pions are not set correctly to only pick VELO particles. Examples of correct usage can be found at: https://twiki.cern.ch/twiki/bin/view/LHCb/RestrippingMDST

(Dark Magic warning).

yipengsun commented 5 years ago

But the trackSelector is undefined.

yipengsun commented 5 years ago

I still see duplicated events: dup_tracks

If we scan over the momenta of all 3 tracks, the events are sharing tracks, but no duplicate event is present: dup_tracks

yipengsun commented 5 years ago

The trackSelector and updateDoD are defined at: https://gitlab.cern.ch/lhcb/Stripping/blob/2017-patches/Phys/CommonParticlesArchive/python/CommonParticlesArchive/Stripping33r1/CommonParticles/Utils.py

yipengsun commented 5 years ago

After fixing a comparison bug, I see:

  1. The momentum matching is only performed on events with a single candidate.
  2. All momentum-matched tracks have the same track type (VELO, Forward, Upstream), but not necessarily the same track index (1, 2, 3)
  3. Their BDT scores may differ (see attached csv files). The differences are observed in all track types (VELO, Forward, Upstream)

These csv files are selected by: abs(bdt_score_diff) > 0.1. dv36 dv42

yipengsun commented 5 years ago

@CoffeeIntoScience I've included more Y-branches in the csv files linked above. I still think the values are mostly similar.

CoffeeIntoScience commented 5 years ago

just for future reference, y is more sensitive than x to fitting differences because the uncertainty in that direction is typically greater (less precise downstream measurements).

Hmmm.

would still like to see PT and maybe, I guess, IPCHI2 of the Dst's slow pion

yipengsun commented 5 years ago

I've updated the track type vs chi^2 plots. Indeed there's a significant shift for VELO only tracks.

Once Phoebe uploads her new ntuple to the shared folder, I'll redo plots for ntuples with both refitting and momentum rescaling disabled.

yipengsun commented 5 years ago

We've decided to move up to DaVinci/v45, the recommended version for both run 1 and 2, and see if we still have these disagreements.

The progress will be tracked at umd-lhcb/TupleToolSemileptonic#1.

yipengsun commented 4 years ago

@CoffeeIntoScience I've finally updated my local DaVinci to v45r3, the recommended version for all run 1 and run 2 analyses. I regenerated the no-refit ntuple, and made comparisons between that and v36, v42 ntuples.

The result is: v45 is still different from v36, v45 is identical to v42. Please take a look at: https://umd-lhcb.github.io/lhcb-ntuples-gen/data/comparison_run1_Dst/comparison/#comparison-between-davinci-v36r1p2-and-v45r3-without-refitting

CoffeeIntoScience commented 4 years ago

okaaay? I didn't think we expected anything different in these comparisons?

On Thu, Nov 14, 2019 at 9:53 PM Yipeng Sun notifications@github.com wrote:

@CoffeeIntoScience https://github.com/CoffeeIntoScience I've finally updated my local DaVinci to v45r3, the recommended version for all run 1 and run 2 analyses. I regenerated the no-refit ntuple, and made comparisons between that and v36, v42 ntuples.

The result is: v45 is still different from v36, v45 is identical to v42. Please take a look at:

https://umd-lhcb.github.io/lhcb-ntuples-gen/data/comparison_run1_Dst/comparison/#comparison-between-davinci-v36r1p2-and-v45r3-without-refitting

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/umd-lhcb/lhcb-ntuples-gen/issues/16?email_source=notifications&email_token=ALEZBSCXMPPNA2TOXGJ7EZTQTYFJVA5CNFSM4H5I5FI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEEDVFI#issuecomment-554187413, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALEZBSEUAYBGRMKINEOQN5TQTYFJVANCNFSM4H5I5FIQ .

yipengsun commented 4 years ago

I thought the idea (your idea) was that given v45 is the officially recommended version for both run 1 and run 2, it may automatically fallback to the previous algorithm.

On Fri, Nov 15, 2019, 12:53 Phoebe H notifications@github.com wrote:

okaaay? I didn't think we expected anything different in these comparisons?

On Thu, Nov 14, 2019 at 9:53 PM Yipeng Sun notifications@github.com wrote:

@CoffeeIntoScience https://github.com/CoffeeIntoScience I've finally updated my local DaVinci to v45r3, the recommended version for all run 1 and run 2 analyses. I regenerated the no-refit ntuple, and made comparisons between that and v36, v42 ntuples.

The result is: v45 is still different from v36, v45 is identical to v42. Please take a look at:

https://umd-lhcb.github.io/lhcb-ntuples-gen/data/comparison_run1_Dst/comparison/#comparison-between-davinci-v36r1p2-and-v45r3-without-refitting

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/umd-lhcb/lhcb-ntuples-gen/issues/16?email_source=notifications&email_token=ALEZBSCXMPPNA2TOXGJ7EZTQTYFJVA5CNFSM4H5I5FI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEEDVFI#issuecomment-554187413 , or unsubscribe < https://github.com/notifications/unsubscribe-auth/ALEZBSEUAYBGRMKINEOQN5TQTYFJVANCNFSM4H5I5FIQ

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/umd-lhcb/lhcb-ntuples-gen/issues/16?email_source=notifications&email_token=AIBM3QBRZHDAAXJTBQ6L6Z3QT3N4ZA5CNFSM4H5I5FI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEGGB7I#issuecomment-554459389, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIBM3QEF3475YESHCNBAX43QT3N4ZANCNFSM4H5I5FIQ .

CoffeeIntoScience commented 4 years ago

Hmmm. I guess that was not impossible. I don't know where the switch to the simplified geometry is done, and its very possible its not easily togglable.

Anyway, the bigger point is that lack of better agreement here shouldn't be a road block and you should resume trying to get efficiencies from the inclusive MC samples and close the loop on this factor of 5 as quickly as is reasonable

CoffeeIntoScience commented 4 years ago

Obviously the isolation differences will be an issue going forward -- it might just be that you need to retrain the BDT for Run2 due to this difference, because the change does seem to have affected VELO and everything else differently

yipengsun commented 4 years ago

We've tried to use DaVinci v45r3, and we have the same "problem". Closed for now, as this problem might be resolved by retraining the BDT for run 2.