Port Greg/Lucia's run 2 UBDT efficiency evaluator

yipengsun commented 2 years ago

As we are waiting for the official PIDCalib ntuple production to finish, we can first port the UBDT efficiency evaluator so that it works on the small sample PIDCalib ntuple.

Efficiency evaluation procedure

Here's my current understanding of the UBDT efficiency evaluation.

We use sPlot for this. sPlot provides a method to unfold the overall distribution of a mixed sample of events into the sub-distributions of the various species.

For more info, read this paper
I'll use equations in that paper, marked with Eq.N
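To make the sWeight idea concrete, here is a minimal pure-Python sketch for the two-species (sig + bkg) case, using the standard sPlot expression: the inverse covariance matrix is built from the per-event PDF values, and each sWeight is the covariance-weighted combination of those PDF values divided by the total fitted density. The function name and its inputs (`f_sig`, `f_bkg`, `n_sig`, `n_bkg`) are hypothetical and assume the fit has already been done; this is not Greg/Lucia's code, and I have not cross-checked it against the paper's equation numbering.

```python
def sweights_two_species(f_sig, f_bkg, n_sig, n_bkg):
    """Return per-event (signal, background) sWeights for a two-species fit.

    f_sig, f_bkg: PDF values of each event under the fitted sig/bkg shapes
                  (hypothetical inputs, taken from an earlier unbinned ML fit).
    n_sig, n_bkg: fitted yields of the two species.
    """
    # Per-event denominator: sum_k N_k * f_k(y_e)
    denom = [n_sig * fs + n_bkg * fb for fs, fb in zip(f_sig, f_bkg)]

    # Elements of the inverse covariance matrix:
    #   V^-1_{nj} = sum_e f_n(y_e) * f_j(y_e) / denom_e^2
    a = sum(fs * fs / d**2 for fs, d in zip(f_sig, denom))
    b = sum(fs * fb / d**2 for fs, fb, d in zip(f_sig, f_bkg, denom))
    c = sum(fb * fb / d**2 for fb, d in zip(f_bkg, denom))

    # Invert the symmetric 2x2 matrix [[a, b], [b, c]] by hand
    det = a * c - b * b
    v_ss, v_sb, v_bb = c / det, -b / det, a / det

    # sWeight_n(y_e) = sum_j V_{nj} * f_j(y_e) / denom_e
    sw_sig = [(v_ss * fs + v_sb * fb) / d
              for fs, fb, d in zip(f_sig, f_bkg, denom)]
    sw_bkg = [(v_sb * fs + v_bb * fb) / d
              for fs, fb, d in zip(f_sig, f_bkg, denom)]
    return sw_sig, sw_bkg
```

Note that sWeights can be negative for background-like events; that is expected and they should not be clipped.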

The procedure is as follows:

[ ] Perform an unbinned ML fit, assuming certain distributions for sig and bkg for the B -> J/psi K sample
[ ] Compute the sWeight w/ Eq.14 for each event
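- Note that we should have a 2x2 covariance matrix (one row/column each for the sig and bkg yields)
- The sWeight is just a covariance-weighted combination of the sig and bkg PDF values for that event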
[ ] If we can extract the sWeights and sig. yield from the official sample, we don't even need to do the first two steps
- This is highly desirable
- We should ask Vitalli about this
- Also note that sWeight doesn't depend on a binning scheme
[ ] Find the PID efficiency as a function of, say, PT
- This is very similar to Eq.15
- Eq.15 is just the sum of sWeights in a particular bin, normalized by the total yield
- Here we need to use the sum of sWeight*PID efficiency over the events in the bin, then normalize the same way
- The procedure can be trivially generalized to be multi-dimensional
[ ] Question: How'd we handle the uncertainty?
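The last steps above can be sketched as follows. This is only an illustration under stated assumptions: the per-event "PID efficiency" is taken to be a pass/fail flag for the UBDT cut, each bin is normalized by the sum of sWeights in that bin, and the uncertainty uses a common sum-of-squared-weights propagation that treats passing and failing events as independent. That uncertainty is one possible answer to the open question, not a settled choice. All names (`sweighted_efficiency`, `values`, `sweights`, `passed`, `bin_edges`) are hypothetical.

```python
import bisect
import math


def sweighted_efficiency(values, sweights, passed, bin_edges):
    """sWeighted efficiency per bin of `values` (e.g. probe PT).

    Returns one (efficiency, uncertainty) tuple per bin; a bin whose
    sWeight sum is not positive yields (None, None).
    """
    n_bins = len(bin_edges) - 1
    sum_w = [0.0] * n_bins         # sum of sWeights, all events
    sum_w_pass = [0.0] * n_bins    # sum of sWeights, passing events
    sum_w2_pass = [0.0] * n_bins   # sum of squared sWeights, passing
    sum_w2_fail = [0.0] * n_bins   # sum of squared sWeights, failing

    for x, w, ok in zip(values, sweights, passed):
        # Bin i covers bin_edges[i] <= x < bin_edges[i + 1]
        idx = bisect.bisect_right(bin_edges, x) - 1
        if idx < 0 or idx >= n_bins:
            continue  # outside the binning range
        sum_w[idx] += w
        if ok:
            sum_w_pass[idx] += w
            sum_w2_pass[idx] += w * w
        else:
            sum_w2_fail[idx] += w * w

    result = []
    for i in range(n_bins):
        if sum_w[i] <= 0.0:
            result.append((None, None))
            continue
        eff = sum_w_pass[i] / sum_w[i]
        # Assumed error propagation for eff = P / (P + F) with weighted counts
        var = ((1.0 - eff) ** 2 * sum_w2_pass[i]
               + eff ** 2 * sum_w2_fail[i]) / sum_w[i] ** 2
        result.append((eff, math.sqrt(var)))
    return result
```

Extending this to more dimensions only changes how the bin index is computed; the weighted sums stay the same.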

Does an efficiency evaluator exist already?

We discussed this w/ Greg and Lucia ~1 year ago, and Lucia sent us her code:

/afs/cern.ch/work/l/lgrillo/public/PIDCalibTuples  # This is for Castelao, so unrelated
/afs/cern.ch/work/l/lgrillo/public/forMuonID  # This is mostly Greg's run 2 UBDT applier, not the efficiency evaluator
/afs/cern.ch/work/l/lgrillo/public/forMuonID_Run2Update  # Same as above

After looking at these folders, we agreed that the efficiency evaluator was not shared.

yipengsun commented 2 years ago

Here are the slides from Lucia regarding their progress on updating the run 2 UBDT efficiency: https://indico.cern.ch/event/824062/contributions/3446283/attachments/1853034/3042705/SLMuonID_v1.pdf

yipengsun commented 2 years ago

@manuelfs @Svende @afernez The third folder is mostly identical to the second. I also checked the slides, and I don't think the plots there used the sPlot technique.

So my conclusion is that we don't have their efficiency evaluation code.

manuelfs commented 2 years ago

I quickly checked, and I think you are right: I don't see the efficiency evaluation code

yipengsun commented 2 years ago

@manuelfs @Svende @afernez I've updated the top post to include my current understanding of the whole efficiency evaluation procedure. Please take a look (and also at the sPlot paper, if you have time!).

yipengsun commented 2 years ago

Given that there's a branch called probe_sWeight in the official PIDCalib sample, and we'll use the PIDCalib sample directly for the Mu UBDT study, the sWeight computation is not needed.

We won't bother Greg/Lucia about this anymore.

umd-lhcb / MuonBDTPid

Port Greg/Lucia's run 2 UBDT efficiency evaluator #8
