mcremone / decaf


UL: Processors #81

Open mcremone opened 9 months ago

mcremone commented 9 months ago

We will start by updating the dark Higgs processor to interface with the new data and to work with the newer coffea:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py

Simultaneously the script to execute the processors also needs to be adjusted:

https://github.com/mcremone/decaf/blob/UL/analysis/run.py

This is going to be quite some work. A few things to focus on:

1) I'm pretty sure run_uproot_job in coffea 0.7 doesn't exist anymore:

https://github.com/mcremone/decaf/blob/UL/analysis/run.py#L45

We need to learn how we can execute processors locally with python futures.

2) The new UL root files have ParticleNet instead of DeepAK15, therefore this part needs to be changed accordingly:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L449-L451

3) We need to verify that the EE fix for 2017 is still needed for UL:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L352

4) We need to confirm that triggers are still the same as pre-legacy:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L75-L107

In principle I see no reason why they shouldn't be.

5) Check if this is the right way to apply PU weights in UL:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L555

Also, check if systematic variations are now provided and, if yes, propagate them.

6) Check if muon ID SFs are now a function of eta or abseta for all years and, if yes, fix this:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L574-L576

7) Check if this still applies:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L590-L596

8) Check for UL if the eeBadScFilter only applies to data:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L638

Actually, check whether the MET filter recipe is still the same in UL:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L24-L51

9) Move from coffea.hist to hist following these instructions:

https://github.com/CoffeaTeam/coffea/discussions/705

I will include more items if anything comes to mind.

ParticleChef commented 8 months ago

Simultaneously the script to execute the processors also needs to be adjusted:

https://github.com/mcremone/decaf/blob/UL/analysis/run.py

This is going to be quite some work. A few things to focus on:

1) I'm pretty sure run_uproot_job in coffea 0.7 doesn't exist anymore:

https://github.com/mcremone/decaf/blob/UL/analysis/run.py#L45

We need to learn how we can execute processors locally with python futures.

I can run run.py after changing the following lines:

    output = processor.run_uproot_job(filelist,
                                      treename='Events',
                                      processor_instance=processor_instance,
                                      executor=processor.futures_executor,
                                      executor_args={'nano': True, 'workers': options.workers},
                                      )

In those lines, pass the tree name positionally and replace the 'nano' flag in executor_args with a schema:

    # NanoAODSchema must be imported first: from coffea.nanoevents import NanoAODSchema
    output = processor.run_uproot_job(filelist,
                                      'Events',
                                      processor_instance=processor_instance,
                                      executor=processor.futures_executor,
                                      executor_args={'schema': NanoAODSchema, 'workers': options.workers},
                                      )

https://github.com/ParticleChef/decaf/blob/ULprocessor/analysis/run.py#L46
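
For readers unfamiliar with the futures pattern mentioned in item 1, here is a minimal standard-library sketch of the idea: a function is mapped over chunks of events in a local pool and the partial outputs are merged. This is not coffea's implementation; `process_chunk`, `run_local`, the MET cut and the toy values are all hypothetical stand-ins.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Toy stand-in for a processor's process(): count events passing a cut."""
    out = Counter()
    out["total"] = len(chunk)
    out["selected"] = sum(1 for met in chunk if met > 250.0)
    return out


def run_local(chunks, workers=2):
    """Map process_chunk over the chunks in a local pool and merge the results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_chunk, chunks):
            total.update(partial)  # Counter.update adds the counts key by key
    return total


# Hypothetical MET values in GeV, split into two chunks
chunks = [[100.0, 300.0, 260.0], [50.0, 400.0]]
output = run_local(chunks, workers=2)
print(dict(output))
```

The merge step plays the role of the accumulator: each worker returns a partial result and the caller adds them up.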

ParticleChef commented 8 months ago

8) Check for UL if the eeBadScFilter only applies to data:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L638

Actually, check whether the MET filter recipe is still the same in UL:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L24-L51

For UL, eeBadScFilter is recommended for both MC and data in all three years (2016, 2017, 2018): https://twiki.cern.ch/twiki/bin/viewauth/CMS/MissingETOptionalFiltersRun2

The MET filter recipe for UL is this:

For 2017 and 2018:
goodVertices
globalSuperTightHalo2016Filter
HBHENoiseFilter
HBHENoiseIsoFilter
EcalDeadCellTriggerPrimitiveFilter
BadPFMuonFilter
BadPFMuonDzFilter
eeBadScFilter
ecalBadCalibFilter

For 2016:
goodVertices
globalSuperTightHalo2016Filter
HBHENoiseFilter
HBHENoiseIsoFilter
EcalDeadCellTriggerPrimitiveFilter
BadPFMuonFilter
BadPFMuonDzFilter
eeBadScFilter
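
The recipe above could be encoded as a per-year table, sketched below. This is only an illustration: the dictionary name `MET_FILTERS`, the helper `met_filter_mask` and the toy flags are hypothetical, and plain Python lists stand in for the per-event flag arrays the processor would read from the events.

```python
# Filter names per year, copied from the UL recipe listed above
_COMMON = [
    "goodVertices",
    "globalSuperTightHalo2016Filter",
    "HBHENoiseFilter",
    "HBHENoiseIsoFilter",
    "EcalDeadCellTriggerPrimitiveFilter",
    "BadPFMuonFilter",
    "BadPFMuonDzFilter",
    "eeBadScFilter",
]
MET_FILTERS = {
    "2016": list(_COMMON),
    "2017": _COMMON + ["ecalBadCalibFilter"],
    "2018": _COMMON + ["ecalBadCalibFilter"],
}


def met_filter_mask(flags, year):
    """Return one bool per event: True if all filters for that year pass.

    flags maps a filter name to a list of per-event booleans
    (a stand-in for the Flag branches of the events).
    """
    names = MET_FILTERS[year]
    n_events = len(flags[names[0]])
    return [all(flags[name][i] for name in names) for i in range(n_events)]


# Toy input: three events, the second one fails ecalBadCalibFilter only
flags = {name: [True, True, True] for name in MET_FILTERS["2018"]}
flags["ecalBadCalibFilter"] = [True, False, True]

print(met_filter_mask(flags, "2018"))
print(met_filter_mask(flags, "2016"))
```

Since eeBadScFilter is recommended for both MC and data in UL, the same list is applied regardless of sample type in this sketch.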
ParticleChef commented 8 months ago

3) We need to verify that the EE fix for 2017 is still needed for UL:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L352

The EE noise fix is not needed for UL: https://twiki.cern.ch/twiki/bin/view/CMS/JetMET#

ParticleChef commented 8 months ago

6) Check if muon ID SFs are now a function of eta or abseta for all years and, if yes, fix this:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L574-L576

I checked the JSON file on GitLab linked from the twiki: abseta is used for the muon ID SF for all years.

https://twiki.cern.ch/twiki/bin/view/CMS/MuonUL2018n?topic=MuonUL2018
https://gitlab.cern.ch/cms-muonPOG/muonefficiencies/-/blob/master/Run2/UL/2016_preVFP/2016_preVFP_Z/Efficiencies_muon_generalTracks_Z_Run2016_UL_HIPM_ID.json

I also checked the inputs of the json.gz file using the example code: https://gitlab.cern.ch/cms-nanoAOD/jsonpog-integration/-/tree/master/POG/MUO

NUM_LooseID_DEN_TrackerMuons
Correction NUM_LooseID_DEN_TrackerMuons has 4 inputs
   Input year (string): year/scenario: example 2016preVFP, 2017 etc
   Input abseta (real): Probe abseta
   Input pt (real): Probe pt
   Input ValType (string): sf or syst (currently 'sf' is nominal, and 'systup' and 'systdown' are up/down variations with total stat+syst uncertainties. Individual systs are also available (in these cases syst only, not sf +/- syst)

NUM_LooseID_DEN_genTracks
Correction NUM_LooseID_DEN_genTracks has 4 inputs
   Input year (string): year/scenario: example 2016preVFP, 2017 etc
   Input abseta (real): Probe abseta
   Input pt (real): Probe pt
   Input ValType (string): sf or syst (currently 'sf' is nominal, and 'systup' and 'systdown' are up/down variations with total stat+syst uncertainties. Individual systs are also available (in these cases syst only, not sf +/- syst)
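
To illustrate why the abseta input matters, here is a toy lookup in plain Python. The bin edges and scale-factor values are invented for the example and the pt dependence is left out; the real numbers come from the JSON files linked above. The point is only that the signed eta must be converted to |eta| before the lookup.

```python
from bisect import bisect_right

# Hypothetical |eta| bin edges and scale factors (made-up numbers)
ABSETA_EDGES = [0.0, 0.9, 1.2, 2.1, 2.4]
SF_BY_ABSETA_BIN = [0.99, 0.98, 0.97, 0.96]


def muon_id_sf(eta):
    """Look up a toy muon ID scale factor as a function of |eta|."""
    abseta = abs(eta)  # the correction is binned in abseta, not signed eta
    if abseta >= ABSETA_EDGES[-1]:
        return 1.0  # outside the binned range: no correction in this sketch
    idx = bisect_right(ABSETA_EDGES, abseta) - 1
    return SF_BY_ABSETA_BIN[idx]


print(muon_id_sf(-1.0), muon_id_sf(1.0))
```

Passing a signed eta straight into an abseta-binned table would send every negative-eta muon outside the binned range, which is the bug item 6 asks to check for.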
mcremone commented 6 months ago

A lot of work has been done to start converting the dark Higgs processor, but a lot is still left to be done. Here is a list of items:

1) Isolation for muons

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L453-L454

We used to use pfRelIso04_all. Check what the UL recommendation is.

2) ID for electrons

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L466-L467

We used to use cutBased. Check what the UL recommendation is.

3) ID for photons

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L489-L491

Same as for 2)

4) ParticleNet

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L504-L506

Figure out how to use ParticleNet to reproduce this behavior. @ParticleChef should know.

5) Abs(eta) for muon ID

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L624-L626

As commented above by @ParticleChef, this needs to be fixed.

6) Above/below 20 GeV

For electrons, implement something like this for all years:

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L640-L646

using:

get_ele_reco_sf_below20
get_ele_reco_err_below20
get_ele_reco_sf_above20
get_ele_reco_err_above20

7) Prefiring

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L684-L685

Do we still need prefiring weights for UL?

8) HEM veto

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L684-L685

We should check if there is an official UL recipe to deal with the HEM issue.
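
For item 6 in the list above, the split at 20 GeV could look roughly like the sketch below. The four function names come from the list, but their signatures and return values here are placeholders invented for the example, as are the wrapper `ele_reco_sf` and the choice to send pt exactly equal to 20 GeV to the "above" correction.

```python
# Placeholder corrections: the real ones are lookups, these return made-up constants
def get_ele_reco_sf_below20(eta, pt):
    return 0.98


def get_ele_reco_err_below20(eta, pt):
    return 0.02


def get_ele_reco_sf_above20(eta, pt):
    return 0.99


def get_ele_reco_err_above20(eta, pt):
    return 0.01


def ele_reco_sf(eta, pt, threshold=20.0):
    """Return (sf, err) using the below-20 or above-20 correction by pt."""
    if pt < threshold:
        return get_ele_reco_sf_below20(eta, pt), get_ele_reco_err_below20(eta, pt)
    return get_ele_reco_sf_above20(eta, pt), get_ele_reco_err_above20(eta, pt)


print(ele_reco_sf(0.5, 15.0))
print(ele_reco_sf(0.5, 35.0))
```

In the processor this would be applied per electron with array masks rather than a Python `if`, but the selection logic is the same.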

ParticleChef commented 6 months ago

4) ParticleNet

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L504-L506

figure out how to use ParticleNet to reproduce this behavior. @ParticleChef should know.

Because the file has been updated, those lines are now about taus. I think the ParticleNet part is the following. I used these lines in the previous version:

        # ParticleNet AK15 class scores for the fat jet
        fj['probQCDothers'] = events.AK15PFPuppi['Jet_particleNetAK15_QCDothers']
        fj['probQCDb'] = events.AK15PFPuppi['Jet_particleNetAK15_QCDb']
        fj['probQCDbb'] = events.AK15PFPuppi['Jet_particleNetAK15_QCDbb']
        fj['probQCDc'] = events.AK15PFPuppi['Jet_particleNetAK15_QCDc']
        fj['probQCDcc'] = events.AK15PFPuppi['Jet_particleNetAK15_QCDcc']
        fj['probTbcq'] = events.AK15PFPuppi['Jet_particleNetAK15_Tbcq']
        fj['probTbqq'] = events.AK15PFPuppi['Jet_particleNetAK15_Tbqq']
        # Combine the class scores into a top-vs-QCD discriminator
        probQCD=fj.probQCDbb+fj.probQCDcc+fj.probQCDb+fj.probQCDc+fj.probQCDothers
        probT=fj.probTbcq+fj.probTbqq
        fj['TvsQCD'] = probT/(probT+probQCD)
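
The discriminator built in the lines above can be checked by hand on a single jet. The sketch below uses a plain dictionary of made-up scores instead of the event arrays; the key names mirror the `prob*` fields above, while `t_vs_qcd` and the zero-denominator guard are additions for the example.

```python
QCD_KEYS = ["probQCDbb", "probQCDcc", "probQCDb", "probQCDc", "probQCDothers"]
TOP_KEYS = ["probTbcq", "probTbqq"]


def t_vs_qcd(scores):
    """Top-vs-QCD discriminator: probT / (probT + probQCD)."""
    prob_qcd = sum(scores[key] for key in QCD_KEYS)
    prob_t = sum(scores[key] for key in TOP_KEYS)
    denom = prob_t + prob_qcd
    # Guard against an all-zero score vector (an assumption of this sketch)
    return prob_t / denom if denom > 0 else 0.0


# Made-up scores for one jet
scores = {
    "probQCDbb": 0.05, "probQCDcc": 0.05, "probQCDb": 0.05,
    "probQCDc": 0.05, "probQCDothers": 0.20,
    "probTbcq": 0.30, "probTbqq": 0.30,
}
print(t_vs_qcd(scores))
```

With these numbers probT is 0.6 and probQCD is 0.4, so the discriminator comes out at about 0.6.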
ParticleChef commented 6 months ago

A lot of work has been done to start converting the dark Higgs processor, but a lot is still left to be done. Here is a list of items:

1) Isolation for muons

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L453-L454

We used to use pfRelIso04_all. Check what the UL recommendation is.

2) ID for electrons

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L466-L467

We used to use cutBased. Check what the UL recommendation is.

3) ID for photons

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L489-L491

Same as for 2)

1) Muon isolation: we use pfIsoId for the monotop analysis.

2) Electron ID: the cut-based electron ID Fall17 V2 (cutBased) is used for the monotop analysis:

Electron_cutBased Int_t cut-based ID Fall17 V2 (0:fail, 1:veto, 2:loose, 3:medium, 4:tight)

3) Photon ID: the photon ID also uses the cut-based Fall17 V2 ID (cutBased):

Photon_cutBased Int_t cut-based ID bitmap, Fall17V2, (0:fail, 1:loose, 2:medium, 3:tight)
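
One detail worth keeping in mind with the two cutBased branches quoted above is that the integer codes differ between electrons and photons (loose is 2 for electrons but 1 for photons). A small sketch of a working-point check follows; the table and function names are hypothetical, and the integer values are taken from the branch descriptions above.

```python
# Integer codes from the branch descriptions quoted above
ELECTRON_CUTBASED = {"fail": 0, "veto": 1, "loose": 2, "medium": 3, "tight": 4}
PHOTON_CUTBASED = {"fail": 0, "loose": 1, "medium": 2, "tight": 3}


def passes_cutbased(value, working_point, table):
    """True if the cutBased value is at least as tight as the working point."""
    return value >= table[working_point]


# The same integer means different things for the two objects
print(passes_cutbased(1, "loose", ELECTRON_CUTBASED))
print(passes_cutbased(1, "loose", PHOTON_CUTBASED))
```

Using named working points instead of bare integers makes it harder to apply the electron threshold to photons by mistake.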

mcremone commented 6 months ago

@ParticleChef Can I see how you used pfIsoId? I think that's not the isolation variable I was looking for.

ParticleChef commented 6 months ago

I use the muon definition like this:

# Loose muon: pt > 15 GeV, |eta| < 2.4, loose ID and pfIsoId >= 2 (PFIsoLoose or tighter)
def isLooseMuon(pt,eta,iso,loose_id,year, is_pfcand, is_global, is_tracker):
    mask = (pt>15)&(abs(eta)<2.4)&(loose_id)&(iso>=2)#&(is_pfcand)&(is_global)&(is_tracker)
    return mask

mu['isloose'] = isLooseMuon(mu.pt, mu.eta, mu.pfIsoId, mu.looseId, self._year, mu.isPFcand, mu.isGlobal, mu.isTracker)

In the NanoAOD documentation, pfIsoId is:

Muon_pfIsoId UChar_t PFIso ID from miniAOD selector (1=PFIsoVeryLoose, 2=PFIsoLoose, 3=PFIsoMedium, 4=PFIsoTight, 5=PFIsoVeryTight, 6=PFIsoVeryVeryTight)
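
To make the `iso>=2` cut in the muon definition easier to read, the pfIsoId codes quoted above can be given names. This is only a readability sketch: the enum `PFIsoId` and the helper `passes_iso` are hypothetical and not part of the processor, and whether pfIsoId is the right isolation variable is a separate question.

```python
from enum import IntEnum


class PFIsoId(IntEnum):
    """Working points of Muon_pfIsoId, as listed in the description above."""
    VERY_LOOSE = 1
    LOOSE = 2
    MEDIUM = 3
    TIGHT = 4
    VERY_TIGHT = 5
    VERY_VERY_TIGHT = 6


def passes_iso(pf_iso_id, working_point=PFIsoId.LOOSE):
    """True if the muon passes the given isolation working point or a tighter one."""
    return pf_iso_id >= working_point


print(passes_iso(2), passes_iso(1), passes_iso(4, PFIsoId.TIGHT))
```

With the default working point, `passes_iso` is equivalent to the `iso>=2` condition used in `isLooseMuon`.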
mcremone commented 6 months ago

I use the muon definition like this:

def isLooseMuon(pt,eta,iso,loose_id,year, is_pfcand, is_global, is_tracker):
    mask = (pt>15)&(abs(eta)<2.4)&(loose_id)&(iso>=2)#&(is_pfcand)&(is_global)&(is_tracker)
    return mask

mu['isloose'] = isLooseMuon(mu.pt, mu.eta, mu.pfIsoId, mu.looseId, self._year, mu.isPFcand, mu.isGlobal, mu.isTracker)

In nanoaod document, pfIsoId is

Muon_pfIsoId  UChar_t PFIso ID from miniAOD selector (1=PFIsoVeryLoose, 2=PFIsoLoose, 3=PFIsoMedium, 4=PFIsoTight, 5=PFIsoVeryTight, 6=PFIsoVeryVeryTight)

OK, I believe it would be good to double-check this against the official POG UL recommendations on the twikis. Actually, it would be great to link the twikis with the recommendations in ids.py. I'll reopen the GitHub issue on IDs and link to this comment.

mcremone commented 6 months ago

Ah wait, I forgot you already put links in ids.py. My bad.

michaeldmurphy1 commented 6 months ago

8) HEM veto

https://github.com/mcremone/decaf/blob/UL/analysis/processors/darkhiggs.py#L684-L685

We should check if there is an official UL recipe to deal with the HEM issue.

The recommendation for UL is the same as for the previous version (link, under the chart in "Run2 recommendations").

mcremone commented 6 months ago

I use the muon definition like this:

def isLooseMuon(pt,eta,iso,loose_id,year, is_pfcand, is_global, is_tracker):
    mask = (pt>15)&(abs(eta)<2.4)&(loose_id)&(iso>=2)#&(is_pfcand)&(is_global)&(is_tracker)
    return mask

mu['isloose'] = isLooseMuon(mu.pt, mu.eta, mu.pfIsoId, mu.looseId, self._year, mu.isPFcand, mu.isGlobal, mu.isTracker)

In nanoaod document, pfIsoId is

Muon_pfIsoId  UChar_t PFIso ID from miniAOD selector (1=PFIsoVeryLoose, 2=PFIsoLoose, 3=PFIsoMedium, 4=PFIsoTight, 5=PFIsoVeryTight, 6=PFIsoVeryVeryTight)

As an aside, you may want to remove the requirements on mu.isPFcand, mu.isGlobal and mu.isTracker. Those should already be included in mu.looseId.