mcremone / decaf


UL: IDs #73

Open mcremone opened 9 months ago

mcremone commented 9 months ago

Need to have a UL version of this:

https://github.com/mcremone/decaf/blob/master/analysis/utils/ids.py

Thresholds need to be adjusted to satisfy the UL requirements. This requires a lot of tedious searching through the POG twikis. Please add the reference to the twiki as a comment.

alejands commented 9 months ago

Where are the minimum pt cuts coming from? For electrons, I can't seem to find a minimum pt requirement in any of the twikis from the EGamma POG.

Electrons don't have different ID thresholds for UL (see CutBasedElectronIdentificationRun2), but there's no mention of pt.

mcremone commented 9 months ago

Whenever you don't find it on the official twikis, pT cuts come from other analyses. I'd say, as a general rule, unless something different is explicitly required on the official twikis for UL, we keep what we have.

alejands commented 9 months ago

https://github.com/mcremone/decaf/blob/8e989f5b9b4d1f6e0f9f4077cabe28452c2ba6ca/analysis/utils/ids.py#L62-L71

The tau bitmask values documented in UL NanoAODs are different from what's in our documentation.

ide:

byDeepTau2017v2p1VSe ID working points (deepTau2017v2p1): bitmask 1 = VVVLoose, 2 = VVLoose, 4 = VLoose, 8 = Loose, 16 = Medium, 32 = Tight, 64 = VTight, 128 = VVTight

idj:

byDeepTau2017v2p1VSjet ID working points (deepTau2017v2p1): bitmask 1 = VVVLoose, 2 = VVLoose, 4 = VLoose, 8 = Loose, 16 = Medium, 32 = Tight, 64 = VTight, 128 = VVTight

idmu:

byDeepTau2017v2p1VSmu ID working points (deepTau2017v2p1): bitmask 1 = VLoose, 2 = Loose, 4 = Medium, 8 = Tight

Based on this, we have ide = Medium, idj = VLoose, and idmu = Loose. Is this what we want, or should we have all 3 IDs set to Loose?

ParticleChef commented 9 months ago

I updated the ids.py file for the UL monotop analysis: https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/ids.py. I included twiki links in it. I followed the monotop analysis definitions, referring to the POG recommendations on the twiki.

alejands commented 9 months ago

Thanks @ParticleChef! It seems you had already taken care of this!

https://github.com/ParticleChef/decaf/blob/c542b9f0eaa9ee740aed2cc143ef71f962c83281/analysis/utils/ids.py#L71-L76

@mcremone Do we need updated Tau IDs for any of the other analyses?

mcremone commented 9 months ago

@alejands the tau ID you find is targeting a 90% efficiency. Working points for different IDs that give a 90% efficiency are taken from here:

https://arxiv.org/pdf/2201.08458.pdf#page=13

Based on this, the following working points were selected:

((ide&16)==16)&((idj&4)==4)&((idm&2)==2)

I don't think this changed between pre-legacy and UL.
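As a minimal sketch of how this selection reads against the documented bitmasks, the expression can be written with named bit values. The helper name `passes_tau_id` and the toy input values are made up for illustration and are not taken from ids.py. Because only `&` and `==` are used, the same function should also work element-wise on array types that overload those operators, such as NumPy arrays.

```python
# Bit values as documented in the UL NanoAOD branch descriptions quoted above.
VSE_MEDIUM = 16   # byDeepTau2017v2p1VSe: 16 = Medium
VSJET_VLOOSE = 4  # byDeepTau2017v2p1VSjet: 4 = VLoose
VSMU_LOOSE = 2    # byDeepTau2017v2p1VSmu: 2 = Loose


def passes_tau_id(ide, idj, idm):
    """Hypothetical helper: the selection above with named bit values."""
    return (
        ((ide & VSE_MEDIUM) == VSE_MEDIUM)
        & ((idj & VSJET_VLOOSE) == VSJET_VLOOSE)
        & ((idm & VSMU_LOOSE) == VSMU_LOOSE)
    )


# Toy values, not real data.
all_pass = passes_tau_id(ide=31, idj=7, idm=3)        # every required bit set
fails_vse = passes_tau_id(ide=15, idj=7, idm=3)       # bit 16 (VSe Medium) missing
fails_vsmu = passes_tau_id(ide=255, idj=255, idm=1)   # bit 2 (VSmu Loose) missing
```

Naming the bits keeps the mapping between the twiki working points and the mask values visible next to the cut.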

mcremone commented 9 months ago

@ParticleChef I have a few comments:

1) You may want to verify that this is still ok:

https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/ids.py#L103-L114

More specifically, the different pT thresholds per year, especially because the comment "2017/18 pT requirement adjusted to match monojet, using dedicated ID SFs" is not relevant anymore with UL. You are now using UL corrections from the EGM POG.

2) You may want to verify that the loose ID is properly applied here:

https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/ids.py#L96

In particular, the requirement is applied differently in 2016 than in 2017/18. You may want to check that this still applies in UL.

3) Check if we still need a dedicated HEM jet definition for an HEM veto in UL. If not, remove this part:

https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/ids.py#L140-L146

alejands commented 9 months ago

> 2. You may want to verify that the loose ID is properly applied here:
>
> https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/ids.py#L96
>
> In particular, the requirement is applied differently in 2016 than in 2017/18. You may want to check that this still applies in UL.

Photon ID now uses the Fall17V2 bitmap just like the electron ID, but the thresholds are the same. See my version of the documentation:

https://github.com/alejands/decaf/blob/f88a5a1b20b410095563a01202ed41909e582163/analysis/utils/ids.py#L106-L116

# The electron ID and photon IDs remain to be the Fall17V2 IDs for ULs
# https://twiki.cern.ch/twiki/bin/view/CMS/EgammaUL2016To2018

# 94X-V2 ID is the recommended ID for all 3 years of Run 2, ie, 2016 legacy
# rereco, 2017 rereco & UL and 2018 rereco. This ID was tuned using 2017 94X
# samples.
# https://twiki.cern.ch/twiki/bin/view/CMS/CutBasedPhotonIdentificationRun2

# Photon_cutBased
# cut-based ID bitmap, Fall17V2, (0:fail, 1:loose, 2:medium, 3:tight)
# https://twiki.cern.ch/twiki/bin/view/CMS/EgammaNanoAOD#NanoAOD_Variables

The descriptions are copied from the NanoAODv9 self documentation (which are the same for all 3 years): https://cms-nanoaod-integration.web.cern.ch/autoDoc/

I had originally changed all the photon IDs to loose_id>=1 to match the convention used for electrons, but (loose_id&1)==1 should be functionally equivalent.

alejands commented 9 months ago

> I had originally changed all the photon IDs to loose_id>=1 to match the convention used for electrons, but (loose_id&1)==1 should be functionally equivalent.

Scratch that. They're not equivalent and medium ID would return false.

The photon ID should be set to loose_id>=1 for all years.

mcremone commented 9 months ago

@alejands I actually think they are equivalent. It is good that the medium ID returns false, because a medium photon is never exclusively medium: if it's medium, it's also loose. Therefore the ID will be constructed in such a way that, if the photon is medium, (loose_id&1)==1 will return true. I personally prefer using conditions like (loose_id&1)==1 instead of loose_id>=1.

alejands commented 9 months ago

There are two definitions of the ID in the NanoAOD, so if we go with the bitmap we have to be careful about our choice. The bitmap definition only applies to Fall17V1, which is deprecated.

Photon_cutBased                 Int_t  cut-based ID bitmap, Fall17V2, (0:fail, 1:loose, 2:medium, 3:tight)
Photon_cutBased_Fall17V1Bitmap  Int_t  cut-based ID bitmap, Fall17V1, 2^(0:loose, 1:medium, 2:tight).

Using loose_id>=1 would work for both. We can still use (loose_id&1)==1 as long as we're sure to use the bitmap variable. In this case, loose_id>=1 should be used.
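To make the difference between the two encodings concrete, here is a small sketch. It assumes, as the discussion above does, that the working points are nested (a medium photon also passes loose). The helper names are hypothetical and not part of ids.py.

```python
def decode_level(level):
    """Photon_cutBased-style integer: 0 fail, 1 loose, 2 medium, 3 tight."""
    return {"loose": level >= 1, "medium": level >= 2, "tight": level >= 3}


def decode_bitmap(bits):
    """Fall17V1Bitmap-style flags: bit 0 loose, bit 1 medium, bit 2 tight."""
    return {
        "loose": bool(bits & 1),
        "medium": bool(bits & 2),
        "tight": bool(bits & 4),
    }


# A photon that passes medium but not tight, in each encoding.
medium_as_level = 2        # integer level
medium_as_bitmap = 1 | 2   # loose and medium bits set, i.e. 3

# (x & 1) == 1 is only a valid loose test for the bitmap encoding.
loose_bit_on_level = (medium_as_level & 1) == 1      # medium photon is rejected
loose_bit_on_bitmap = (medium_as_bitmap & 1) == 1    # medium photon is accepted

# x >= 1 selects loose-or-better in both encodings.
loose_ge_on_level = medium_as_level >= 1
loose_ge_on_bitmap = medium_as_bitmap >= 1
```

The two decoders agree on which working points the photon passes. Only the bit test applied to the integer-level variable gives the wrong answer.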

Side note: There isn't a similar Electron_cutBased_Fall17V1Bitmap variable, only Electron_cutBased

Electron_cutBased   Int_t   cut-based ID Fall17 V2 (0:fail, 1:veto, 2:loose, 3:medium, 4:tight)

alejands commented 9 months ago

I've attached a sample from one of the DY PFNano outputs. This is Run3 data, but the NanoAOD variable description is similar, just with a different era used for calibration. (I will note that the data type in Run3 is UChar_t instead of Int_t). In here, the bin corresponding to 2 is still filled.

Photon_cutBased UChar_t cut-based ID bitmap, RunIIIWinter22V1, (0:fail, 1:loose, 2:medium, 3:tight)

Given all the various formats of this variable, I suggest we use loose_id>=1 to play it safe.

[Screenshot at Dec 11 15-30-07]

mcremone commented 9 months ago

I really don't understand how this "bitmap" is constructed. My guess is that for electrons that satisfy all of the loose, medium, and tight IDs they assign a value of 3. For electrons that satisfy only the loose and medium IDs they assign a value of 2. Finally, for electrons that satisfy only the loose ID they assign a value of 1.

Before moving forward can we verify that my interpretation is correct? If that is correct, indeed we must use loose_id>=1

We should verify this also for all other objects, including jet ID and PU ID.

alejands commented 9 months ago

I fetched a random file from 2017 UL (I just looked up the dataset they used as an example in the NanoAOD self-documentation), and indeed these EGamma "bitmaps" are integer flags, not bit flags. The only exception is the deprecated bitmap which is documented accordingly.

Jet_jetId and FatJet_jetId are bitmaps as expected. Jet_puId also seems to be a bitmap.

The Muon IDs are type bool. Currently we treat tight_id as a bool, but we use loose_id>0 in our code.

Tau IDs are bitmaps as documented and match their description.

See screenshots

## EGamma

[Screenshots at Dec 11 17-12-19, 17-12-46, and 17-13-00]

## Jets

[Screenshots at Dec 11 17-22-53, 17-22-26, and 17-33-41]

## Tau

[Screenshot at Dec 11 17-34-05]

I didn't screenshot the Muon bool IDs or the other two Tau IDs, but they're what you expect.
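The kind of check described above (looking at which values of a branch are actually filled) can be sketched as follows. This is only a heuristic under a stated assumption: it tests whether the values are consistent with a nested bitmap whose loosest working point sits in the lowest bit, as documented for the tau IDs and the Fall17V1 photon bitmap. It does not apply to layouts like the 2017/18 Jet_puId, where loose is the highest bit. The function names and the toy value lists are hypothetical.

```python
from collections import Counter


def value_counts(values):
    """Count how often each distinct value of a branch occurs."""
    return dict(sorted(Counter(values).items()))


def consistent_with_nested_bitmap(values):
    """True if every value has the form 2**k - 1 (0, 1, 3, 7, ...).

    Such values have all bits below the highest set bit also set, which is
    what a nested bitmap with the loosest working point in bit 0 produces.
    """
    return all((v & (v + 1)) == 0 for v in values)


# Toy values, not real data.
integer_levels = [0, 1, 2, 3, 3, 1, 0, 2]   # Photon_cutBased-like
nested_bits = [0, 1, 3, 7, 15, 3, 1]        # tau-ID-like

levels_look_nested = consistent_with_nested_bitmap(integer_levels)
bits_look_nested = consistent_with_nested_bitmap(nested_bits)
level_counts = value_counts(integer_levels)
```

A filled bin at 2, as in the photon sample above, is enough to rule out this kind of bitmap.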

alejands commented 9 months ago

After asking for permission, I updated the formatting of ids.py in an attempt to improve legibility. https://github.com/alejands/decaf/blob/UL_ids/analysis/utils/ids.py

The changes I pushed https://github.com/alejands/decaf/commit/0d76f77f9f642663346a046fe573736672175fb5 (edit: updated commit) were made on top of the latest commit from @ParticleChef's UL branch. The commit description is copied below.

Update ids.py formatting style

Done in an attempt to improve the readability of the IDs. This commit does not include any changes to thresholds or bitmaps, nor does it add any documentation.

Changes still need to be made for the updated electron and photon ID definitions, as discussed in Issue https://github.com/mcremone/decaf/issues/73. Documentation for Tau ID bitmaps should also be added.


Edit: Missing ")" found in line 280. Previous commit was squashed and force pushed. The link to the commit in this comment above has been updated.

alejands commented 9 months ago

https://github.com/alejands/decaf/blob/0d76f77f9f642663346a046fe573736672175fb5/analysis/utils/ids.py#L266

The twiki link for Jet Pileup ID has changed for UL to https://twiki.cern.ch/twiki/bin/view/CMS/PileupJetIDUL and there is a change in the Jet Pileup ID definitions.

In brief, a bug in 2016 UL makes it so the LooseID and TightID bit flags are flipped. So for 2016 we use the condition (pu_id&1)==1, and for 2017/2018 we use the condition (pu_id&4)==4.

Jet_puId bit flag details:

There is a note at the bottom of the section on 2016 data.

> NOTE (Specifically for 2016 UL): As you notice in the instructions above, the Tight and Loose bit flags are flipped with each other (when compared to 2017 UL and 2018 UL). This is due to an accidental switch of the working point cut values defined in the [PileupJetIDCutParams_cfi.py file](https://github.com/cms-sw/cmssw/blob/CMSSW_10_6_26/RecoJets/JetProducers/python/PileupJetIDCutParams_cfi.py#L82-L101). All 2016 (APV and non-APV) UL NanoAODv9 samples are affected.

According to this twiki for 2016UL,

> The flag represents passtightID\*4+passmediumID\*2+passlooseID\*1, so that:
>
> puId==0 means 000: fail all PU ID;
> **puId==1 means 001: pass loose ID, fail medium, fail tight;**
> **puId==3 means 011: pass loose and medium ID, fail tight;**
> puId==7 means 111: pass loose, medium, tight ID.

and for 2017UL/2018UL,

> The flag represents passlooseID\*4+passmediumID\*2+passtightID\*1, so that:
>
> puId==0 means 000: fail all PU ID;
> **puId==4 means 100: pass loose ID, fail medium, fail tight;**
> **puId==6 means 110: pass loose and medium ID, fail tight;**
> puId==7 means 111: pass loose, medium, tight ID.

I checked two files from the same primary dataset for UL16 and UL17 and indeed the middle two flags are different (1/3/7 vs. 4/6/7 for Loose/Medium/Tight, respectively).
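The year-dependent condition can be sketched as a small lookup. This is an illustration of the conditions quoted above, not the code in the linked commit. The helper name, the dictionary, and the use of string year keys are assumptions.

```python
# Loose working-point bit of Jet_puId per UL year, following the conditions
# quoted above: in 2016 UL the loose and tight bits are swapped.
LOOSE_PU_ID_BIT = {"2016": 1, "2017": 4, "2018": 4}


def passes_loose_pu_id(pu_id, year):
    """Hypothetical helper; `year` is a string key such as "2016"."""
    bit = LOOSE_PU_ID_BIT[year]
    return (pu_id & bit) == bit


# Values observed for loose-only, loose+medium, and all three working points.
ul16_values = [1, 3, 7]
ul17_values = [4, 6, 7]
ul16_pass = [passes_loose_pu_id(v, "2016") for v in ul16_values]
ul17_pass = [passes_loose_pu_id(v, "2017") for v in ul17_values]

# Applying the 2017 bit to a loose-only 2016 jet would wrongly reject it.
wrong_year = passes_loose_pu_id(1, "2017")
```

Keeping the bit in a per-year table makes the 2016 exception explicit instead of hiding it in a branch of the selection.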

Example datasets used and Jet_puId plots:

## Datasets

```
/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM
```

vs.

```
/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL17NanoAODv9-106X_mc2017_realistic_v9-v1/NANOAODSIM
```

## UL16

[Jet_puId plot: UL16]

## UL17

[Jet_puId plot: UL17]

I corrected the flags and tried to condense the info as much as I could in commit https://github.com/alejands/decaf/commit/2bc37461f53c5694c7684eb5c38a3ee430aee3ae.


I still need to finish updating the Photon IDs and documentation.

alejands commented 9 months ago

@ParticleChef when you get a chance, can you still look into items 1 and 3 in https://github.com/mcremone/decaf/issues/73#issuecomment-1849538048? You have more knowledge of the details of the analysis than I do.

mcremone commented 9 months ago

Sounds good. Is there anything missing? If not, would you mind testing the script and if it works, making a PR?

ParticleChef commented 9 months ago

Hi, I'll reply to each comment.

> 1. You may want to verify that this is still ok:
>
> https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/ids.py#L103-L114
>
> More specifically, the different pT thresholds per year, especially because the comment "2017/18 pT requirement adjusted to match monojet, using dedicated ID SFs" is not relevant anymore with UL. You are now using UL corrections from the EGM POG.

I updated the pT requirement for the monotop analysis. I will fix the comment lines.

> 2. You may want to verify that the loose ID is properly applied here:
>
> https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/ids.py#L96
>
> In particular, the requirement is applied differently in 2016 than in 2017/18. You may want to check that this still applies in UL.

I think it is okay, but I will check the 2016 ID again.

> 3. Check if we still need a dedicated HEM jet definition for an HEM veto in UL. If not, remove this part:
>
> https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/ids.py#L140-L146

In UL, the HEM veto is still needed and this cut is applied. I am currently looking at the pT cut: it is 15 GeV instead of 30 GeV, from the KIT study of monotop. It will be updated.

Thank you for the comments and for checking the IDs. I checked the twiki, and the current method in the ids.py file works at least for 2018. I will look at the ID method again year by year (the difference between 16 and 17/18?).

alejands commented 9 months ago

> Sounds good. Is there anything missing? If not, would you mind testing the script and if it works, making a PR?

I finished updating the remaining IDs and documentation. I was able to run*

python utils/ids.py

and successfully created the file data/test_ids.py. I pulled the previous work from @ParticleChef in PR

and added my commits in


*after going version-by-version with pip (yes I did run the setup and env scripts) trying to find a coffea installation that would work...

The one that finally worked was

pip install --user coffea==0.7.20

mcremone commented 9 months ago


@ParticleChef I think @alejands already fixed all the thresholds and the applications of the IDs. Would you mind double-checking with him?

mcremone commented 9 months ago

@alejands Would you mind fixing this line:

https://github.com/mcremone/decaf/blob/UL/setup_lcg.sh#L5

with the right coffea version?

alejands commented 9 months ago

> @ParticleChef I think @alejands already fixed most of this. Would you mind double-checking with him?

@mcremone I did not look at the pt thresholds or HEM veto, and I left those comments untouched, mainly because I don't see them in the POG twikis, and from what I gather from earlier comments, they're analysis specific.

alejands commented 9 months ago

I also want to note that the Jet veto function now takes a year argument, whereas before it did not.

https://github.com/mcremone/decaf/blob/5e0ded50c60a4ed5197417f57e3352311138ef78/analysis/utils/ids.py#L297-L309

I would imagine that some scripts down the line would need to be updated. @mcremone @ParticleChef Do either of you happen to know where this is applied? If necessary, we could open a separate issue for this point.

mcremone commented 9 months ago

> > @ParticleChef I think @alejands already fixed most of this. Would you mind double-checking with him?
>
> @mcremone I did not look at the pt thresholds or HEM veto, and I left those comments untouched, mainly because I don't see them in the POG twikis, and from what I gather from earlier comments, they're analysis specific.

Sure, let @ParticleChef fix this, since she has access to the current monotop analysis code from KIT, where this is implemented and which we can use as a reference.

mcremone commented 9 months ago

> I also want to note that the Jet veto function now takes a year argument, whereas before it did not.
>
> https://github.com/mcremone/decaf/blob/5e0ded50c60a4ed5197417f57e3352311138ef78/analysis/utils/ids.py#L297-L309
>
> I would imagine that some scripts down the line would need to be updated. @mcremone @ParticleChef Do either of you happen to know where this is applied? If necessary, we could open a separate issue for this point.

The one you are referring to here is not the HEM jet veto function, but rather the definition of a "good" jet. I understand it now takes the year as an argument because the PU ID is applied differently in different years. Can you confirm this?

To answer your question, these functions are used in the processor and it will be an easy fix.

alejands commented 9 months ago

> The one you are referring to here is not the HEM jet veto function, but rather the definition of a "good" jet. I understand it now takes the year as an argument because the PU ID is applied differently in different years. Can you confirm this?

That is correct

mcremone commented 9 months ago

> > The one you are referring to here is not the HEM jet veto function, but rather the definition of a "good" jet. I understand it now takes the year as an argument because the PU ID is applied differently in different years. Can you confirm this?
>
> That is correct

I'm kind of surprised that that's the case with UL, since they should have uniform recipes for the different years, but that's life.

alejands commented 9 months ago

> I'm kind of surprised that that's the case with UL, since they should have uniform recipes for the different years, but that's life.

It's due to a bug causing the Loose and Tight bit flags to be flipped in 2016 UL. See https://github.com/mcremone/decaf/issues/73#issuecomment-1853136589 for more details, and click on the drop downs for even more information.

alejands commented 9 months ago

> @alejands Would you mind fixing this line:
>
> https://github.com/mcremone/decaf/blob/UL/setup_lcg.sh#L5
>
> with the right coffea version?

Updated in

alejands commented 9 months ago

Copying from https://github.com/mcremone/decaf/pull/76#issuecomment-1865406099:

I found some bugs in this PR and pushed fixes in f0b748e and b0d3c7e. See the full commit descriptions for details.

alejands commented 8 months ago

@mcremone PR #76 should also be good to go

alejands commented 8 months ago

Task completed with PR #76 merged

mcremone commented 6 months ago

@ParticleChef Would you mind checking the current version of ids.py?

https://github.com/mcremone/decaf/blob/UL/analysis/utils/ids.py

Already looking at the muons, our definition doesn't match what's in the twiki. Also, I am having difficulty finding recommendations for pT thresholds in the twikis. Would you mind having a look?

P.S.: You may have noticed I changed the structure of the script a bit. This is just to have everything in one place. For example, we no longer pass object features, defined separately in the processor, to the functions. We pass the object itself. Since these are shared recipes, passing the object directly lets us retrieve the recommended features and use them in the functions, all in the same place.
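A rough sketch of the two calling styles described in the P.S., under loud assumptions: the function names are hypothetical, `PT_MIN` is a placeholder and not a recommended threshold, and a `SimpleNamespace` stands in for the real electron collection. The only value taken from this thread is that `Electron_cutBased >= 2` corresponds to loose or better.

```python
from types import SimpleNamespace

PT_MIN = 10.0  # placeholder value for illustration only, not a recommendation


def is_loose_electron_from_features(pt, cut_based):
    """Older style: the caller extracts each feature in the processor."""
    return (pt > PT_MIN) & (cut_based >= 2)


def is_loose_electron(electron):
    """Newer style: the function reads the features it needs from the object."""
    return (electron.pt > PT_MIN) & (electron.cutBased >= 2)


# Stand-in for an electron record. In the analysis this would be the
# NanoAOD electron collection rather than a SimpleNamespace.
electron = SimpleNamespace(pt=25.0, cutBased=3)
old_style = is_loose_electron_from_features(electron.pt, electron.cutBased)
new_style = is_loose_electron(electron)
```

With the object-passing style, the choice of which branches enter the ID lives next to the thresholds, so the processor does not need to know it.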

mcremone commented 6 months ago

@ParticleChef Ping on this.