mcremone opened this issue 1 year ago
An additional comment: we need to double-check our corrections one by one against the ones implemented by KIT. Still, we want to implement ours using correctionlib.
Electron ID, photon ID, MET phi corrections, and the PU weight are included in corrections.py using the JSON files from https://gitlab.cern.ch/cms-nanoAOD/jsonpog-integration. The electron trigger weight and reco SF still need to be included.
For the NLO EWK scale factor, the ROOT files were made by the monojet analysis. At the last meeting, we discussed that this scale factor can be used. https://github.com/ParticleChef/decaf/blob/master/analysis/utils/corrections.py#L220
Can you point me to the part of your code where these are used? Also, what about muon isolation weights? As for trigger weights, you're most likely also missing the single-muon and MET triggers, am I right?
Yes, I haven't uploaded the muon trigger and isolation weights yet. Here are the relevant parts of the current code:
Electron ID SF: https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/corrections.py#L32
Photon ID SF: https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/corrections.py#L32
PU weight: https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/corrections.py#L32
MET correction: https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/corrections.py#L32
And I included the MET trigger and NLO SF from the previous corrections.py file.
NLO SF: https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/corrections.py#L154
MET trigger: https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/corrections.py#L15
I'm modifying the btag weight part now.
I made a quick test with the JSON file and an existing btageff.merged file:
import numpy as np
import correctionlib
from coffea import lookup_tools
from coffea.util import load

year = '2017'              # defined elsewhere in the full test script
tagger = 'deepflav'        # defined elsewhere in the full test script
workingpoint = 'medium'    # defined elsewhere in the full test script

# generate 15 dummy jet features
jet_pt = np.random.exponential(50., 15)
jet_eta = np.random.uniform(0.0, 2.4, 15)
jet_flav = np.random.choice([0, 4, 5], 15)
jet_discr = np.random.uniform(0.0, 1.0, 15)
istag = jet_discr > 0.5    # dummy tagging decision, defined elsewhere in the full script

# separate light and b/c jets
light_jets = np.where(jet_flav == 0)
bc_jets = np.where(jet_flav != 0)

# MC b-tagging efficiency lookup from the merged btageff histograms
btag = load('hists/btageff2017.merged')
bpass = btag[tagger].integrate('dataset').integrate('wp', workingpoint).integrate('btag', 'pass').values()[()]
ball = btag[tagger].integrate('dataset').integrate('wp', workingpoint).integrate('btag').values()[()]
nom = bpass / np.maximum(ball, 1.)
eff = lookup_tools.dense_lookup.dense_lookup(nom, [ax.edges() for ax in btag[tagger].axes()[3:]])

# data/MC scale factors from the BTV correctionlib JSON (b/c jets only)
btvjson = correctionlib.CorrectionSet.from_file('data/BtagSF/' + year + '_UL/btagging.json.gz')
sf_nom = btvjson["deepJet_comb"].evaluate('central', 'M', jet_flav[bc_jets], jet_eta[bc_jets], jet_pt[bc_jets])
print('sf_nom: ', sf_nom, len(sf_nom))

# per-event b-tag probability ("method 1a"): product of eff for tagged
# jets and (1 - eff) for untagged jets
def P(eff):
    weight = np.ones_like(eff)
    weight[istag] = eff[istag]
    weight[~istag] = (1 - eff[~istag])
    return weight.prod()

eff = eff(jet_pt, jet_eta, jet_flav)
print('extract eff:', eff, len(eff))
eff_data_nom = np.minimum(1., sf_nom * eff)
nnom = P(eff_data_nom) / P(eff)
print('P(eff_data_nom)/P(eff)', nnom)
I printed the values and got an error like this:
sf_nom: [0.94694163 0.95233112 0.9551299 0.95522698 0.95875001 0.94341749
0.95456105 0.94572292 0.94499175 0.95435803 0.94332464] 11
extract eff: [0.9375 0.9375 0.9375 0.9375 0.9375 0.61748634
0.9375 0.9375 0.91052632 0.91052632 0.91052632 0.91052632
0.9375 0.9375 0.61748634] 15
Traceback (most recent call last):
File "utils/cortest.py", line 89, in <module>
eff_data_nom = np.minimum(1., sf_nom*eff)
ValueError: operands could not be broadcast together with shapes (11,) (15,)
Which part should I fix to solve this error?
To avoid this shape mismatch you can use real data/MC in the test. My suggestion is that we first finish implementing all corrections with correctionlib (where possible). I'll then do a quick review of the code, and then we can structure a test.
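For the record, the mismatch itself comes from sf_nom being evaluated only for the 11 b/c jets while eff covers all 15 jets. A minimal sketch of one way to align the shapes, assuming "deepJet_incl" provides the light-flavor SFs (as discussed later in this thread):

import numpy as np

# evaluate SFs per flavor group and scatter them back into a full-length
# array so that the SFs and the efficiencies share the same shape
sf_all = np.ones_like(jet_pt)
sf_all[bc_jets] = btvjson["deepJet_comb"].evaluate(
    'central', 'M', jet_flav[bc_jets], jet_eta[bc_jets], jet_pt[bc_jets])
sf_all[light_jets] = btvjson["deepJet_incl"].evaluate(
    'central', 'M', jet_flav[light_jets], jet_eta[light_jets], jet_pt[light_jets])
eff_data_nom = np.minimum(1., sf_all * eff)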
I finished modifying the btag part. How do you implement the JEC? Other than the JEC, I have modified all the corrections I need.
For jet you can follow this:
https://github.com/nsmith-/boostedhiggs/blob/master/boostedhiggs/build_jec.py
I updated corrections.py and the jet energy correction files.
When I run corrections.py, an error occurs at import uproot_methods:
Traceback (most recent call last):
File "utils/corrections.py", line 6, in <module>
import uproot, uproot_methods
File "/uscms/home/jhong/.local/lib/python3.6/site-packages/uproot_methods/__init__.py", line 5, in <module>
from uproot_methods.classes.TVector2 import TVector2, TVector2Array
File "/uscms/home/jhong/.local/lib/python3.6/site-packages/uproot_methods/classes/TVector2.py", line 8, in <module>
import awkward.array.jagged
ModuleNotFoundError: No module named 'awkward.array'
Instead of fixing that, I updated the offending lines to use uproot3.
I also split '2016' into '2016preVFP' and '2016postVFP' in the btag part (see the sketch below): (https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/corrections.py#L465) (https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/common.py#L19)
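A minimal sketch of what the era split implies for loading the corrections (the dictionary name is hypothetical; the path pattern follows the one used in the test above):

import correctionlib

# hypothetical era-keyed mapping replacing the single '2016' entry
btag_json = {
    '2016preVFP':  'data/BtagSF/2016preVFP_UL/btagging.json.gz',
    '2016postVFP': 'data/BtagSF/2016postVFP_UL/btagging.json.gz',
    '2017':        'data/BtagSF/2017_UL/btagging.json.gz',
    '2018':        'data/BtagSF/2018_UL/btagging.json.gz',
}
year = '2016preVFP'  # era label as set in common.py
btvjson = correctionlib.CorrectionSet.from_file(btag_json[year])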
Hi, I think you want to do it the other way around: use the latest awkward version, and change the code lines to use what the latest awkward version wants you to use.
Then is there no need to change the version of awkward?
In general you want to use the latest version of everything, both awkward and uproot. If we need to change the code a bit to adjust to the format the new versions may want, that's what I would do.
The current corrections.py works with awkward version 1.9.0. I checked my current setup against the latest versions of some modules:
(current / latest)
awkward ( 1.9.0 / 2.4.10 )
uproot ( 4.3.7 / 5.1.2 )
uproot3 ( 3.14.4 / 3.14.4 )
numpy ( 1.17.0 / 1.26.0 )
Is it okay to change these versions with the current coffea (0.7.12)?
Were the current versions automatically installed when you installed coffea 0.7.12?
I forget which versions were installed when I installed coffea 0.7.12. All modules are installed at /uscms/home/jhong/.local/lib/python3.6/site-packages/
I think that if you didn't upgrade packages by hand, those are the versions that coffea installed by itself. I wouldn't touch them then, but in the corrections.py code, wherever you are using uproot3, use uproot instead. If this makes the code crash, then we need to understand why.
@ParticleChef were you able to use uproot instead of uproot3? Besides that I don't think that this needs more work.
Actually, we also need to implement the UL ttbar corrections:
https://github.com/mcremone/decaf/blob/master/analysis/utils/corrections.py#L352-L353
To be found here:
@ParticleChef any news on this?
Hi. I checked uproot, uproot3, and uproot_methods. The first error occurs in uproot_methods; it looks like this:
[jhong@cmslpc175 analysis]$ python utils/corrections.py
Traceback (most recent call last):
File "utils/corrections.py", line 4, in <module>
import uproot, uproot_methods
File "/uscms/home/jhong/.local/lib/python3.6/site-packages/uproot_methods/__init__.py", line 5, in <module>
from uproot_methods.classes.TVector2 import TVector2, TVector2Array
File "/uscms/home/jhong/.local/lib/python3.6/site-packages/uproot_methods/classes/TVector2.py", line 8, in <module>
import awkward.array.jagged
ModuleNotFoundError: No module named 'awkward.array'
And when using uproot without uproot_methods, lookup_tools throws an error:
Traceback (most recent call last):
File "utils/corrections.py", line 28, in <module>
get_met_trig_weight[year] = lookup_tools.dense_lookup.dense_lookup(met_trig_hist.values, met_trig_hist.edges)
AttributeError: 'Model_TH1F_v1' object has no attribute 'edges'
Everything works when I change all uproot to uproot3 (without including uproot_methods). In all lines using uproot.open, I changed uproot to uproot3. One of those lines is https://github.com/ParticleChef/decaf/blob/UL/analysis/utils/corrections.py#L20
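For what it's worth, the uproot 4+ equivalent of the failing .values/.edges access is to_numpy(), so the dense_lookup construction could eventually be written like this (a sketch with placeholder file path and histogram key):

import uproot
from coffea import lookup_tools

fin = uproot.open('data/trigger_eff/met_trigger.root')  # placeholder path
met_trig_hist = fin['met_trig_sf']                      # placeholder histogram key
values, edges = met_trig_hist.to_numpy()                # replaces .values/.edges from uproot3
get_met_trig_weight = lookup_tools.dense_lookup.dense_lookup(values, edges)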
@alejands can you look into the UL ttbar corrections mentioned above?
I forgot to mention that we also need to update these corrections:
https://github.com/mcremone/decaf/blob/UL/analysis/utils/corrections.py#L311-L320
A good place to start would be asking the boosted Higgs team, or digging into their code:
https://github.com/nsmith-/boostedhiggs/tree/master/boostedhiggs
@alejands @ParticleChef
I modified the btageff.py file to produce the btageff merged file used in corrections.py: https://github.com/ParticleChef/decaf/blob/forBtagw/analysis/processors/btageff.py
Is any step other than reduce.py and merge.py needed to make the btageff.merged files?
@ParticleChef I really have a strong preference for adding booleans as attributes of objects. For example, here:
https://github.com/ParticleChef/decaf/blob/forBtagw/analysis/processors/btageff.py#L52
I really prefer this:
https://github.com/mcremone/decaf/blob/master/analysis/processors/btageff.py#L49
@ParticleChef can you open a separate issue for this? Also, in this case we should move from coffea.hist to hist, following these instructions:
After going through the twiki linked above for the UL ttbar corrections, and double-checking some TOP PAG twikis, it appears that no updates have been made to the top pt reweighting function for data/NLO (data/POWHEG+Pythia8). The recommendation still matches our code. https://github.com/mcremone/decaf/blob/ed33cc149341bac97de11088b497202d07f7372b/analysis/utils/corrections.py#L308-L309
I did notice this line in the twiki...
New plots with full Run 2 data and different predictions are expected to replace these soon (08/2020).
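For reference, the recommendation in question is the data/NLO exponential fit; in code it would look roughly like this (a sketch; the clip range is an assumed validity guard, to be checked against the twiki):

import numpy as np

def get_ttbar_weight(pt):
    # TOP PAG data/NLO (POWHEG+Pythia8) top-pt reweighting:
    # SF(pt) = exp(0.0615 - 0.0005 * pt); the event weight is
    # sqrt(SF(pt_top) * SF(pt_antitop))
    return np.exp(0.0615 - 0.0005 * np.clip(pt, 0., 800.))  # clip range assumed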
I was able to update the corrections.py script to use the uproot package rather than uproot3. I'll be adding my updates in this PR:
I noticed these output filenames were changed by @ParticleChef, presumably while testing: https://github.com/mcremone/decaf/blob/ed33cc149341bac97de11088b497202d07f7372b/analysis/utils/ids.py#L347 https://github.com/mcremone/decaf/blob/ed33cc149341bac97de11088b497202d07f7372b/analysis/utils/corrections.py#L631
Should these be changed back or left as is?
Output filenames above updated in commit 87ddf88.
@alejands @ParticleChef
Here is the way to implement the new corrections mentioned above:
https://github.com/jennetd/hbb-coffea/blob/master/boostedhiggs/corrections.py#L25-L47
@alejands you can take msdcorr.json from here:
https://github.com/jennetd/hbb-coffea/blob/master/boostedhiggs/data/msdcorr.json
The PR has been updated with the new msd corrections (commit 7471585).
The new get_msd_corr() function takes in fatjet coffea objects rather than pt and eta awkward arrays. The scripts in analysis/processors that call this function are updated accordingly, but they have not been tested, since these scripts have not yet been updated for compatibility.
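As a rough illustration of the pattern involved (the correction key 'msdfjcorr' and its inputs are illustrative, not necessarily what msdcorr.json actually defines): correctionlib evaluates flat numpy arrays, so jagged fatjet collections are flattened, evaluated, and re-wrapped:

import awkward as ak
import numpy as np
import correctionlib

cset = correctionlib.CorrectionSet.from_file('data/msdcorr.json')

def get_msd_corr(fatjets):
    # flatten the jagged per-event jet collections for correctionlib,
    # then restore the original jet multiplicity with unflatten
    counts = ak.num(fatjets.pt)
    pt_flat = np.asarray(ak.flatten(fatjets.pt))
    eta_flat = np.asarray(ak.flatten(fatjets.eta))
    corr_flat = cset['msdfjcorr'].evaluate(pt_flat, eta_flat)  # illustrative key/inputs
    return fatjets.msoftdrop * ak.unflatten(corr_flat, counts)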
I had a look and this still needs work.
1) NLO corrections.
I noticed that only EWK corrections are implemented:
https://github.com/mcremone/decaf/blob/UL/analysis/utils/corrections.py#L290-L305
@ParticleChef can you confirm that this is because samples are already NLO in QCD?
Also, systematic variations need to be implemented. They can be taken from here:
https://github.com/mcremone/decaf/blob/master/analysis/utils/corrections.py#L248-L350
2) JERC
This won't work unfortunately:
https://github.com/mcremone/decaf/blob/UL/analysis/utils/corrections.py#L602-L621
We need to implement this:
https://github.com/nsmith-/boostedhiggs/blob/master/boostedhiggs/build_jec.py
I'll open a new issue for this.
I took care of the JERC implementation. JERCs still need to be updated to UL though:
https://github.com/mcremone/decaf/blob/UL/analysis/utils/corrections.py#L658-L799
@alejands can you check which are the recommendations?
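For orientation, the build_jec recipe from boostedhiggs assembles the corrected-jets factory roughly like this (a condensed sketch; the Summer19UL17 file names are placeholders for whatever tags the UL recommendations specify):

from coffea.lookup_tools import extractor
from coffea.jetmet_tools import JECStack, CorrectedJetsFactory

# load JEC/JER text files into a coffea evaluator; the tags below are
# placeholders to be replaced with the recommended UL versions
ext = extractor()
ext.add_weight_sets([
    '* * data/jec/Summer19UL17_V5_MC_L1FastJet_AK4PFchs.jec.txt',
    '* * data/jec/Summer19UL17_V5_MC_L2Relative_AK4PFchs.jec.txt',
    '* * data/jec/Summer19UL17_JRV2_MC_PtResolution_AK4PFchs.jr.txt',
    '* * data/jec/Summer19UL17_JRV2_MC_SF_AK4PFchs.jersf.txt',
])
ext.finalize()
evaluator = ext.make_evaluator()

# map NanoAOD jet fields to the names the JEC machinery expects
jec_stack = JECStack({name: evaluator[name] for name in evaluator.keys()})
name_map = jec_stack.blank_name_map
name_map['JetPt'] = 'pt'
name_map['JetMass'] = 'mass'
name_map['JetEta'] = 'eta'
name_map['JetA'] = 'area'
name_map['ptGenJet'] = 'pt_gen'
name_map['ptRaw'] = 'pt_raw'
name_map['massRaw'] = 'mass_raw'
name_map['Rho'] = 'event_rho'
jet_factory = CorrectedJetsFactory(name_map, jec_stack)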
Yes, the KIT people generated those samples at NLO in QCD, so NLO QCD corrections are not applied on top.
@alejands ping on this.
Good, but I believe we still want the systematic variations. I re-implemented those.
Need to fix this:
https://github.com/mcremone/decaf/blob/UL/analysis/utils/corrections.py#L475-L487
@mcremone
I'm modifying the b-tagging weight in the corrections file using the b-tagging JSON file: https://github.com/ParticleChef/decaf/blob/forBtagw/analysis/utils/correctionsBTseperate.py#L35 I checked that it works with the 2018 efficiency file we produced, but it should be checked whether it works in the new-version setup, and the up and down variations should also be checked: https://github.com/ParticleChef/decaf/blob/forBtagw/analysis/utils/correctionsBTseperate.py#L53
@ParticleChef I strongly suggest you use the b-tagging weight calculation I implemented in the latest version of corrections.py; there were a lot of things I fixed. Also, the version you have, as well as the current version of corrections.py, won't work with the new hist format, as I was commenting before. In order to fix that, this part should be changed:
https://github.com/mcremone/decaf/blob/UL/analysis/utils/corrections.py#L475-L487
Also, on a separate note, I don't know which btageff2018.merged file you are using here:
https://github.com/ParticleChef/decaf/blob/forBtagw/analysis/utils/correctionsBTseperate.py#L42
If you are using the one that was already in decaf, that was obtained with pre-UL samples. If you generated your own using the KIT UL QCD samples, that wouldn't work either because, as of yesterday, the btageff processor was somewhat incorrect. Also, a lot of the KIT UL QCD ROOT files are corrupted, and they make your coffea jobs crash. That means that even if you had managed to run the incorrect processor over them, you'd be missing a lot of data that wasn't processed.
I checked the new version of the btag weight in the corrections file in your area today; I will use the new version. The btageff2018.merged file I used was generated by the previous version of btageff.py, so I ran it again and produced a btageff file with the latest version.
I have one question about drawing the 2D efficiency plot. The hists in btageff2018.merged are stored in a dictionary whose keys are the names of the reduced files:
deepflav = hists['deepflav']
print(deepflav)
>>
{'TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8.reduced': Hist(
StrCategory(['loose', 'medium', 'tight'], growth=True),
StrCategory(['pass', 'fail'], growth=True),
IntCategory([0, 4, 5, 6]),
Variable([20, 30, 50, 70, 100, 140, 200, 300, 600, 1000]),
Variable([0, 1.4, 2, 2.5]),
storage=Double()) # Sum: 1111061577.0 (1111114875.0 with flow), 'TTToHadronic_TuneCP5_13TeV-powheg-pythia8.reduced': Hist(
StrCategory(['loose', 'medium', 'tight'], growth=True),
StrCategory(['pass', 'fail'], growth=True),
IntCategory([0, 4, 5, 6]),
Variable([20, 30, 50, 70, 100, 140, 200, 300, 600, 1000]),
Variable([0, 1.4, 2, 2.5]),
storage=Double()) # Sum: 4569717606.0 (4569910719.0 with flow), 'TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8.reduced': Hist(
StrCategory(['loose', 'medium', 'tight'], growth=True),
StrCategory(['pass', 'fail'], growth=True),
IntCategory([0, 4, 5, 6]),
Variable([20, 30, 50, 70, 100, 140, 200, 300, 600, 1000]),
Variable([0, 1.4, 2, 2.5]),
storage=Double()) # Sum: 4890351531.0 (4890566991.0 with flow)}
So it should be used like this:
deepflav = hists['deepflav']['TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8.reduced']
loose_pass = deepflav[{'wp': 'loose', 'btag': 'pass'}]
Do you have any idea how to merge all the datasets?
The new version of corrections.py already ingests the new format and merges everything:
https://github.com/mcremone/decaf/blob/UL/analysis/utils/corrections.py#L487-L498
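To illustrate what that merging amounts to with the new hist format (a sketch, assuming all per-dataset Hists share the axes shown above):

import functools, operator
import numpy as np

deepflav = hists['deepflav']                                # dict: dataset -> Hist
merged = functools.reduce(operator.add, deepflav.values())  # Hists with identical axes add
bpass = merged[{'wp': 'medium', 'btag': 'pass'}].values()
ball = merged[{'wp': 'medium', 'btag': sum}].values()       # sum collapses the btag axis
eff = bpass / np.maximum(ball, 1.)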
I quickly checked that "deepJet_comb" only has hadron flavors 4 and 5 in the JSON file, so I should use "deepJet_incl" for the light SF (hadron flavor 0). I think this also causes the error. Is it solved already? https://github.com/mcremone/decaf/blob/UL/analysis/utils/corrections.py#L498
It depends on what you are loading here:
https://github.com/mcremone/decaf/blob/UL/analysis/utils/corrections.py#L475-L476
Also, which error are you referring to?
I updated the code and got another error when running corrections.py. The line
from correctionlib import convert
has an issue:
[jhong@cmslpc115 analysis]$ python3 utils/corrections.py
Traceback (most recent call last):
File "utils/corrections.py", line 3, in <module>
from correctionlib import convert
File "/uscms/home/jhong/.local/lib/python3.8/site-packages/correctionlib/convert.py", line 19, in <module>
from .schemav2 import (
File "/uscms/home/jhong/.local/lib/python3.8/site-packages/correctionlib/schemav2.py", line 37, in <module>
class Variable(Model):
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/py3-pydantic/1.8/lib/python3.8/site-packages/pydantic/main.py", line 287, in __new__
fields[ann_name] = ModelField.infer(
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/py3-pydantic/1.8/lib/python3.8/site-packages/pydantic/fields.py", line 392, in infer
return cls(
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/py3-pydantic/1.8/lib/python3.8/site-packages/pydantic/fields.py", line 327, in __init__
self.prepare()
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/py3-pydantic/1.8/lib/python3.8/site-packages/pydantic/fields.py", line 432, in prepare
self._type_analysis()
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/py3-pydantic/1.8/lib/python3.8/site-packages/pydantic/fields.py", line 532, in _type_analysis
if issubclass(origin, Tuple): # type: ignore
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/python3/3.8.2-bcolbf/lib/python3.8/typing.py", line 771, in __subclasscheck__
return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class
To address this I have already changed the setup file:
It needed a lot of work, but now the b-tagging class works.
First of all, pull the most up-to-date version from the master branch:
https://github.com/mcremone/decaf/blob/master/analysis/utils/corrections.py
then replace the corrections one by one with the ones recommended for UL. Please be mindful of a couple of things (see the sketch after this list for the general correctionlib pattern):
1) Use correctionlib: https://github.com/cms-nanoAOD/correctionlib
2) Use the coffea lookup tools to interface with correctionlib. If you need help, ask Nick Smith.
3) Comment on each single correction, including links to where the correction files were taken from, etc.
4) Clean up the analysis/data folder of non-UL files, keeping only the ones that are used. With correctionlib, they should be reduced to just a bunch of JSON files.
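As a reference for the pattern in points 1) and 2), a minimal correctionlib example (the file path, correction key, and argument order follow the EGM electron ID case as an example and should be verified against the POG JSON itself):

import numpy as np
import correctionlib

# dummy per-electron inputs
ele_eta = np.array([0.5, -1.2])
ele_pt = np.array([35., 60.])

# open the POG-provided JSON and evaluate a scale factor per object
ceval = correctionlib.CorrectionSet.from_file('data/EGM/2017_UL/electron.json.gz')  # placeholder path
sf = ceval['UL-Electron-ID-SF'].evaluate('2017', 'sf', 'Medium', ele_eta, ele_pt)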