matthewfeickert opened this issue 3 years ago
Hi Matthew,
> then produce all the simulated events with the toolchain and then come back and have MoMEMta use that same generation process in combination with MoMEMta-MaGMEE to have MoMEMta know what was done?
That is correct. In MoMEMta it is basically assumed that you already have your events (simulated or data), and that you need to compute weights for them.
In your case it indeed comes down to, on the one hand, generating the process in MG5 and writing out the C++ matrix element for MoMEMta, and, on the other, generating events using the MG5 toolchain (madevent, then Pythia, Delphes, etc.) with the same original generate command in MG5.
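To make the two tracks concrete, the paired MG5 sessions might be sketched like this (the process, the output directory names, and the exact `output MoMEMta` export command of the MoMEMta-MaGMEE plugin are assumptions to check against the plugin's README):

```text
# Session 1: event generation with the usual MG5 toolchain
generate p p > e+ e-
output my_drell_yan_events
launch   # madevent, then shower/detector simulation (Pythia, Delphes, ...)

# Session 2: export the C++ matrix element for MoMEMta,
# using the *same* generate command (requires MoMEMta-MaGMEE installed)
generate p p > e+ e-
output MoMEMta my_drell_yan_me
```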
Note that in general there is no relationship between the two: you can compute weights under any process hypothesis, not necessarily the same one that was used to generate your events (in fact, when you compute weights on data, you don't know how the events were produced!). Very often the generator used is also different: you can very well compute weights under the hypothesis of leading-order ttbar production (with a matrix element coming from MoMEMta-MaGMEE), whereas the events were simulated using POWHEG at NLO.
Does this clarify things?
I do not have much to add to what Sebastien already explained, except that I never used event generation myself. Instead I use centrally produced CMS samples in NanoAOD format from which I extract my own ntuples. Only after I have them do I produce the matrix elements for the different processes I want to obtain the MEM weights for, and run MoMEMta on these events.
Thanks @swertz and @FlorianBury. You've both been very helpful (truly appreciate it) and this does indeed help. Seems like I'll be on track now. :+1:
Hi again @swertz and @FlorianBury. I have some further questions which I'm hoping will be obvious once I think more about things, but I figured I'd ask in the case that I'm missing something incredibly obvious.
In an effort to test the simplest (though physically uninteresting) scenario with MoMEMta, I wanted to test the hypothesis of Drell-Yan against Drell-Yan simulation. I started with the following MadGraph5 configuration (though for what I'll be showing later this used 1e4 events for a faster example).
I then ran the hepmc file Pythia produced through Delphes with the ATLAS card, and did some preprocessing to move from the detector-level event information in the Delphes output ROOT file to event-selection-level information containing the components of the particle four-momenta. This resulted in the preprocessing_output_10e4.root file in the attached: example_files.zip
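For reference, the core of that preprocessing step is just the change of coordinates from the Delphes (pT, eta, phi, m) representation to explicit four-momentum components. A generic sketch of that conversion (my own helper, not the actual preprocessing script):

```python
import math


def to_four_momentum(pt, eta, phi, mass):
    """Convert (pT, eta, phi, m) to (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    momentum = pt * math.cosh(eta)  # |p| = pT * cosh(eta)
    energy = math.sqrt(mass**2 + momentum**2)
    return energy, px, py, pz


# A massless particle at eta=0, phi=0 is purely transverse:
# to_four_momentum(10.0, 0.0, 0.0, 0.0) -> (10.0, 10.0, 0.0, 0.0)
```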
If I then use the following Drell-Yan hypothesis with the MoMEMta-MaGMEE plugin on preprocessing_output_10e4.root, with the following example C++ script and Lua config, I'm able to produce the attached momemta_weights.root file with:
$ git clone https://github.com/scailfin/MadGraph5-simulation-configs.git
$ cd MadGraph5-simulation-configs
$ docker pull neubauergroup/momemta-python-centos:1.0.1
$ docker run --rm -ti -v $PWD:$PWD -w $PWD neubauergroup/momemta-python-centos:1.0.1
[root@ac7e4ff8e23d MadGraph5-simulation-configs]# cd momemta/drell-yan/
[root@ac7e4ff8e23d drell-yan]# bash run_momemta.sh preprocessing_output_10e4.root
This all seems fine. However, I'm having trouble judging whether the construction of my Lua config for the hypothesis is correct: when I look at the distribution of the weights, it is very heavily skewed (and this doesn't appear to be due to the small number of events). As the weights are the integral result without normalisation, their values aren't meaningful in isolation, only in comparison to the weights of other hypotheses; still, the distribution's highly peaked nature seems strange.
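As a quick numeric check of the skew, independent of any plot, one can compute a moment-based skewness of -log10(w) (a sketch; here illustrated with synthetic, roughly log-normal "weights" rather than the actual weight_DY branch):

```python
import numpy as np


def summarize_neg_log10(weights):
    """Return (median, skewness) of -log10(w) for an array of weights."""
    x = -np.log10(np.asarray(weights, dtype=float))
    mu, sigma = x.mean(), x.std()
    skew = float(np.mean(((x - mu) / sigma) ** 3))  # Fisher-Pearson skewness
    return float(np.median(x)), skew


# Illustration with synthetic "weights" whose -log10 is log-normal,
# so the resulting distribution is peaked with a long right tail:
rng = np.random.default_rng(0)
fake_weights = 10.0 ** (-rng.lognormal(mean=0.5, sigma=0.4, size=10_000))
median, skew = summarize_neg_log10(fake_weights)
```

A strongly positive skewness is consistent with the peaked-plus-long-tail shape described above.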
(venv) $ python -m pip install --upgrade pip setuptools wheel
(venv) $ python -m pip install uproot "hist[plot]" # dependencies for below
from pathlib import Path

import numpy as np
import uproot
from hist import Hist
from matplotlib.figure import Figure

if __name__ == "__main__":
    input_file = Path.cwd().joinpath("momemta_weights.root")
    tree_path = "momemta"
    with uproot.open(f"{input_file}:{tree_path}") as tree:
        drell_yan_weight_values = tree["weight_DY"].array()

    log_10_weights = -np.log10(drell_yan_weight_values)

    hist_drell_yan_weights_log = Hist.new.Reg(
        50, 0.0, 10, name="weights", metadata="drell-yan"
    ).Double()
    hist_drell_yan_weights_log.fill(log_10_weights)

    fig = Figure()
    ax = fig.subplots()
    hist_drell_yan_weights_log.plot(
        ax=ax, label=f"{len(log_10_weights)} weights"
    )
    ax.legend(loc="best", frameon=False)
    ax.set_xlabel(r"$-\log_{10}\,($Drell-Yan MoMEMta Weights$)$")
    ax.set_ylabel("Count")
    ax.set_yscale("log")
    fig.savefig("drell_yan_weights_log.png")
If you have time, can you look and let me know if I'm doing something wrong with the Lua config? Or am I missing something fundamental about the physics here? (I'll go and refresh myself with your papers of course in the meantime to try to answer this.)
(cc @mihirkatare)
Also @FlorianBury, to try to get a weight plot that I could roughly compare to your Drell-Yan hypothesis weights plot for the llbb topology (in Figure 2 of your paper https://arxiv.org/abs/2008.10949), I made a first stab in PR #11 for a:
p p > l+ l- b b~
in MadGraph5. As the Lua config that you used is very similar to mine, would you be able to make any spot-check comments on what I'm doing wrong, if you have time? With
logging::set_level(logging::level::debug);
I can see there are some errors RE: the transfer function evaluation bits that I'm messing up on (running on branch fix/get-llbb-topology-working):
[2021-08-10 05:50:07.810] [warning] Warnings found during validation of parameters for module GaussianTransferFunctionOnEnergyEvaluator::tfEval_bjet1
[2021-08-10 05:50:07.810] [warning] Unexpected parameter: ps_point
[2021-08-10 05:50:07.810] [warning] These parameters will never be used by the module, check your configuration file.
[2021-08-10 05:50:07.810] [error] Validation of parameters for module GaussianTransferFunctionOnEnergyEvaluator::tfEval_bjet1 failed:
[2021-08-10 05:50:07.810] [error] Input not found: gen_particle
[2021-08-10 05:50:07.810] [error] Check your configuration file.
[2021-08-10 05:50:07.810] [warning] Warnings found during validation of parameters for module GaussianTransferFunctionOnEnergyEvaluator::tfEval_bjet2
[2021-08-10 05:50:07.810] [warning] Unexpected parameter: ps_point
[2021-08-10 05:50:07.810] [warning] These parameters will never be used by the module, check your configuration file.
[2021-08-10 05:50:07.810] [error] Validation of parameters for module GaussianTransferFunctionOnEnergyEvaluator::tfEval_bjet2 failed:
[2021-08-10 05:50:07.810] [error] Input not found: gen_particle
[2021-08-10 05:50:07.810] [error] Check your configuration file.
[2021-08-10 05:50:07.810] [fatal] Validation of modules' parameters failed. Check the log output for more details on how to fix your configuration file.
terminate called after throwing an instance of 'lua::invalid_configuration_file'
what(): Validation of modules' parameters failed. Check the log output for more details on how to fix your configuration file.
Hi Matthew, about your first Drell-Yan example: I've had a look and things look pretty good to me, I could not spot any inconsistency.
The distribution also looks quite reasonable to me. We've always seen such skewed distributions of -log(W). I don't know of any argument that would justify whether those shapes are expected or not. In general, if x ~ p, then the distribution of p(x) (or of -log(p(x)) here, i.e. some "event entropy") is not "universal" and really depends on p in the first place, no?
You can compare with the shapes in pp. 107-108 of this thesis: https://inspirehep.net/files/94258ee627e914a1d48dd1c7e2c9a21e. Although not in the same phase space, the weight distributions all feature a peak and a long skewed tail.
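That "not universal" point can be made with a toy illustration (my own sketch; the standard-normal example is illustrative only): even when the hypothesis exactly matches the generator, -log p(x) has a hard lower bound at the mode of p and a long right tail.

```python
import math

import numpy as np

# Draw x ~ N(0, 1) and evaluate -log p(x) under the *true* density:
# -log p(x) = 0.5 * log(2*pi) + x**2 / 2
rng = np.random.default_rng(42)
x = rng.standard_normal(100_000)
neg_log_p = 0.5 * math.log(2.0 * math.pi) + 0.5 * x**2

# The distribution is bounded below by -log p(0) = 0.5*log(2*pi) ~ 0.919
# and is strongly right-skewed, even though hypothesis == generator.
lower_bound = 0.5 * math.log(2.0 * math.pi)
```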
> Hi Matthew, about your first Drell-Yan example: I've had a look and things look pretty good to me, I could not spot any inconsistency.
Thanks very much for taking the time to check @swertz — I appreciate it!
> The distribution also looks quite reasonable to me. We've always seen such skewed distributions of -log(W). I don't know of any argument that would justify whether those shapes are expected or not. In general, if x ~ p, then the distribution of p(x) (or of -log(p(x)) here, i.e. some "event entropy") is not "universal" and really depends on p in the first place, no?
> You can compare with the shapes in pp. 107-108 of this thesis: https://inspirehep.net/files/94258ee627e914a1d48dd1c7e2c9a21e. Although not in the same phase space, the weight distributions all feature a peak and a long skewed tail.
This is all good to hear. The more I think about it, the more this distribution makes sense: the topology I've invented for the example is just two leptons, so it should be quite clean, and I'm comparing a physics hypothesis that directly matches the generating process for the observations. Having extremely peaked distributions under these conditions therefore seems reasonable, as you have pointed out (though I'll admit I haven't developed much intuition about the distributions of -log(weight_hypothesis) beyond the obvious point that smaller values represent more compatibility between the physics hypothesis and the observations for the given topology).
You are of course also correct in your point on the distribution not being universal.
Also thanks for the link to @BrieucF's thesis! I'll read it in more depth, but Figures 4.2 and 4.3 are indeed nice references (especially seeing the distributions of simulation for the various hypothesis weights). :+1:
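As a note-to-self for the eventual hypothesis comparison: the usual way to combine weights under two hypotheses into a per-event discriminant in the MEM literature is a normalized likelihood ratio (the function and variable names below are my own illustrative sketch, not MoMEMta API):

```python
def mem_discriminant(w_sig, w_bkg, alpha=1.0):
    """D = w_sig / (w_sig + alpha * w_bkg); values near 1 favor the signal hypothesis.

    alpha is an optional relative-normalization factor between the two
    hypotheses (1 when the weights share a common normalization).
    """
    return w_sig / (w_sig + alpha * w_bkg)


# An event weighted 9x higher under the signal hypothesis:
# mem_discriminant(9e-6, 1e-6) ~ 0.9
```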
Hi @swertz @FlorianBury. I have a question that by definition is going to be pretty dumb RE: using MoMEMta and MoMEMta-MaGMEE, and that will probably be obvious once I have time to reread the "In depth" sections of https://momemta.github.io/, https://arxiv.org/abs/1805.08555, and https://arxiv.org/abs/2008.10949. (Or maybe I'm confused about the Lua configuration process.)
If this doesn't make sense, please ask for clarification, as I'm writing this Issue somewhat quickly.
How does one practically go from the MadGraph5 process and building the matrix elements with MoMEMta-MaGMEE to producing simulated events to the computation of weights?
The examples given in the tutorial repo (cf. run_ttbar_tutorial.sh) start out with provided matrix elements and a simulated event file to read in (Tutorials/TTbar_FullyLeptonic/tt_20evt.root). That all works fine, and those simulated events were generated using MG5_aMC@NLO, Pythia, and Delphes. However, as a starting example, I'd like to start with just the MadGraph5 process and have MoMEMta-MaGMEE produce the matrix element. If I want to use MoMEMta with these matrix elements, it isn't clear to me what the in-between steps are. Would I need to take that same MadGraph5 generation and then produce all the simulated events with the toolchain, and then come back and have MoMEMta use that same generation process in combination with MoMEMta-MaGMEE so that MoMEMta knows what was done?
I think this is probably rambling enough to have lost any clarity, but I guess what I'm missing is the connecting step: what are the requirements on event simulation, and how does the simulation connect back to MoMEMta?