Closed manuelfs closed 1 year ago
In the meeting 7-28-20, I was directed to table 18 of the ANA note for a list of all MC decay modes for the D+ mu sample, but I still have a question for the most sensible way to group background MC. In this image I've color-coded which modes I'm currently grouping together (and included the names for the groups) Instead of this grouping, I'm wondering if it's better to make the plots less complicated and just define "dss" and "dd" processes rather than specifying whether the lepton coming from the B is a muon or a tau. Additionally, related to the picture, I have two more questions: First, probably a question for Phoebe, in her code, she seems to have a histogram for a strange B -> D(D->mu nu X')X decay mode that isn't listed in table 18 of the ANA note. Should I count this mode in my plots? Second, I'm thinking I shouldn't include the "comb", "misID", and "Combinatorial D" as an extra process for the plots, since Phoebe's code (eg. where she fills the misID hist) seems to be claiming that these are data. I could just be misunderstanding this (of course there is combinatorial background in the data, etc), but I was figuring these entries in table 18 referred to something the MC tried to simulate. If I'm not counting these as MC processes, too, should I add them to the "data" event type?
As usual, excellent questions
Instead of this grouping, I'm wondering if it's better to make the plots less complicated and just define "dss" and "dd" processes rather than specifying whether the lepton coming from the B is a muon or a tau.
Sounds very reasonable. At some point we may need access to all individual components (for instance, to validated their shapes against the plots in the note), but you certainly can do any groupings you prefer for these plots. I agree that too many components would make the plot hard to read.
By the way, you have a couple of Bs
meson processes lumped with the dssmu. Those are not D**, so you should put them with the dd, or make their category (they're probably small though).
First, probably a question for Phoebe, in her code, she seems to have a histogram for a strange B -> D(D->mu nu X')X decay mode that isn't listed in table 18 of the ANA note. Should I count this mode in my plots?
I think all the Bs
mesons can go in the dd. We can ask her later but do include them in your plots.
Second, I'm thinking I shouldn't include the "comb", "misID", and "Combinatorial D*" as an extra process for the plots, since Phoebe's code (eg. where she fills the misID hist) seems to be claiming that these are data. I could just be misunderstanding this (of course there is combinatorial background in the data, etc), but I was figuring these entries in table 18 referred to something the MC tried to simulate. If I'm not counting these as MC processes, too, should I add them to the "data" event type?
The entries in the table only refer to the shapes used in the fit. The fit then extracts the normalization under each shape (and a bunch of shape parameters that define our uncertainty on the shapes). These shapes are mostly taken from MC, but those components you mention are taken from data indeed.
Phoebe claimed that this ntuple had everything, and it definitely has the normal data, so it may also have the "wrong sign" and "additional track" data control samples used for "comb", "misID", and "Combinatorial D*". The weights of these events are probably quite different than the weights of the components taken from MC, but these should also be included as Process::Type::background
.
(I apologize for the long post that is an accumulation of my questions; I've waited to ask them in hope I'd get a better understanding of the code over time and there would be less/easier questions...)
Overall, my persistent issue has been that Phoebe's code seems to have a lot more going on than feels like would make sense to copy over, but it's tough to figure out where to draw the line. I've been trying to make as much sense as I can of differences between her code and the ANA note specifications (usually in the form of added conditionals in the code), things in her code that seem inconsistent with one another (that likely arise because she wanted to change one set of histograms but didn't change the code for related histograms), when it's okay for me to ignore things she does using variables I don't have access to (eg. lines 1137-1142 she resets the values of mm2
, El
, and q2
for all MC using variables with a suffix smearres
, which I don't have access to), etc. But, it's been tough for me to keep everything straight or come up with general questions I could use for guidance (instead, I encounter many very specific questions, essentially line-by-line, which wouldn't be very useful to ask, because they would only solve minor confusions). My thought process has been that it makes the most sense for me to try to investigate each of these specific questions on my own, but I think my conclusions are likely to be flawed.
That all being said, I've tried to come up with questions that are as "general" as possible, but ultimately some are fairly specific:
mcweight
that I'm unsure about: she seems to be doing multiple "rounds", specified by the option
variable (that I don't believe I have access to?), which I don't understand the purpose of. There are (many) other calculations being done, most of which I think I can trace through her code for, but if I copy all of these calculations, it's likely going to eat up a lot of my time (and I likely won't finish by this Tuesday, as I had hoped to). Overall, do you have any suggestions, or do you think I really ought to just do every calculation in Phoebe's code related to the weights (mcweight
, used for MC components, and totweight
, used for misID and comb)?h_comb_usb
or h_SB
(I'm not sure where the "sb" comes from, but it's referenced in table 18, and based on other context clues, I narrowed down my guesses to this), but I'm not sure which. I follow h_comb
for general combinatoric background and h_misID
for misID; I am naturally more confident in these guesses.h_D2Smu
which, I think, corresponds to the line B -> D*[->D pi pi]mu nu in table 18, but I was thinking that this component is just the general form of the D mu decays (but specifying D->D* pi pi), and so I wasn't counting it separately to avoid what I figured would be double counting; is this the right idea? Related to these components, too, do you have an idea what ishigher
and mm_mom
might be? That is an excellent and well thought out set of questions. Dealing with code like this is hard, because indeed there a lot going on that we may not care about, and it is hard to know what to care about and what not. But you are drawing that line beautifully well.
Just one note about finding where things are. It is always possible that the code is not self-consistent, but in general the branch definitions should be found in AddB.C, and the variables in redoHistos_Dst.h. For instance, Elsmearres
is actually the Elsmear
branch that you do have access to. For some reason, Phoebe did not name the variable the same as the branch in this case.
muVeto
is defined in AddB.C
as if (piminus0_isMuon || Kplus_isMuon /*|| piminus_isMuon*/) muVeto=true;
, so it is the !isMuon
in Table 6.option
or histograms such as NewMCWeights/Histos/firstround.root
), and we'll ask Phoebe more about this.SB
typically stands for sideband. In this case, it should refer to the sidebands of the Δm distribution for the D (as opposed to the mass peak), where there should only exist combinatorial D candidates. And for instance if you see USB
, it typically would refer to the upper sideband of the B
mass, > 5280 MeV, where the B
combinatorial background lives. So I think you got it all exactly right 🙂 D**τν
component, so their BF weights (probably mcWeight
) are very large. Given that the uncertainly is as large as the spike, that looks like a single event with a very large weight. For now, I would just comment out this component.D**τν
should be separated into 1 and 2 pion components just like the D**μν
components.
2S
histogram I was not aware of, but it looks like it contains higher charm mass 2S states. Since we call D**
everything above the D*
, these would be also D**
, but higher mass than the 4 states we know. mm_mom
is the Δm for the mother of the D and the D. ishigher
is also defined in AddB.C and seems to indeed pick up high-mass charm states. I have to say, I had never heard of IDs 100413 or 100423 (they're not even included in the PDG's MC numbering scheme), but you can extrapolate that they are 2S charm states since they end with 413 (D+) and 423 (D0), and start with 100 like the ηc(2S).This 2013 measurement from LHCb provides a useful illustration of the various charmed states, including the 2S
states. When we say D**
we typically refer to the four 1P
states
These 2018 paper by Bernlochner, Ligeti, and Robinson is also useful. This table also provides the masses and widths for the four 1P
states. And you can see here why the D1
and D*2
are some times referred to as the narrow states
Also, in terms of notation D**
used to refer any charmed meson heavier than the D*
, but I think these days it often refers just to these four states.
Another set of useful numbers to have in mind are the tau/mu ratios for the D**
I'll record a list of questions for Phoebe- that I'm guessing will be added to in the future- here, for now.
As a note, these questions are based on my current understanding of the analysis-- having not read all of the ANA note, it's likely that some/all of the questions could be answered in the text. Also, I haven't implemented all the weight calculations that Phoebe does (which amounts to ~1000 lines of code in the redoHistos_Dst script), and I'm sure I'll inevitably have questions about this in the future.
First, testing Phoebe's memory: is the script, as far as you know, exactly what was used for "final" analysis (ie. it should be exactly implementing what's in the ANA note), or are there things that you changed to get various other/different plots? If the script was modified and what I'm looking at is a sort of snapshot in time, I imagine it would be nearly impossible to remember everything that was changed, but I just want to get a feeling for what I'm looking at.
Now, some questions about what's included for the D components. My understanding is that there are four 1P D states- D0, D1, D1', D2 (but D0 only decays to D, not D). In the code, a histogram (h_D2Smu) is created for 2S D states: should these be included along with the other 1P D components (when comparing the shapes of the sum of MC components to data)?
For the heavy D (ie. D -> D* pi pi), in both the ANA note and the code, I only see semimuonic modes included, no semitauonic. Is this right? Is it just because these components would be negligible because the tau is too massive?
A question about defining control samples: the ANA note indicates there are some p and pT cuts on additional tracks found by the isolation BDT, but there seem to be more of these cuts in the code than in table 15 (pg 49). Also, there seem to be various other cuts in the code that aren't mentioned in the table. Should I just follow the code, or should I ignore extra cuts and follow the ANA note? List of additional cuts- DD: Max(iso_P(iso_PT > 150), iso_P2(iso_PT2 > 150)(iso_BDT2 > -1.1), iso_P3(iso_PT3 > 150)(iso_BDT3 > -1.1))> 5e3 2OS: iso_CHARGE < 100 && Max(iso_P(iso_PT > 150), iso_P2(iso_PT2 > 150)) > 5e3 1OS: (iso_CHARGE)(Dst_ID) < 0 D**: I'm assuming the mass cut in line 1348/1349 is meant to select D sample (subset of 1OS, cut around narrow D states), but the uncommented mass cut there doesn't match table 15 (commented out cut does).
Maybe I'm being fussy, but I'm not sure if there's a reason behind a slight non-uniformity when selecting some D* components: comparing the selection of the (heavy) D2 component to the (heavy) D1', D2* doesn't have a requirement Dststtype==415 while D1' does have a requirement Dststtype==20413. Is there, in fact, a reason the conditions aren't symmetric?
We can consider this done for now.
@afernez As a first step to understand Run 2 data and MC, we will produce a series of data/MC plots using Phoebe's ntuples. Given their size, we will focus first on her 15 GB D*+ sample
The first step is to go through redohistos_Dsp.C and find the cuts to select each component. This is a script that she uses to produce the templates (histograms) that provides the shapes used to extract the yield of each component in the signal fit. For instance,
h_sigtau
is the histogram with theB → D*+ τ ν
signal, so the cuts used to fill it define which events she considers signal. the_v1p
and others and ±1 σ variations, so we don't care about them.I wrote a portion of the script that will be used for this, src/rdx/plot_phoebe_datamc.cxx, with examples of functions to select data, signal, and normalization (with
eventType getType
andNamedFunc event_type
), and give them different weights (withNamedFunc weight
). You'll need to expand on theenum
values, change names appropriately, add the cuts to select each component and sample, and plotm_nu1
(mmiss^2),q2
,El
and any other interesting variables you may like.The variables used in
redohistos_Dsp.C
may be branches of the tree (those with afChain→SetBranchAddress
in redoHistos_Dst.h). In that case you can access them inplot_scripts
in strings (eg"m_nu1>5"
), with their name directly (m_nu1 > 5.
) or insideNamedFunc
definitions (b.m_nu1() > 5.
). Note that after loading the value of the branch into the variable, the variable may be modified, so you should check for that. For instance, the weight is modified byif(mcWeight > 5) { mcWeight=5;}
. If the variable is not a branch you'll have to calculate it.The second step is find the cuts for each of the samples defined in Table 15 of the ANA note.
ISO
is the signal region, andDD
,1OS
,D**
, and2OS
are control samples with no signal. These samples are selected byredoHistos_Dst.C
based on theoption
variable, see here.To check that things are going in the right direction you can try to reproduce the data distribution (choose the same binning) of some of the fit projections in chapter 8 of the ANA note.
Feel free to ask for help and any clarification in this GitHub issue or in Slack. Reading other people's code is not easy! Variable names are not always obvious (for instance
m_nu1
is the missing mass...), and we'll probably have to compile a set of questions to ask Phoebe some time.Good luck!