umd-lhcb / lhcb-ntuples-gen

ntuples generation with DaVinci and in-house offline components
BSD 2-Clause "Simplified" License
1 stars 0 forks source link

Check current (run 1) global cuts for RDX #69

Closed yipengsun closed 3 years ago

yipengsun commented 3 years ago

The latest RDX cuts are defined at: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/master/include/functor/rdx/cut.h

I've separated all PID cuts into functions so that for MC we can define have different definitions of PID cuts.

Todo:

FYI @manuelfs @Svende @afernez

yipengsun commented 3 years ago

The RDX run 1 cut for D** "skim" is problematic: It currently leaves no candidate in the tree.

Edit: DD is fine. D** is not.

Svende commented 3 years ago

@yipengsun I followed your instructions here: https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/fit/fit_inputs.md with step 6 without the uBDT running make rdx-ntuple-run2-oldcut-no-ubdt. I am looking at the output tuples in lhcb-ntuples-gen/gen/rdx-ntuple-run2-oldcut-no-ubdt/ntuple and see in both DD skims: Dst_DD--21_06_30--std--data--2016--md.root & D0_DD--21_06_30--std--data--2016--md.rootevents. So which ones are empty?

yipengsun commented 3 years ago

Ah, sorry, it's not the DD! It's the D**. The empty ones are:

Dst_Dstst--21_06_30--std--data--2016--md.root
D0_Dstst--21_06_30--std--data--2016--md.root
Svende commented 3 years ago

In general we should follow Phoebe's code for the Run 1 selection cuts (redoHistos_D0.C & redoHistos_Dst.C, the flags are defined in AddB.C & AddD0B_temp.C) rather than the Tables listed in the AnaNote, as the latter is not always up-to-date or Greg's rather than Phoebe's slightly different selection.

For the Control Samples the following cuts should be applied: Screen Shot 2021-07-08 at 5 44 51 PM

yipengsun commented 3 years ago

I fixed obvious errors in YAML file, such as missing quotation marks. Unfortunately the YAML is still not working. I'm working on it.

yipengsun commented 3 years ago

I've made several changes to both the YAML file and babymaker to make the run 2 ntuple w/ run 1 cut workflow compile. These changes are:

I can confirm that now the D** subsample ntuple is not empty.

Unfortunately the DD subsample ntuple is now empty. So far, potential problems are:

yipengsun commented 3 years ago

I tried to extrapolate on Svende's existing cuts and now all subsamples are non-empty. Currently cuts are mostly compatible w/ Svende's slide, with the exception on the DD cuts, where the iso_nnkN has an additional IF condition based on Svede's existing code.

@manuelfs I think now it's ready to be reviewed by you.

manuelfs commented 3 years ago

I pulled the changes with the usual

git pull
git submodule update --init --recursive
make install-dep

but something seems quite wrong. If first crashes because of some 0.0f

(.virtualenv) |14:27:25|~/code/lhcb-ntuples-gen$ make rdx-ntuple-run2-oldcut-no-ubdt
workflows/rdx.py rdx-ntuple-run2-oldcut-no-ubdt ntuples/0.9.4-trigger_emulation/Dst_D0-std --debug \
    --mode data_no_mu_bdt -A input_yml:/Users/manuelf/code/lhcb-ntuples-gen/postprocess/rdx-run2/rdx-run2_with_run1_cuts.yml
== Job: rdx-ntuple-run2-oldcut-no-ubdt ==
Working on /Users/manuelf/code/lhcb-ntuples-gen/ntuples/0.9.4-trigger_emulation/Dst_D0-std/Dst_D0--21_04_27--std--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r1_90000000_SEMILEPTONIC.DST.root...
DEBUG: Executing: data_no_mu_bdt.sh /Users/manuelf/code/lhcb-ntuples-gen/ntuples/0.9.4-trigger_emulation/Dst_D0-std/Dst_D0--21_04_27--std--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r1_90000000_SEMILEPTONIC.DST.root /Users/manuelf/code/lhcb-ntuples-gen/postprocess/rdx-run2/rdx-run2_with_run1_cuts.yml 21_08_10--std--data--2016--md
Traceback (most recent call last):
  File "/Users/manuelf/code/lhcb-ntuples-gen/.virtualenv/lib/python3.8/site-packages/lark/lexer.py", line 458, in lex
    yield lexer.next_token(lexer_state, parser_state)
  File "/Users/manuelf/code/lhcb-ntuples-gen/.virtualenv/lib/python3.8/site-packages/lark/lexer.py", line 378, in next_token
    raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches 'f' in the current parser context, at line 1 col 30

MAX( IF(iso_bdt1 <= -1.1, 0.0f, iso_nnk1)*(iso_type1 == 3), IF(iso_bd
                             ^
Expected one of: 
    * "->"
    * ">="
    * "||"
    * SLASH
    * "=="
    * RBRACE
    * DOT
    * RPAR
    * MORETHAN
    * "!="
    * "&&"
    * PLUS
    * MINUS
    * STAR
    * LESSTHAN
    * "<="
    * COMMA

Previous tokens: Token('NUMBER', '0.0')

If I remove the f, it still crashes

(.virtualenv) |14:28:06|~/code/lhcb-ntuples-gen$ make rdx-ntuple-run2-oldcut-no-ubdt
workflows/rdx.py rdx-ntuple-run2-oldcut-no-ubdt ntuples/0.9.4-trigger_emulation/Dst_D0-std --debug \
    --mode data_no_mu_bdt -A input_yml:/Users/manuelf/code/lhcb-ntuples-gen/postprocess/rdx-run2/rdx-run2_with_run1_cuts.yml
== Job: rdx-ntuple-run2-oldcut-no-ubdt ==
Working on /Users/manuelf/code/lhcb-ntuples-gen/ntuples/0.9.4-trigger_emulation/Dst_D0-std/Dst_D0--21_04_27--std--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r1_90000000_SEMILEPTONIC.DST.root...
DEBUG: Executing: data_no_mu_bdt.sh /Users/manuelf/code/lhcb-ntuples-gen/ntuples/0.9.4-trigger_emulation/Dst_D0-std/Dst_D0--21_04_27--std--LHCb_Collision16_Beam6500GeV-VeloClosed-MagDown_Real_Data_Reco16_Stripping28r1_90000000_SEMILEPTONIC.DST.root /Users/manuelf/code/lhcb-ntuples-gen/postprocess/rdx-run2/rdx-run2_with_run1_cuts.yml 21_08_10--std--data--2016--md
baby.cpp:6431:27: error: redefinition of 'raw_b_ISOLATION_PX2'
  TTreeReaderValue<float> raw_b_ISOLATION_PX2(reader, "b_ISOLATION_PX2");
                          ^
baby.cpp:6425:27: note: previous definition is here
  TTreeReaderValue<float> raw_b_ISOLATION_PX2(reader, "b_ISOLATION_PX2");
                          ^
baby.cpp:6432:27: error: redefinition of 'raw_b_ISOLATION_PY2'
  TTreeReaderValue<float> raw_b_ISOLATION_PY2(reader, "b_ISOLATION_PY2");
                          ^
baby.cpp:6426:27: note: previous definition is here
  TTreeReaderValue<float> raw_b_ISOLATION_PY2(reader, "b_ISOLATION_PY2");
                          ^
...

and several more such redefinitions

yipengsun commented 3 years ago

Hmm, I think I removed the -U flag for pip a while back, and that may caused Python package not being properly updated. I've added that flag back. Can you pull the latest master and try again? The error message was due to an outdated babymaker.

manuelfs commented 3 years ago

Hum, I think I may have run make install-dep outside nix earlier. I pulled your changes and installed the dependencies inside nix, and now it worked. Thanks!

yipengsun commented 3 years ago

I added Greg's Dst veto tool, but get the following error when processing wrong-sign Pi B0 tree:

TupleB0WSPi.b0....WARNING LoKi::VertexFitter:: _load(): invalid particle StatusCode=701
TupleB0WSPi.b0....WARNING LoKi::VertexFitter:: _load(): the error from _load: StatusCode=701

 *** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================

I'll dig into Greg's code and see what's going on.

yipengsun commented 3 years ago

From Greg's original implementation:

  // Mu loop
  for ( auto& bDaughter : bDaughters ) {
    if ( bDaughter->particleID().abspid() == 421 ) theD = bDaughter;
  }
  if ( !theD ) warning() << "NOT FOUND D0 " << endmsg;

So he loops over all B daughters and try to find a D0. This will not work for B0 -> D* decays because B0 has no DIRECT D0 daughter.

I've add some code to return directly in this case so I don't need to worry about this tool segmentation faults on D* trees (although it is useless for D* tree).

The problem then is that we can't have |(m_D* - m_D0)_reco - (m_D* - m_D0)_PDG| = |Δm_reco - Δm_PDG| < 2 for the D* tree, at least not with the branches generated by this tool. However, we may still be able to apply this cut by looking at the reconstructed mass of the D* directly and compute the delta with the PDG mass of D*.

yipengsun commented 3 years ago

Now DaVinci terminates after processing 215 events. There's no error message, but just this warning:

DaVinciInitAlg....SUCCESS Exceptions/Errors/Warnings/Infos Statistics : 0/0/2/0
DaVinciInitAlg....SUCCESS  #WARNINGS   = 1        Message = 'Delta Memory for the event exceeds 3*sigma'
DaVinciInitAlg....SUCCESS  #WARNINGS   = 2        Message = 'Total Memory for the event exceeds 3*sigma'
TupleBminus.b.T...SUCCESS Exceptions/Errors/Warnings/Infos Statistics : 0/0/2/0
TupleBminus.b.T...SUCCESS  #WARNINGS   = 2        Message = 'No convergency has been reached'
TupleBminus.b.T...SUCCESS  #WARNINGS   = 2        Message = 'fit(): failure from _iterate()'
yipengsun commented 3 years ago

It's a return code problem. Now it works in the sense that it doesn't stop at 215th event.

yipengsun commented 3 years ago

For B -> D0 trees, the delta mass is -1, which is the initial value set by the tool, so it's still not working.

Here's how delta mass is computed:

        float deltaMass = -1;
        if ( part->proto()->track()->type() == 3 ) {
          auto* newP = new LHCb::Particle( LHCb::ParticleID( -413 ) );
          parts2VertexDst.push_back( iparts );
          StatusCode sc4 =
              m_pVertexFit->fit( parts2VertexDst, DstVertex, *newP );
          parts2VertexDst.pop_back();
          deltaMass = newP->momentum().M() - theD->momentum().M();
        }
yipengsun commented 3 years ago

The problem then is that we can't have |(m_D* - m_D0)_reco - (m_D* - m_D0)_PDG| = |Δm_reco - Δm_PDG| < 2 for the D* tree...

Indeed, we are already applying this cut offline: https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/fa3da1c18fa5ea4cb9e076a42596d10c204fc5d6/include/functor/rdx/cut.h#L256 https://github.com/umd-lhcb/lhcb-ntuples-gen/blob/fa3da1c18fa5ea4cb9e076a42596d10c204fc5d6/include/functor/rdx/cut.h#L264

yipengsun commented 3 years ago

Regarding the D* veto in D0 TupleTool:

  1. The TupleTool can find particles fine:

    TupleBminus.b.T...WARNING Getting particles from /Event/Phys/StdAllNoPIDsPions with 39 particles
    TupleBminus.b.T...WARNING Getting particles from /Event/Phys/StdNoPIDsUpPions with 6 particles
    TupleBminus.b.T...WARNING Getting particles from Phys/StdNoPIDsVeloPions with 16 particles
    TupleBminus.b.T...WARNING Getting particles from /Event/Phys/StdAllNoPIDsPions with 39 particles
    TupleBminus.b.T...WARNING Getting particles from /Event/Phys/StdNoPIDsUpPions with 6 particles
    TupleBminus.b.T...WARNING Getting particles from Phys/StdNoPIDsVeloPions with 16 particles
  2. In the TupleTool, I added the following debug variables:

        ghostprob = part->proto()->track()->ghostProbability();
        if ( ghostprob > 0.5 ) {
          continue;
        }
        DstGhostOK = true;
        if ( part->proto()->track()->type() == 3 && opening <= 0.994 ) {
          continue;
        }
        DstType3OK = true;
        if ( part->proto()->track()->type() == 4 && opening <= 0.98 ) {
          continue;
        }
        DstType4OK = true;
        if ( part->proto()->track()->type() == 1 && opening <= 0.98 ) {
          continue;
        }
        DstType1OK = true;

    And the main problem is DstType1OK is mostly false, suggesting that these TES locations are mostly for VELO-only particles that fail the opening cut.

So, we are not requiring only LONG tracks (type == 4), and in these Std containers, most tracks are VELO only and they do not pass the opening cut?

yipengsun commented 3 years ago

Adding 2 debug messages:

      if ( m_verbose ) warning() << "Opening (init) is: " << opening << endmsg;

      if ( part->proto()->track()->type() < 5 &&
           !isTrackInDecay( part->proto()->track(), daughtertracks ) ) {
        ghostprob = part->proto()->track()->ghostProbability();
        if ( ghostprob > 0.5 ) {
          continue;
        }
        DstGhostOK = true;
        if ( part->proto()->track()->type() == 3 && opening <= 0.994 ) {
          continue;
        }
        DstType3OK = true;
        if ( part->proto()->track()->type() == 4 && opening <= 0.98 ) {
          continue;
        }
        DstType4OK = true;
        if ( part->proto()->track()->type() == 1 && opening <= 0.98 ) {
          continue;
        }
        DstType1OK = true;

        LHCb::Vertex vtxWithExtraTrack;
        parts2Vertex.push_back( iparts );

        StatusCode sc3 = m_pVertexFit->fit( vtxWithExtraTrack, parts2Vertex );
        parts2Vertex.pop_back();

        opening = getopening( part->proto()->track(), P );
        if ( m_verbose )
          warning() << "Opening (later) is: " << opening << endmsg;

I got the following logs (a snapshot):

TupleBminus.b.T...WARNING Opening (init) is: 0
TupleBminus.b.T...WARNING Opening (init) is: 0
TupleBminus.b.T...WARNING Opening (init) is: 0
TupleBminus.b.T...WARNING Opening (init) is: 0
TupleBminus.b.T...WARNING Opening (init) is: 0
TupleBminus.b.T...WARNING Opening (init) is: 0
TupleBminus.b.T...WARNING Opening (init) is: 0
EventSelector     SUCCESS Reading Event record 86001. Record number within stream 3: 57945
TupleBminusWS.b...WARNING Getting particles from Phys/StdAllNoPIDsPions with 48 particles
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184
TupleBminusWS.b...WARNING Opening (init) is: 0.974184

So the problem is, the opening remained the same for all tracks in a single event. I'll try moving the opening calculation up and see if it helps.

yipengsun commented 3 years ago

After moving the opening line up, it seems to work. Still, I should ask for the source code of this tool from Phoebe to confirm there's no other bugs.

yipengsun commented 3 years ago

I asked Phoebe about her version of the D* veto tool, and she did the same modification as us. Also, she's using the same XML file for BDT input.

Now we confirm we are using this tool correctly.

yipengsun commented 3 years ago

I found Phoebe's old D* veto cut in proc/AddD0B_temp.C:

DstVeto=(abs(Y_ISOLATION_DstWindowDELTAM2-145.45) > 4. && abs(Y_ISOLATION_DstWindowDELTAM-145.45) > 4);
yipengsun commented 3 years ago

@manuelfs I have a cut that is equivalent to Phoebe's:

dst_veto_deltam: double; MIN(ABS(b_ISOLATION_DstWindowDELTAM-145.454), ABS(b_ISOLATION_DstWindowDELTAM2-145.454))
d0_dst_veto_ok: bool; dst_veto_deltam > 4

I applied this to a small local run 1 ntuple that has a total of 1312 candidates. After applying all run 1 offline cuts (except for D* veto in D0), the resulting step-2 ntuple has 504 candidates. Out of the 504 candidates, 79% of the candidates pass the veto cut, 21% failed. This is somewhat different from the number reported by Greg but given the small sample size, and the fact that I am applying identical cut as Phoebe, I think this is implemented correctly.

yipengsun commented 3 years ago

The veto is validated. The procedure and summary is documented at: https://github.com/umd-lhcb/rdx-run2-analysis/blob/master/docs/cuts/Dst_veto_in_D0.md