umd-lhcb / MuonBDTPid

Muon PID with a uBoost BDT (in ROOT 5). Also includes code for PID efficiency studies.

Problems with 2017/2018 KPiMu, Mu no pT, Proton PIDCalib ntuples #11

Open afernez opened 2 months ago

afernez commented 2 months ago

The ntuples with the added ANNPIDTraining branches needed to calculate Greg's $\mu$ BDT have been produced, but Emily and I have been having trouble generating the actual $\mu$ BDT branch with the code in this repo, as was done for 2016. Emily copied the files from lxplus (with a handful of copying errors that she fixed), and I have done some checks on what was copied over and, for now, taken over trying to get the $\mu$ BDT to work. Our conclusion seemed to be that there was a file incompatibility due to a different compression, presumably because the ntuples were produced with a newer ROOT version, while we are forced to use a modified ROOT 5 with Greg's uBoost implementation (original uBoost paper, just for completeness). However, even after re-saving the files so that they can be opened with ROOT 5, the same workflow used for 2016 still leads to errors for 2017/2018.

afernez commented 2 months ago

I just committed a few scripts that I used to do the re-saving (to fix the file compression issue) and to check the contents of the ntuples. First, regarding the re-saving: this wrapper and this macro can be used to re-copy all the new PIDCalib ntuples from the ones stored in /home/public/pidcalib_ntuples/remote on glacier to a new remote-ubdt-only folder. Doing this for one file of each particle type/year, I see that the copied ntuples can be opened with ROOT 5:

(.virtualenv) alex@physwkpsc3114ub:~/MuonBDTPid$ root /home/public/pidcalib_ntuples/remote-ubdt-only/KPiMu-2017-MagDown/00227913_00000065_1.pidcalib.root

*** DISPLAY not set, setting it to 10.206.36.79:0.0
  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version   5.34/38     12 March 2018   *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************

ROOT 5.34/38 (heads/v5-34-00-patches@v5-34-36-235-gba5b2a7, Jan 01 1980, 00:00:00 on linuxx8664gcc)

CINT/ROOT C/C++ Interpreter version 5.18.00, July 2, 2010
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.

root [0]
Attaching file /home/public/pidcalib_ntuples/remote-ubdt-only/KPiMu-2017-MagDown/00227913_00000065_1.pidcalib.root as _file0...
Warning in <TClass::TClass>: no dictionary for class ROOT::TIOFeatures is available
root [1] _file0->ls()
TFile**         /home/public/pidcalib_ntuples/remote-ubdt-only/KPiMu-2017-MagDown/00227913_00000065_1.pidcalib.root
 TFile*         /home/public/pidcalib_ntuples/remote-ubdt-only/KPiMu-2017-MagDown/00227913_00000065_1.pidcalib.root
  KEY: TDirectoryFile   KSLL_PiPTuple;1 KSLL_PiPTuple
  KEY: TDirectoryFile   KSLL_PiMTuple;1 KSLL_PiMTuple
  KEY: TDirectoryFile   DSt_PiPTuple;1  DSt_PiPTuple
  KEY: TDirectoryFile   DSt_KMTuple;1   DSt_KMTuple
  KEY: TDirectoryFile   DsPhi_KPTuple;1 DsPhi_KPTuple
  KEY: TDirectoryFile   DSt_PiMTuple;1  DSt_PiMTuple
  KEY: TDirectoryFile   DSt_KPTuple;1   DSt_KPTuple
  KEY: TDirectoryFile   Jpsi_MuPTuple;1 Jpsi_MuPTuple
  KEY: TDirectoryFile   Jpsi_MuMTuple;1 Jpsi_MuMTuple
  KEY: TDirectoryFile   DsPhi_KMTuple;1 DsPhi_KMTuple
  KEY: TDirectoryFile   B_Jpsi_MuMTuple;1       B_Jpsi_MuMTuple
  KEY: TDirectoryFile   B_Jpsi_DTF_MuMTuple;1   B_Jpsi_DTF_MuMTuple
  KEY: TDirectoryFile   B_Jpsi_MuPTuple;1       B_Jpsi_MuPTuple
  KEY: TDirectoryFile   B_Jpsi_DTF_MuPTuple;1   B_Jpsi_DTF_MuPTuple
  KEY: TDirectoryFile   DsPhi_MuPTuple;1        DsPhi_MuPTuple
  KEY: TDirectoryFile   DsPhi_MuMTuple;1        DsPhi_MuMTuple
root [2] TTree* t = (TTree*)_file0->Get("Jpsi_MuPTuple/DecayTree")
root [3] t->Print()
******************************************************************************
*Tree    :DecayTree : DecayTree                                              *
*Entries :   101237 : Total =       573541942 bytes  File  Size =  254599839 *
*        :          : Tree compression factor =   2.25                       *
******************************************************************************
*Br    0 :Jpsi_0.50_cc_mult : Jpsi_0.50_cc_mult/I                            *
*Entries :   101237 : Total  Size=     406925 bytes  File Size  =      83153 *
*Baskets :       15 : Basket Size=      28672 bytes  Compression=   4.89     *

...

But there's still an error when running python scripts/apply_ubdt.py --ymlName spec/pidcalib17.yml (with a temporary yml file that points to remote-ubdt-only instead of remote, to 2017 instead of 2016, and to a temporary friends-alex folder, because I can't write to Emily's friends folder):

/home/public/pidcalib_ntuples/remote-ubdt-only/Mu_nopt-2017-MagDown/ -> /home/public/pidcalib_ntuples/friends-alex/Mu_nopt-2017-MagDown/
  trees: Jpsinopt_MuMTuple/DecayTree;23,Jpsinopt_MuMTuple/DecayTree;22,Jpsinopt_MuPTuple/DecayTree;23,Jpsinopt_MuPTuple/DecayTree;22
  AddUBDTBranchPidCalib -i /home/public/pidcalib_ntuples/remote-ubdt-only/Mu_nopt-2017-MagDown/00227899_00000021_1.pidcalib.root -o /home/public/pidcalib_ntuples/friends-alex/Mu_nopt-2017-MagDown/00227899_00000021_1.pidcalib.root -p probe -b UBDT -t Jpsinopt_MuMTuple/DecayTree;23,Jpsinopt_MuMTuple/DecayTree;22,Jpsinopt_MuPTuple/DecayTree;23,Jpsinopt_MuPTuple/DecayTree;22
Warning in <TClass::TClass>: no dictionary for class ROOT::TIOFeatures is available
Working on Jpsinopt_MuMTuple/DecayTree
Done loading input data
Jpsinopt_MuMTuple/DecayTree has 364131 entries
Processing ............................................................ [100%]

 *** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================

#0  0x00007ff8936e7e56 in wait4 () from /nix/store/adxc893j47gxx3xjw403zdf0liiddvw2-glibc-2.32-48/lib/libc.so.6
#1  0x00007ff893667447 in do_system () from /nix/store/adxc893j47gxx3xjw403zdf0liiddvw2-glibc-2.32-48/lib/libc.so.6
#2  0x00007ff8965b62cb in TUnixSystem::StackTrace() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#3  0x00007ff8965b84d4 in TUnixSystem::DispatchSignals(ESignals) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#4  <signal handler called>
#5  0x00007ff8957a2043 in TDirectoryFile::Save() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#6  0x00007ff8957a20d4 in TDirectoryFile::Save() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#7  0x00007ff8957a0229 in TDirectoryFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#8  0x00007ff8957b7b3a in TFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#9  0x000000000041c53b in main ()

===========================================================

The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.

===========================================================

#5  0x00007ff8957a2043 in TDirectoryFile::Save() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#6  0x00007ff8957a20d4 in TDirectoryFile::Save() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#7  0x00007ff8957a0229 in TDirectoryFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#8  0x00007ff8957b7b3a in TFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#9  0x000000000041c53b in main ()

===========================================================

Segmentation fault (core dumped)
sh: 1: 23,Jpsinopt_MuMTuple/DecayTree: not found
sh: 1: 22,Jpsinopt_MuPTuple/DecayTree: not found
sh: 1: 23,Jpsinopt_MuPTuple/DecayTree: not found
sh: 1: 22: not found
  WARNING:   AddUBDTBranchPidCalib -i /home/public/pidcalib_ntuples/remote-ubdt-only/Mu_nopt-2017-MagDown/00227899_00000021_1.pidcalib.root -o /home/public/pidcalib_ntuples/friends-alex/Mu_nopt-2017-MagDown/00227899_00000021_1.pidcalib.root -p probe -b UBDT -t Jpsinopt_MuMTuple/DecayTree;23,Jpsinopt_MuMTuple/DecayTree;22,Jpsinopt_MuPTuple/DecayTree;23,Jpsinopt_MuPTuple/DecayTree;22 did not execute properly!

...
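The sh: 1: 23,Jpsinopt_MuMTuple/DecayTree: not found lines look like a shell-quoting problem, independent of the segfault: the -t list contains the namecycle separator ";", and when the command string goes through sh unquoted, each ";" ends one command and starts another. AddUBDTBranchPidCalib would then only receive -t Jpsinopt_MuMTuple/DecayTree, and fragments such as 23,Jpsinopt_MuMTuple/DecayTree get run as commands afterwards, which matches the four "not found" lines. A minimal sketch of building the command safely, assuming the tree keys are available as "path;cycle" strings; build_ubdt_cmd and strip_cycle are hypothetical helpers, not functions from apply_ubdt.py:

```python
import shlex


def strip_cycle(key):
    """Drop a ROOT namecycle suffix: 'Dir/DecayTree;23' -> 'Dir/DecayTree'."""
    return key.split(";", 1)[0]


def build_ubdt_cmd(exe, input_file, output_file, tree_keys):
    """Build a shell-safe command line for the uBDT executable (hypothetical helper).

    Namecycles are stripped and duplicates removed (first occurrence wins), so
    'T;23' and 'T;22' collapse to a single 'T'. Every argument is passed through
    shlex.quote, so a stray ';' can no longer split the command in the shell.
    """
    trees = list(dict.fromkeys(strip_cycle(k) for k in tree_keys))
    args = [exe, "-i", input_file, "-o", output_file,
            "-p", "probe", "-b", "UBDT", "-t", ",".join(trees)]
    return " ".join(shlex.quote(a) for a in args)


keys = ["Jpsinopt_MuMTuple/DecayTree;23", "Jpsinopt_MuMTuple/DecayTree;22",
        "Jpsinopt_MuPTuple/DecayTree;23", "Jpsinopt_MuPTuple/DecayTree;22"]
cmd = build_ubdt_cmd("AddUBDTBranchPidCalib", "in.root", "out.root", keys)
print(cmd)
# AddUBDTBranchPidCalib -i in.root -o out.root -p probe -b UBDT -t Jpsinopt_MuMTuple/DecayTree,Jpsinopt_MuPTuple/DecayTree
```

Stripping the cycle relies on ROOT returning the highest cycle when a key is requested without one; passing an argument list to subprocess.run (no shell at all) would be an equally valid alternative to quoting.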
afernez commented 2 months ago

Using this script, I checked

# ... that 2017-2018 KPiMu, P, Mu_nopt PIDCalib samples [copied from lxplus] are ok:
#   - they should have the same tree structure as 2016 for all files
#   - all brs in 2016 should be present in 2017-2018 (ignore any new brs)
#   - all brs in trees should be nonempty and have nonzero mean/std, within reasonable margin from 2016 (slowest check, so don't do for every file)
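The first two checks above boil down to set comparisons between a reference (2016) file and each 2017/2018 file. A minimal sketch, assuming each file's layout has already been read into a plain dict mapping tree path to a set of branch names (for example with uproot); the reading step, the function name compare_layout, and the branch names in the example are hypothetical and not taken from the committed script:

```python
def compare_layout(reference, candidate):
    """Compare {tree: set(branches)} layouts of a reference and a candidate file.

    Returns (missing_trees, extra_trees, missing_branches), where
    missing_branches maps each shared tree to the reference branches it lacks.
    Branches that exist only in the candidate are ignored on purpose.
    """
    ref_trees, cand_trees = set(reference), set(candidate)
    missing_trees = sorted(ref_trees - cand_trees)
    extra_trees = sorted(cand_trees - ref_trees)
    missing_branches = {
        tree: sorted(reference[tree] - candidate[tree])
        for tree in sorted(ref_trees & cand_trees)
        if reference[tree] - candidate[tree]
    }
    return missing_trees, extra_trees, missing_branches


# Illustrative layouts only (hypothetical branch names)
ref_2016 = {
    "Jpsinopt_MuMTuple/DecayTree": {"probe_Brunel_P", "probe_Brunel_ETA"},
    "Jpsinopt_MuPTuple/DecayTree": {"probe_Brunel_P", "probe_Brunel_ETA"},
}
file_2017 = {
    "Jpsinopt_MuMTuple/DecayTree": {"probe_Brunel_P", "probe_Brunel_ETA", "probe_NewBranch"},
}
print(compare_layout(ref_2016, file_2017))
# (['Jpsinopt_MuPTuple/DecayTree'], [], {})
```

The third check (nonzero mean/std within a margin of 2016) needs the branch contents, so it is the expensive part and is left out of this sketch.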

Running this, only the Mu_nopt MagUp 2017 files seem to be problematic:

----- Checking Mu_nopt MagUp 2017 -----

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000011_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

**Missing Jpsinopt_MuMTuple in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000011_1.pidcalib.root, but it was found in previous files in this folder??**

**Missing Jpsinopt_MuPTuple in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000011_1.pidcalib.root, but it was found in previous files in this folder??**

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000003_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000013_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000004_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000007_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000002_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000006_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000001_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000010_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000009_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000005_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000012_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Extra trees in /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/00082524_00000008_1.pidcalib.root vs /home/public/pidcalib_ntuples/remote/Mu_nopt-2016-MagUp/00152087_00000055_1.pidcalib.root?

Everything else has the same tree structure as 2016, and there aren't any missing branches or incorrectly filled branches.

afernez commented 2 months ago

For the second comment about trying to re-copy the ntuples and calculate $\mu$ BDT: I named the folder remote-ubdt-only because I was originally trying to only copy over the branches needed for the $\mu$ BDT calculation. This ran into a (different?) error, so I resorted to just copying the entire ntuples. The error pasted above is using the fully copied ntuples.

manuelfs commented 2 months ago

I found that the /home/public/pidcalib_ntuples/remote/Mu_nopt-2017-MagUp/0008* files are openable in ROOT 5, but they don't have the uBDT branches, so they may have been mistakenly copied there.

The other files are not openable in ROOT 5, but the files re-saved in /home/public/pidcalib_ntuples/remote-ubdt-only are openable and browsable interactively.

I wrote a simple script to run on a single file using the binary built from the MuonBDTPid/src/AddUBDTBranchRun2.cpp source code:

import argparse
import subprocess

import uproot

parser = argparse.ArgumentParser(description="Apply UBDT to PIDCalib ntuple.")
parser.add_argument('-i', '--inputNtp', required=True, help='Input ntuple')
args = parser.parse_args()

rootFile = uproot.open(args.inputNtp)
# Strip any namecycle (";1", ";22", ";23", ...) and drop duplicates, keeping the first occurrence
trees = list(dict.fromkeys(t.split(";")[0] for t in rootFile if "DecayTree" in t))
trees = ",".join(trees)

# Pass the arguments as a list: no shell is involved, so ";" cannot split the command
cmd = ["./bin/AddUBDTBranchRun2PidCalib", "-i", args.inputNtp, "-o", "gen/uBDT_out.root",
       "-x", "weights/weights_run2_no_cut_ubdt.xml", "-b", "UBDT", "-p", "probe", "-t", trees]
retCode = subprocess.run(cmd).returncode

Initially I was running AddUBDTBranchRun2, the executable mentioned in the README, but I had to add a #define PIDCALIB for the proper branch names to be picked up. There was a curious warning during compilation saying that PIDCALIB was already defined, even though it was not defined in the source code. Looking at the Makefile:

AddUBDTBranchRun2:

AddUBDTBranchRun2PidCalib: AddUBDTBranchRun2.cpp
    $(COMPILER) $(CXXFLAGS) -DPIDCALIB -o $(BINPATH)/$@ $< $(LINKFLAGS) $(ADDLINKFLAGS)

Phoebe noticed the -DPIDCALIB, which defines the flag! Basically, we have two executables built from the same source code, but in AddUBDTBranchRun2PidCalib the PIDCALIB flag is defined, so the branch names get the proper prefix.

With this, I could run on one file and produce a uBDT output file.

However, the program crashes after saving it, as Alex saw above. I also noticed that when I open this kind of file interactively and play with the TBrowser, the ROOT session crashes when quitting:

root [3] .q

 *** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f81d39eae56 in wait4 () from /nix/store/adxc893j47gxx3xjw403zdf0liiddvw2-glibc-2.32-48/lib/libc.so.6
#1  0x00007f81d396a447 in do_system () from /nix/store/adxc893j47gxx3xjw403zdf0liiddvw2-glibc-2.32-48/lib/libc.so.6
#2  0x00007f81d4a0c2cb in TUnixSystem::StackTrace() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#3  0x00007f81d4a0e4d4 in TUnixSystem::DispatchSignals(ESignals) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#4  <signal handler called>
#5  0x0000000000000000 in ?? ()
#6  0x00007f81d49b1d04 in TList::FindObject(TObject const*) const () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#7  0x00007f81d49b043e in THashList::Remove(TObject*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#8  0x00007f81d31f176a in TTree::~TTree() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libTree.so
#9  0x00007f81d31f1b09 in TTree::~TTree() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libTree.so
#10 0x00007f81d49b0525 in THashList::Delete(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#11 0x00007f81d35995e6 in TDirectoryFile::~TDirectoryFile() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#12 0x00007f81d3599649 in TDirectoryFile::~TDirectoryFile() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#13 0x00007f81d49b0525 in THashList::Delete(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#14 0x00007f81d35952d2 in TDirectoryFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#15 0x00007f81d35acb3a in TFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#16 0x00007f81d4a58101 in (anonymous namespace)::R__ListSlowClose(TList*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#17 0x00007f81d4a5857e in TROOT::CloseFiles() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#18 0x00007f81d4a589e0 in TROOT::EndOfProcessCleanups(bool) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#19 0x00007f81d4a08b40 in TUnixSystem::Exit(int, bool) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#20 0x00007f81d4a1b116 in TApplication::ProcessLine(char const*, bool, int*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#21 0x00007f81d4cf82f7 in TRint::HandleTermInput() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRint.so
#22 0x00007f81d4a0da9b in TUnixSystem::CheckDescriptors() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#23 0x00007f81d4a0f258 in TUnixSystem::DispatchOneEvent(bool) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#24 0x00007f81d4a75064 in TSystem::InnerLoop() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#25 0x00007f81d4a734af in TSystem::Run() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#26 0x00007f81d4a18d5f in TApplication::Run(bool) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#27 0x00007f81d4cf9524 in TRint::Run(bool) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRint.so
#28 0x000000000040213c in main ()
===========================================================

The ROOT version stored in the file has an intriguing leading 10 in front of 6.24.02:

root [2] _file0->GetVersion()
(const Int_t)1062402

Thus, I think these files still contain some class or other structure that is not backward compatible. We may need to create new files, saving event by event, to make them compatible with ROOT 5.
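For what it's worth, the leading 10 is probably not part of the release number. ROOT adds 1000000 to the stored file-format version (fVersion, returned by TFile::GetVersion) when a file uses 64-bit seek pointers, i.e. the large-file layout needed past 2 GB; the remainder encodes the writing release as major*10000 + minor*100 + patch. A minimal decoder, assuming that convention (decode_root_file_version is a hypothetical name):

```python
def decode_root_file_version(version):
    """Split a TFile::GetVersion() value into (is_large_file, 'major.minor/patch').

    Assumes ROOT's convention: +1000000 marks 64-bit seek pointers
    (large-file layout); the rest is major*10000 + minor*100 + patch.
    """
    is_large = version >= 1000000
    release = version % 1000000
    major, minor, patch = release // 10000, (release // 100) % 100, release % 100
    return is_large, f"{major}.{minor:02d}/{patch:02d}"


print(decode_root_file_version(1062402))  # (True, '6.24/02')
print(decode_root_file_version(62402))    # (False, '6.24/02')
```

If this reading is right, 1062402 simply says "written by 6.24/02, large-file layout", which ROOT 5.34 also supports, so the incompatibility would have to come from something else in the file.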

CoffeeIntoScience commented 2 months ago

I wonder if Manuel's crash is related to carrying these multiple namecycles around (old partial copies). I'm not positive, but it's not entirely impossible.

You can keep your cloning script from proliferating these keys by changing the line at https://github.com/umd-lhcb/MuonBDTPid/blob/39bd8e7428308d4db5674da2a8afadee18a5cb0d/scripts/copy_branches_for_ubdt.cpp#L55 to copied->Write("",TObject::kWriteDelete);
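Writing an object again under the same name, without kOverwrite or kWriteDelete, adds a new cycle and keeps the old key, which is how DecayTree;22 and DecayTree;23 end up side by side. A minimal sketch for spotting such leftovers in a key listing (for example the "path;cycle" strings uproot yields when iterating over a file); stale_cycles is a hypothetical helper and assumes every key carries a cycle suffix:

```python
from collections import defaultdict


def stale_cycles(keys):
    """Map each object path to its lower (stale) cycles, given 'path;cycle' keys.

    ROOT's Get() without an explicit cycle returns the highest cycle, so the
    lower ones are usually leftovers from earlier writes.
    """
    cycles = defaultdict(list)
    for key in keys:
        path, _, cycle = key.rpartition(";")
        cycles[path].append(int(cycle))
    return {path: sorted(c)[:-1] for path, c in cycles.items() if len(c) > 1}


keys = ["Jpsinopt_MuMTuple;1", "Jpsinopt_MuMTuple/DecayTree;23",
        "Jpsinopt_MuMTuple/DecayTree;22", "Jpsinopt_MuPTuple/DecayTree;23"]
print(stale_cycles(keys))  # {'Jpsinopt_MuMTuple/DecayTree': [22]}
```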

CoffeeIntoScience commented 2 months ago

We may need to create new files, saving event by event, to make them compatible with ROOT 5.

TTree::CloneTree is kind-of sort-of already doing this, though. There is a way to copy a tree with an empty clone and a loop over events, but my understanding is that this is what CloneTree with no arguments already does.

afernez commented 2 months ago

If I call TFile::Write as copied->Write("",TObject::kWriteDelete) like Phoebe suggests, I still see the crash after running over one tree and trying to save (using Manuel's little script):

(.virtualenv) alex@physwkpsc3114ub:~/MuonBDTPid$ python scripts/apply_ubdt_single.py -i /home/public/pidcalib_ntuples/remote-ubdt-only/Mu_nopt-2017-MagDown/00227899_00000021_1.pidcalib.root
Warning in <TClass::TClass>: no dictionary for class ROOT::TIOFeatures is available
Working on Jpsinopt_MuMTuple/DecayTree
Done loading input data
Jpsinopt_MuMTuple/DecayTree has 364131 entries
Processing ............................................................ [100%]

 *** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007fbc81c4ee56 in wait4 () from /nix/store/adxc893j47gxx3xjw403zdf0liiddvw2-glibc-2.32-48/lib/libc.so.6
#1  0x00007fbc81bce447 in do_system () from /nix/store/adxc893j47gxx3xjw403zdf0liiddvw2-glibc-2.32-48/lib/libc.so.6
#2  0x00007fbc84b682cb in TUnixSystem::StackTrace() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#3  0x00007fbc84b6a4d4 in TUnixSystem::DispatchSignals(ESignals) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libCore.so
#4  <signal handler called>
#5  0x00007fbc83d54043 in TDirectoryFile::Save() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#6  0x00007fbc83d540d4 in TDirectoryFile::Save() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#7  0x00007fbc83d52229 in TDirectoryFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#8  0x00007fbc83d69b3a in TFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#9  0x00000000004196cd in main ()
===========================================================

The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5  0x00007fbc83d54043 in TDirectoryFile::Save() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#6  0x00007fbc83d540d4 in TDirectoryFile::Save() () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#7  0x00007fbc83d52229 in TDirectoryFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#8  0x00007fbc83d69b3a in TFile::Close(char const*) () from /nix/store/i89sm6q0hh7y9iaaqf69afxrynrqmn7p-root-5.34.38/lib/libRIO.so
#9  0x00000000004196cd in main ()
===========================================================

Segmentation fault (core dumped)
sh: 1: 23,Jpsinopt_MuPTuple/DecayTree: not found
sh: 1: 23: not found

However, if I open the generated friend tree with the ubdt branch in ROOT 5, it doesn't seem to crash when closing, unlike what Manuel saw:

(.virtualenv) alex@physwkpsc3114ub:~/MuonBDTPid$ root -l gen/uBDT_out.root
root [0]
Attaching file gen/uBDT_out.root as _file0...
root [1] TTree* t = (TTree*)_file0->Get("Jpsinopt_MuMTuple/DecayTree")
root [2] t->Print()
******************************************************************************
*Tree    :DecayTree : DecayTree                                              *
*Entries :   364131 : Total =         7307401 bytes  File  Size =    2433962 *
*        :          : Tree compression factor =   3.00                       *
******************************************************************************
*Br    0 :probe_UBDT : probe_UBDT/F                                          *
*Entries :   364131 : Total  Size=    1461530 bytes  File Size  =     478944 *
*Baskets :       46 : Basket Size=      32000 bytes  Compression=   3.05     *
*............................................................................*
*Br    1 :runNumber : runNumber/D                                            *
*Entries :   364131 : Total  Size=    2922658 bytes  File Size  =      30300 *
*Baskets :       92 : Basket Size=      32000 bytes  Compression=  96.39     *
*............................................................................*
*Br    2 :eventNumber : eventNumber/D                                        *
*Entries :   364131 : Total  Size=    2922850 bytes  File Size  =    1922187 *
*Baskets :       92 : Basket Size=      32000 bytes  Compression=   1.52     *
*............................................................................*
root [3] TBrowser b
root [4] .q
(.virtualenv) alex@physwkpsc3114ub:~/MuonBDTPid$
manuelfs commented 2 months ago

I tried saving the ntuples event by event, and that seems to work for ROOT 5, though for some reason I still get a bunch of warnings:

Error in <TList::Clear>: A list is accessing an object (0x12d88a0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x2c4ba90) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x2c4bec0) already deleted (list name = TList)
...

I committed the C++ and Python scripts that do the re-saving to pidcalib2, and documented them. I am currently running on the 2017 and 2018 ntuples in parallel with:

nix develop
make
./scripts/resave_all_pidcalib_ntuples.py -i /home/public/pidcalib_ntuples/remote -t 17 -o /home/public/pidcalib_ntuples/remote/resaved/
./scripts/resave_all_pidcalib_ntuples.py -i /home/public/pidcalib_ntuples/remote -t 18 -o /home/public/pidcalib_ntuples/remote/resaved/

It took 8.5 h for the KPiMu-2017-MagUp folder, which is 1.3 TB, so I estimate ~33 h for each year (about 5 TB each); the ntuples should be ready some time tomorrow.
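The ~33 h figure is a linear scaling of the measured throughput. A quick check, assuming the rate stays constant and that the 2017 and 2018 jobs running in parallel do not slow each other down (disk I/O contention on glacier could easily break that assumption); estimate_hours is just an illustrative helper:

```python
def estimate_hours(ref_hours, ref_tb, target_tb):
    """Scale a measured wall time linearly with data volume."""
    rate_tb_per_hour = ref_tb / ref_hours
    return target_tb / rate_tb_per_hour


per_year = estimate_hours(8.5, 1.3, 5.0)
print(f"{per_year:.1f} h per year")  # 32.7 h per year
```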

lmeyergarcia commented 2 months ago

I tried saving the ntuples event by event, and that seems to work for ROOT 5, though for some reason I still get a bunch of warnings:

Error in <TList::Clear>: A list is accessing an object (0x12d88a0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x2c4ba90) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x2c4bec0) already deleted (list name = TList)
...

I have seen this before (https://root-forum.cern.ch/t/issue-opening-root-file-generated-with-v6-30-on-v6-24/59148). There was a change in ROOT v6.30 that broke compatibility with previous versions, so if you produce a ROOT file with ROOT >= 6.30 and then read it with < 6.30, you get these errors. If that's the case, setting file->SetBit(TFile::k630forwardCompatibility) when producing the file should eliminate the errors.

manuelfs commented 2 months ago

Thanks, but what really confuses me is that I see those warnings when opening, in 6.24, the new ntuples that I created and filled event by event in 6.24.

|09:12:10|~/code/pidcalib2$ root gen/00227903_00000063_1.pidcalib_resaved.root 
   ------------------------------------------------------------------
  | Welcome to ROOT 6.24/02                        https://root.cern |
  | (c) 1995-2021, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for macosx64 on Jun 28 2021, 09:28:51                      |
  | From tags/v6-24-02@v6-24-02                                      |
  | With clang version 7.1.0 (tags/RELEASE_710/final)                |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'       |
   ------------------------------------------------------------------

root [0] 
Attaching file gen/00227903_00000063_1.pidcalib_resaved.root as _file0...
Error in <TList::Clear>: A list is accessing an object (0x6000027157a0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002715880) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002715960) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002715a40) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002715b20) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002715dc0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002715ea0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x6000027164c0) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002716760) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002716840) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002716920) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002716a00) already deleted (list name = TList)
Error in <TList::Clear>: A list is accessing an object (0x600002716ae0) already deleted (list name = TList)
(TFile *) 0x7fb95991d5f0
root [1] _file0->GetVersion()
(int) 62402

In principle I just filled a bunch of doubles, ints, and a boolean, but it is as if I had also carried a hidden structure from 6.30 over into the 6.24 ntuple.

afernez commented 1 month ago

With Manuel's latest effort to re-save the 2017/2018 PIDCalib tuples by building the trees from scratch (saved in /home/manuelf/code/MuonBDTPid/pidcalib_ntuples/remote/resaved on glacier [maybe resaved_kWriteDelete?]) and then creating the friend tuples with the ubdt branch, I've been able to successfully run over all 2017+2018 remote+friend ntuples and produce eff histograms. The eff histos were produced without any errors, but they all have bins with nan's, so, just as Yipeng did in this issue, we wanted to look at the actual bin content to see how many nan's there were. I've done this for all the particle types (K, Pi, Mu, Mu_nopt, P), but since the tables are large, I'll only include the bin content for 2017 MagDown Mu_nopt here (the other particle types/polarities/years have similar nan's).

The eff histos are 3D in Brunel_P, Brunel_ETA, and nTracks_Brunel; I'll show the bin content projected onto Brunel_P vs Brunel_ETA and then Brunel_ETA vs nTracks_Brunel. The tables include underflow (U) and overflow (O) bins, and the values listed in parentheses (without units) are the bin centers. I don't include the errors on the effs, but spot-checking a few values, they seem reasonable (maybe a TODO to check all the errors systematically).
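The projected values look like sums of per-bin efficiencies over the hidden axis rather than averages: in the shifted tables further down, the eta = 1.9 column of P vs eta and the eta = 1.9 row of eta vs nTracks both add up to about 61.2, and the eta = 2.8 ones both to 59.78. If that reading is right, a single nan anywhere along the summed axis makes the whole projected cell nan, which would explain why nan's show up in entire projected rows and columns. A minimal sketch with plain nested lists, assuming the content is indexed [p][eta][ntracks]; project_sum is a hypothetical helper, not the code used to make these tables:

```python
import math


def project_sum(hist, keep):
    """Sum a 3D [p][eta][ntracks] nested list over the axis not listed in `keep`.

    keep=("p", "eta") sums over ntracks; keep=("eta", "ntracks") sums over p.
    NaN propagates through sum(): one NaN bin makes the projected cell NaN.
    """
    n_p, n_eta, n_trk = len(hist), len(hist[0]), len(hist[0][0])
    if keep == ("p", "eta"):
        return [[sum(hist[i][j][k] for k in range(n_trk)) for j in range(n_eta)]
                for i in range(n_p)]
    if keep == ("eta", "ntracks"):
        return [[sum(hist[i][j][k] for i in range(n_p)) for k in range(n_trk)]
                for j in range(n_eta)]
    raise ValueError(f"unsupported projection: {keep}")


nan = float("nan")
# 2 P bins x 2 eta bins x 2 nTracks bins, with a single NaN at [0][1][0]
hist = [[[0.9, 0.8], [nan, 0.7]],
        [[0.95, 0.9], [0.85, 0.6]]]
p_eta = project_sum(hist, ("p", "eta"))
eta_trk = project_sum(hist, ("eta", "ntracks"))
print([[math.isnan(v) for v in row] for row in p_eta])  # [[False, True], [False, False]]
```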

2017 MD Mu_nopt

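| P \ eta | 0 (U) | 1 (1.9) | 2 (2.8) | 3 (3.7) | 4 (4.6) | 5 (5.2) | 6 (O) |
|---|---|---|---|---|---|---|---|
| 0 (U) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 (1550.0) | 0 | nan | nan | nan | nan | nan | 0 |
| 2 (4500.0) | 0 | 4.53 | 4.59 | 4.85 | -20.49 | 3.03 | 0 |
| 3 (7000.0) | 0 | 4.53 | 4.51 | 4.53 | 6.07 | 0.43 | 0 |
| 4 (10000.0) | 0 | 4.47 | 4.45 | 4.4 | 4.02 | 3.39 | 0 |
| 5 (13250.0) | 0 | 4.55 | 4.47 | 4.38 | 4.39 | 1.5 | 0 |
| 6 (16000.0) | 0 | 4.66 | 4.53 | 4.4 | 4.39 | 3.48 | 0 |
| 7 (19500.0) | 0 | 4.67 | 4.56 | 4.42 | 4.16 | 3.21 | 0 |
| 8 (24250.0) | 0 | 4.7 | 4.6 | 4.38 | 4.1 | 3.05 | 0 |
| 9 (29500.0) | 0 | 4.78 | 4.64 | 4.41 | 4.1 | 4.81 | 0 |
| 10 (36000.0) | 0 | 4.77 | 4.65 | 4.41 | 4 | 3.66 | 0 |
| 11 (50000.0) | 0 | 4.76 | 4.64 | 4.4 | 3.86 | 3.44 | 0 |
| 12 (65000.0) | 0 | 4.75 | 4.72 | 4.38 | 3.71 | 3.33 | 0 |
| 13 (85000.0) | 0 | 4.87 | 4.7 | 4.44 | 3.68 | 3.79 | 0 |
| 14 (150000.0) | 0 | 4.4 | 4.72 | 4.47 | 3.85 | 2.65 | 0 |
| 15 (O) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

| eta \ nTracks | 0 (U) | 1 (25.0) | 2 (125.0) | 3 (250.0) | 4 (400.0) | 5 (5250.0) | 6 (O) |
|---|---|---|---|---|---|---|---|
| 0 (U) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 (1.9) | 0 | nan | 12.53 | 12.14 | 19.87 | nan | 0 |
| 2 (2.8) | 0 | nan | 12.3 | 12.07 | 11.72 | 11.2 | 0 |
| 3 (3.7) | 0 | nan | 12.3 | 11.65 | 11.3 | nan | 0 |
| 4 (4.6) | 0 | nan | nan | 10.5 | 11 | nan | 0 |
| 5 (5.2) | 0 | nan | nan | nan | nan | nan | 0 |
| 6 (O) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |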

2017 MD Mu_nopt (uBDT veto)

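| P \ eta | 0 (U) | 1 (1.9) | 2 (2.8) | 3 (3.7) | 4 (4.6) | 5 (5.2) | 6 (O) |
|---|---|---|---|---|---|---|---|
| 0 (U) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 (1550.0) | 0 | nan | nan | nan | nan | nan | 0 |
| 2 (4500.0) | 0 | 0.24 | 0.28 | 0.29 | -2.1 | -38.93 | 0 |
| 3 (7000.0) | 0 | 0.3 | 0.4 | 0.5 | 1.17 | -3.14 | 0 |
| 4 (10000.0) | 0 | 0.39 | 0.45 | 0.51 | 0.68 | 0.63 | 0 |
| 5 (13250.0) | 0 | 0.35 | 0.39 | 0.45 | 0.64 | 0.34 | 0 |
| 6 (16000.0) | 0 | 0.26 | 0.33 | 0.44 | 0.76 | 0.54 | 0 |
| 7 (19500.0) | 0 | 0.27 | 0.33 | 0.44 | 0.69 | 0.29 | 0 |
| 8 (24250.0) | 0 | 0.23 | 0.29 | 0.44 | 0.66 | 0.45 | 0 |
| 9 (29500.0) | 0 | 0.17 | 0.24 | 0.4 | 0.68 | 1.09 | 0 |
| 10 (36000.0) | 0 | 0.17 | 0.24 | 0.38 | 0.79 | 1.31 | 0 |
| 11 (50000.0) | 0 | 0.18 | 0.25 | 0.4 | 0.9 | 1.61 | 0 |
| 12 (65000.0) | 0 | 0.17 | 0.2 | 0.42 | 1.04 | 1.37 | 0 |
| 13 (85000.0) | 0 | 0.12 | 0.19 | 0.42 | 1.05 | 1.36 | 0 |
| 14 (150000.0) | 0 | 0.33 | 0.17 | 0.36 | 0.85 | -0.53 | 0 |
| 15 (O) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

| eta \ nTracks | 0 (U) | 1 (25.0) | 2 (125.0) | 3 (250.0) | 4 (400.0) | 5 (5250.0) | 6 (O) |
|---|---|---|---|---|---|---|---|
| 0 (U) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 (1.9) | 0 | nan | 0.66 | -4.01 | 0.79 | nan | 0 |
| 2 (2.8) | 0 | nan | 0.42 | 0.65 | 0.96 | 1.48 | 0 |
| 3 (3.7) | 0 | nan | 0.64 | 0.97 | 1.44 | nan | 0 |
| 4 (4.6) | 0 | nan | nan | 1.92 | 2.61 | nan | 0 |
| 5 (5.2) | 0 | nan | nan | nan | nan | nan | 0 |
| 6 (O) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |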

The nan's are all at the binning boundaries, just like Yipeng saw for 2016. His solution was to set the nan bins to 0 (along with shifting negative [or >1] effs). I'll do the same for 2017/2018. With this, the reset eff 2D projections become

SHIFTED 2017 MD Mu_nopt

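| P \ eta | 0 (U) | 1 (1.9) | 2 (2.8) | 3 (3.7) | 4 (4.6) | 5 (5.2) | 6 (O) |
|---|---|---|---|---|---|---|---|
| 0 (U) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 (1550.0) | 0 | 0.84 | 0 | 0.34 | 0 | 0 | 0 |
| 2 (4500.0) | 0 | 4.53 | 4.59 | 4.65 | 2.55 | 1.72 | 0 |
| 3 (7000.0) | 0 | 4.53 | 4.51 | 4.53 | 3.65 | 2.17 | 0 |
| 4 (10000.0) | 0 | 4.47 | 4.45 | 4.4 | 4.02 | 2.15 | 0 |
| 5 (13250.0) | 0 | 4.55 | 4.47 | 4.38 | 4.23 | 2.53 | 0 |
| 6 (16000.0) | 0 | 4.66 | 4.53 | 4.4 | 4.28 | 2.79 | 0 |
| 7 (19500.0) | 0 | 4.67 | 4.56 | 4.42 | 4.16 | 2.73 | 0 |
| 8 (24250.0) | 0 | 4.7 | 4.6 | 4.38 | 4.1 | 2.97 | 0 |
| 9 (29500.0) | 0 | 4.78 | 4.64 | 4.41 | 4.1 | 2.92 | 0 |
| 10 (36000.0) | 0 | 4.77 | 4.65 | 4.41 | 4 | 3.32 | 0 |
| 11 (50000.0) | 0 | 4.75 | 4.64 | 4.4 | 3.86 | 3.32 | 0 |
| 12 (65000.0) | 0 | 4.74 | 4.72 | 4.38 | 3.71 | 3.19 | 0 |
| 13 (85000.0) | 0 | 4.81 | 4.7 | 4.44 | 3.68 | 3.62 | 0 |
| 14 (150000.0) | 0 | 4.37 | 4.72 | 4.47 | 3.85 | 2.41 | 0 |
| 15 (O) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

| eta \ nTracks | 0 (U) | 1 (25.0) | 2 (125.0) | 3 (250.0) | 4 (400.0) | 5 (5250.0) | 6 (O) |
|---|---|---|---|---|---|---|---|
| 0 (U) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 (1.9) | 0 | 12.41 | 12.67 | 12.14 | 12.42 | 11.52 | 0 |
| 2 (2.8) | 0 | 12.49 | 12.3 | 12.07 | 11.72 | 11.2 | 0 |
| 3 (3.7) | 0 | 12.23 | 12.35 | 11.65 | 11.28 | 10.49 | 0 |
| 4 (4.6) | 0 | 10.97 | 11.23 | 10.43 | 9.18 | 8.4 | 0 |
| 5 (5.2) | 0 | 8.94 | 9.67 | 7.81 | 5.9 | 3.52 | 0 |
| 6 (O) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |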

SHIFTED 2017 MD Mu_nopt (uBDT veto)

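| P \ eta | 0 (U) | 1 (1.9) | 2 (2.8) | 3 (3.7) | 4 (4.6) | 5 (5.2) | 6 (O) |
|---|---|---|---|---|---|---|---|
| 0 (U) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 (1550.0) | 0 | 0.8 | 0 | 0 | 0 | 0 | 0 |
| 2 (4500.0) | 0 | 0.24 | 0.28 | 0.3 | 1.37 | 1.38 | 0 |
| 3 (7000.0) | 0 | 0.3 | 0.4 | 0.5 | 1.2 | 1.33 | 0 |
| 4 (10000.0) | 0 | 0.39 | 0.45 | 0.51 | 0.68 | 0.92 | 0 |
| 5 (13250.0) | 0 | 0.35 | 0.39 | 0.45 | 0.64 | 0.77 | 0 |
| 6 (16000.0) | 0 | 0.26 | 0.33 | 0.44 | 0.76 | 0.74 | 0 |
| 7 (19500.0) | 0 | 0.27 | 0.33 | 0.44 | 0.69 | 0.44 | 0 |
| 8 (24250.0) | 0 | 0.23 | 0.29 | 0.44 | 0.66 | 0.65 | 0 |
| 9 (29500.0) | 0 | 0.17 | 0.24 | 0.4 | 0.68 | 1.17 | 0 |
| 10 (36000.0) | 0 | 0.17 | 0.24 | 0.38 | 0.79 | 1.17 | 0 |
| 11 (50000.0) | 0 | 0.18 | 0.25 | 0.4 | 0.9 | 1.58 | 0 |
| 12 (65000.0) | 0 | 0.17 | 0.2 | 0.42 | 1.04 | 1.38 | 0 |
| 13 (85000.0) | 0 | 0.12 | 0.19 | 0.42 | 1.05 | 1.38 | 0 |
| 14 (150000.0) | 0 | 0.33 | 0.17 | 0.36 | 0.85 | 0.69 | 0 |
| 15 (O) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

| eta \ nTracks | 0 (U) | 1 (25.0) | 2 (125.0) | 3 (250.0) | 4 (400.0) | 5 (5250.0) | 6 (O) |
|---|---|---|---|---|---|---|---|
| 0 (U) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 (1.9) | 0 | 0.23 | 0.73 | 1.1 | 0.79 | 1.15 | 0 |
| 2 (2.8) | 0 | 0.25 | 0.42 | 0.65 | 0.96 | 1.48 | 0 |
| 3 (3.7) | 0 | 0.4 | 0.64 | 0.97 | 1.44 | 2.05 | 0 |
| 4 (4.6) | 0 | 1.31 | 1.32 | 1.93 | 2.72 | 4.03 | 0 |
| 5 (5.2) | 0 | 1.12 | 2.08 | 3 | 3.25 | 4.14 | 0 |
| 6 (O) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |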
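The nan-to-0 and shifting step described above (applied per 3D bin, before projecting) can be sketched as follows. This assumes "shifting" means clamping each efficiency into [0, 1] and that nan bins become 0; Yipeng's actual procedure may differ, and clean_efficiency and clean_hist are hypothetical names:

```python
import math


def clean_efficiency(value):
    """Map one efficiency bin to a physical value: NaN -> 0, then clamp to [0, 1]."""
    if math.isnan(value):
        return 0.0
    return min(max(value, 0.0), 1.0)


def clean_hist(hist):
    """Apply clean_efficiency to every bin of a 3D [p][eta][ntracks] nested list."""
    return [[[clean_efficiency(v) for v in row] for row in plane] for plane in hist]


raw = [[[0.93, float("nan")], [-0.4, 1.2]]]
print(clean_hist(raw))  # [[[0.93, 0.0], [0.0, 1.0]]]
```

Cleaning before projecting matters: a projected cell that was nan (or strongly negative) becomes the sum of the surviving bins, which is consistent with the finite, mostly positive values in the SHIFTED tables.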

I think this all seems alright, so we are now at the point where we can use the 2017/2018 Mu_nopt, KPiMu, and P PIDCalib productions to produce the eff histos we (rdx and rjpsi) need (minus the electron samples, which we would still ideally like to have). I'll email the people who produced the PIDCalib tuples to tell them that they can get rid of the files on EOS (since we've saved them to glacier).

As a final task, I think we should clean things up on glacier, especially since these are large files and we only have so much disk space: