mats-playground / coffea-benchmarks

4 stars 7 forks source link

Possible bug in Task 8 related to argdistincts #5

Open ingomueller-net opened 4 years ago

ingomueller-net commented 4 years ago

I am pretty sure something is wrong in Task 8. The following line:

        best_dimu = np.abs(dimu.mass - 91.18).argmin()

is supposed to return the indices to the two muons whose mass is closest to 91.18, but I have an example event where dimu contains a dimuon whose mass is closer than the one ~returned by argmin~ you get after resolving best_dimu with argdistincts.

To be able to examine an individual event, I have produced a ROOT file with just that one problematic event, on which I run example8.ipynb:

rooteventselector /path/to/Run2012B_SingleMu.root:* problem.root --recreate -f62519 -l62519

The following is a part of the notebook, to which I have added print statements and comments to show the problem:

        electrons = electrons[trileptons].compact()
        events = df['event'][trileptons]

        print('dimu')

        diele = electrons.distincts()
        dimu = muons.distincts()

        # same for dileptons
        diele = diele[(diele.i0.charge * diele.i1.charge < 0)] # & (diele.mass > 50) & (diele.mass < 160)]
        dimu  = dimu [(dimu.i0.charge  * dimu.i1.charge  < 0)] # & (dimu.mass > 50) & (dimu.mass < 160)]

        print(dimu[events == 14218060].mass[0].tolist())  # entry at position 2 (with value 7.44...) is clostest to 91.18

        #get the dileptons closest to the z-pole
        best_diele = np.abs(diele.mass - 91.18).argmin()
        best_dimu = np.abs(dimu.mass - 91.18).argmin()

        print(best_dimu[events == 14218060])   # prints 2 -- so far so good

        diele_args = electrons.argdistincts()[best_diele].compact()
        dimu_args = muons.argdistincts()[best_dimu].compact()

        print(dimu_args[events == 14218060]) # prints: (0, 3)
        i1, i2 = dimu_args[events == 14218060][0][0]
        print(i1)
        print(i2)
        m1 = muons[events == 14218060][0][i1].p4
        m2 = muons[events == 14218060][0][i2].p4
        m3 = muons[events == 14218060][0][1].p4
        print(m1)
        print(m2)
        print(m3)

        print('min')
        print((m1 + m2).mass) # mass of dilepton (0, 3) = 4.59... = found by argmin
        print((m3 + m2).mass) # mass of dilepton (1, 3) = 7.44... = found manually -- this is closer to 91.18!
        print(abs((m1 + m2).mass - 91.18)) # distance dilepton (0, 3) = found by argmin
        print(abs((m3 + m2).mass - 91.18)) # distance dilepton (1, 3) = found manually -- this is smaller!

The following patch should make the same changes against the repository:

diff --git a/benchmarks/example8.ipynb b/benchmarks/example8.ipynb
index ac792e0..bca53c9 100644
--- a/benchmarks/example8.ipynb
+++ b/benchmarks/example8.ipynb
@@ -169,27 +169,54 @@
     "                    charge=df['Electron_charge'].content\n",
     "                    )\n",
     "        \n",
+    "        print(np.argmax(df['event'] == 14218060))\n",
+    "        #print(df['event'].tolist()[62519])\n",
+    "                \n",
     "        # a few reasonable muon and electron selection cuts\n",
-    "        muons = muons[(muons.pt > 10) & (np.abs(muons.eta) < 2.4)]\n",
-    "        electrons = electrons[(electrons.pt > 10) & (np.abs(electrons.eta) < 2.5)]\n",
+    "        #muons = muons[(muons.pt > 10) & (np.abs(muons.eta) < 2.4)]\n",
+    "        #electrons = electrons[(electrons.pt > 10) & (np.abs(electrons.eta) < 2.5)]\n",
     "        trileptons = (muons.counts + electrons.counts) >= 3\n",
     "        muons = muons[trileptons].compact()\n",
     "        electrons = electrons[trileptons].compact()\n",
+    "        events = df['event'][trileptons]\n",
     "                \n",
+    "        print('dimu')\n",
+    "\n",
     "        diele = electrons.distincts()\n",
     "        dimu = muons.distincts()\n",
     "        \n",
     "        # same for dileptons\n",
-    "        diele = diele[(diele.i0.charge * diele.i1.charge < 0) & (diele.mass > 50) & (diele.mass < 160)]\n",
-    "        dimu = dimu[(dimu.i0.charge * dimu.i1.charge < 0) & (dimu.mass > 50) & (dimu.mass < 160)]\n",
+    "        diele = diele[(diele.i0.charge * diele.i1.charge < 0)] # & (diele.mass > 50) & (diele.mass < 160)]\n",
+    "        dimu  = dimu [(dimu.i0.charge  * dimu.i1.charge  < 0)] # & (dimu.mass > 50) & (dimu.mass < 160)]\n",
     "        \n",
+    "        print(dimu[events == 14218060].mass[0].tolist())  # entry at position 2 (with value 7.44...) is clostest to 91.18\n",
+    "\n",
     "        #get the dileptons closest to the z-pole\n",
     "        best_diele = np.abs(diele.mass - 91.18).argmin()\n",
     "        best_dimu = np.abs(dimu.mass - 91.18).argmin()\n",
     "        \n",
+    "        print(best_dimu[events == 14218060])   # prints 2 -- so far so good\n",
+    "        \n",
     "        diele_args = electrons.argdistincts()[best_diele].compact()\n",
     "        dimu_args = muons.argdistincts()[best_dimu].compact()\n",
     "        \n",
+    "        print(dimu_args[events == 14218060]) # prints: (0, 3)\n",
+    "        i1, i2 = dimu_args[events == 14218060][0][0]\n",
+    "        print(i1)\n",
+    "        print(i2)\n",
+    "        m1 = muons[events == 14218060][0][i1].p4\n",
+    "        m2 = muons[events == 14218060][0][i2].p4\n",
+    "        m3 = muons[events == 14218060][0][1].p4\n",
+    "        print(m1)\n",
+    "        print(m2)\n",
+    "        print(m3)\n",
+    "        \n",
+    "        print('min')\n",
+    "        print((m1 + m2).mass) # mass of dilepton (0, 3) = 4.59... = found by argmin\n",
+    "        print((m3 + m2).mass) # mass of dilepton (1, 3) = 7.44... = found manually -- this is closer to 91.18!\n",
+    "        print(abs((m1 + m2).mass - 91.18)) # distance dilepton (0, 3) = found by argmin\n",
+    "        print(abs((m3 + m2).mass - 91.18)) # distance dilepton (1, 3) = found manually -- this is smaller!\n",
+    "\n",
     "        # select the third lepton with highest pT that's not in the z-candidate\n",
     "        # it returns a mask that selects the appropriate dilepton and third lepton\n",
     "        # the mask is already exclusive across lepton types\n",
ingomueller-net commented 4 years ago

Shame on me, I found it again after 3 minutes. But it is almost certainly a bug.

The problem is that .argdistincts() refers to the dimuons produced by dimu = muons.distincts(), but a few lines later, dimu is actually overwritten and filtered, so the indices returned by .argdistincts() are invalid.