scikit-hep / uproot3

ROOT I/O in pure Python and NumPy.
BSD 3-Clause "New" or "Revised" License

Reading ROOT tree with jagged arrays into a Dataframe #473

Closed sznajder closed 4 years ago

sznajder commented 4 years ago

Hi, I have a ROOT tree that I am reading into a DataFrame. It has many branches with different numbers of hits and particles per event, as printed below. If I use the flatten option, I get an error when creating the DataFrame because of the jagged arrays in the tree. Is there an option that produces a flattened DataFrame, filled with NaN where no information is available in the arrays? Cheers, Andre

vh_station     TStreamerSTL   asjagged(asdtype('>i2'), 10)
vh_ring        TStreamerSTL   asjagged(asdtype('>i2'), 10)
vh_sector      TStreamerSTL   asjagged(asdtype('>i2'), 10)
vh_sim_phi     TStreamerSTL   asjagged(asdtype('>f4'), 10)
vh_sim_theta   TStreamerSTL   asjagged(asdtype('>f4'), 10)
vh_sim_eta     TStreamerSTL   asjagged(asdtype('>f4'), 10)
vh_sim_r       TStreamerSTL   asjagged(asdtype('>f4'), 10)
vh_sim_z       TStreamerSTL   asjagged(asdtype('>f4'), 10)
vh_size        (no streamer)  asdtype('>i4')
vu_pt          TStreamerSTL   asjagged(asdtype('>f4'), 10)
vu_phi         TStreamerSTL   asjagged(asdtype('>f4'), 10)
vu_eta         TStreamerSTL   asjagged(asdtype('>f4'), 10)
vu_theta       TStreamerSTL   asjagged(asdtype('>f4'), 10)
vu_q           TStreamerSTL   asjagged(asdtype('>i2'), 10)
vp_pt          TStreamerSTL   asjagged(asdtype('>f4'), 10)
vp_phi         TStreamerSTL   asjagged(asdtype('>f4'), 10)
vp_eta         TStreamerSTL   asjagged(asdtype('>f4'), 10)
vp_theta       TStreamerSTL   asjagged(asdtype('>f4'), 10)
vp_q           TStreamerSTL   asjagged(asdtype('>i2'), 10)
vp_event       TStreamerSTL   asjagged(asdtype('>i2'), 10)
vp_pdgid       TStreamerSTL   asjagged(asdtype('>i4'), 10)
vp_status      TStreamerSTL   asjagged(asdtype('>i2'), 10)
vp_genp        TStreamerSTL   asjagged(asdtype('>i4'), 10)
vp_size        (no streamer)  asdtype('>i4')
ve_event       TStreamerSTL   asjagged(asdtype('>u8'), 10)
ve_run         TStreamerSTL   asjagged(asdtype('>u4'), 10)
ve_lumi        TStreamerSTL   asjagged(asdtype('>u4'), 10)
ve_npv         TStreamerSTL   asjagged(asdtype('>i4'), 10)
ve_size        (no streamer)  asdtype('>i4')

jpivarski commented 4 years ago

Yes, but only in the new version of Awkward Array, which hasn't been integrated into Uproot yet (working on it). In the meantime, you can do this:

pip install awkward1

This demo has an example of how to load arrays from Uproot into Awkward 1.0:

import awkward1 as ak
arrays = {name: ak.from_awkward0(array)
              for name, array in tree.arrays("*", namedecode="utf-8").items()}
one_array = ak.zip(arrays, depthlimit=1)
df = ak.pandas.df(one_array, how="outer")

What I think you're asking for—NaN values when a subentry exists in one collection but not the other—is an outer join, provided by how="outer". The above is actually a pass-through to the Pandas join function: I create one DataFrame per jagged structure and Pandas merges them according to the specified how.

If you need more control over how Pandas merges them, you could use ak.pandas.dfs to make a list of DataFrames and merge them yourself. (But I'd be interested in which parameters you find that you need, so they can be added to the ak.pandas.df interface.)
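To make the outer-join behaviour described above concrete, here is a minimal sketch in plain Python. It does not use Awkward Array or Pandas: the helper `outer_join` and the sample column names are hypothetical, and the sketch only illustrates the semantics of joining jagged columns on an (entry, subentry) index with NaN where a subentry is missing.

```python
import math

def outer_join(columns):
    """Outer-join jagged columns on (entry, subentry), padding gaps with NaN.

    `columns` maps a column name to a list of per-event lists.
    Returns (index, table): `index` is a sorted list of (entry, subentry)
    pairs and `table` maps each column name to a list aligned with `index`.
    """
    # Collect every (entry, subentry) position that exists in any column.
    index = sorted({
        (entry, sub)
        for events in columns.values()
        for entry, values in enumerate(events)
        for sub in range(len(values))
    })
    table = {}
    for name, events in columns.items():
        # Use the stored value where the position exists, NaN otherwise.
        table[name] = [
            events[entry][sub]
            if entry < len(events) and sub < len(events[entry])
            else math.nan
            for entry, sub in index
        ]
    return index, table

# Hypothetical data: two collections with different multiplicities per event.
columns = {
    "hit_r": [[1.0, 2.0], [], [3.0]],
    "part_pt": [[10.0], [20.0, 30.0], [40.0]],
}
index, table = outer_join(columns)
```

Each row of the result corresponds to one (entry, subentry) pair, so an event with two hits and one particle produces two rows, the second of which has NaN in the particle column.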

jpivarski commented 4 years ago

I'm guessing I can close this? Let me know if I'm wrong.

sznajder commented 4 years ago

Hi, I am trying to add a variable to an existing flat ROOT tree (code below) and I'm getting the following error:

Traceback (most recent call last):
  File "addTreeVariable_new.py", line 106, in <module>
    tree.extend({'NN_VBF':y})
AttributeError: 'TTree' object has no attribute 'extend'

Any hints on what is going on? Cheers, Andre

# Open the cloned Tree with UPROOT
with uproot.open(newfilename) as file:

  # Get the cloned Tree
  tree = file[treename]

  # Load the ROOT Tree variables as a tuple of NumPy arrays and format them for the NN
  x = tree.arrays(NN_VARS,outputtype=tuple)
  x = np.array(x)
  x = x.T

  # Evaluate the NN on the events
  y = model.predict(x)

  # Write the new array to the Tree using UPROOT
  tree.extend({'NN_VBF':y})

jpivarski commented 4 years ago

The file was opened for reading (uproot.open as opposed to uproot.recreate), and hence the objects that you get from it are read-only objects. I'm hoping to clarify this in Uproot4; the name of the class returned from uproot.open will actually be ReadOnlyFile and there will be more unification between the reading objects (like TTree) and writing objects; at least enough to give a better error message.

One limitation that probably will not go away is that you can only make new files with all new content in Uproot. A general implementation that allows incremental updating would be a major undertaking—the invariants that have to be maintained are not simple; accepting random input and changing it in a valid way is much more difficult to get right than producing a new valid file.

sznajder commented 4 years ago

Hi Jim, If I use uproot.recreate, is there a way I can clone an existing flat ROOT tree and add a new variable to it? Cheers, Andre

jpivarski commented 4 years ago

If I use uproot.recreate, is there a way I can clone an existing flat ROOT tree and add a new variable to it?

Yes, by reading the original data into arrays and writing those arrays to the output. They have to be array types for which writing is supported, which I think is currently numbers and jagged arrays of numbers.

This is different from what ROOT calls a "fast copy," which skips decompression and recompression, so it won't be as fast as doing the same thing in ROOT. However, it does give you the opportunity to resize the baskets to something that can be more efficiently read (hundreds of kilobytes to megabytes per basket), which is something a "fast copy" can't do.

You can set up a loop like

for arrays in tree.iterate(["branches*"], entrysteps=1000000):   # number of entries in each step
    arrays.update(new_arrays())
    output.extend(arrays)

sznajder commented 4 years ago

Hi Jim, I understand your suggestion, but I would prefer not to iterate over the tree. Is there a way to get a dictionary from tree.arrays() in the format branchdict = { var1: type1 , var2: type2 , … } to create a new tree from the old one using just file["t"] = uproot.newtree(branchdict)? Then I guess I could fill the new tree in a single shot with the following code:

# Create a new Root file with UPROOT
with uproot.recreate(newfilename) as newfile:

  # Create a new Tree with old Tree contents
  newfile[treename] = uproot.newtree(branchdict)

  # Fill the new Tree with the dictionary of arrays from old tree
  arraysdict =  tree.arrays(NN_VARS,namedecode="utf-8")
  newfile[treename].extend( arraysdict )

Cheers, Andre

jpivarski commented 4 years ago

I understand your suggestion but I would like not to iterate over the tree.

My suggestion isn't iterating over individual events in the tree; it's iterating over chunks. It's also valid to do it in a single chunk, if everything fits into memory.
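The chunk-wise pattern can be sketched without any ROOT dependency. In the sketch below, `iterate_chunks`, `source`, `new_variable` and `output` are hypothetical stand-ins for `tree.iterate`, the input tree, the NN output and the writable tree; plain lists take the place of NumPy arrays. The point it illustrates is that the loop body runs once per chunk, not once per event, and that a new per-event variable has to be sliced to the same entry range as the chunk it is added to.

```python
def iterate_chunks(arrays, entrysteps):
    """Yield (start, stop, chunk) with `entrysteps` entries per chunk."""
    if not arrays:
        return
    num_entries = len(next(iter(arrays.values())))
    for start in range(0, num_entries, entrysteps):
        stop = min(start + entrysteps, num_entries)
        chunk = {name: values[start:stop] for name, values in arrays.items()}
        yield start, stop, chunk

# Hypothetical stand-ins for the input tree and the new per-event variable.
source = {"pt": [1.0, 2.0, 3.0, 4.0, 5.0], "eta": [0.1, 0.2, 0.3, 0.4, 0.5]}
new_variable = [10, 20, 30, 40, 50]

# Hypothetical stand-in for the output tree: one growing list per branch.
output = {"pt": [], "eta": [], "NN_VBF": []}
num_chunks = 0
for start, stop, chunk in iterate_chunks(source, entrysteps=2):
    # Slice the new variable with the same entry range as the chunk,
    # so every column in the chunk has the same length.
    chunk["NN_VBF"] = new_variable[start:stop]
    for name, values in chunk.items():
        output[name].extend(values)
    num_chunks += 1
```

With five entries and `entrysteps=2` the loop body runs three times. Choosing `entrysteps` at least as large as the number of entries gives the single-chunk case.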

Is there a way to get a dictionary from tree.arrays() in the format branchdict = { var1: type1 , var2: type2 , … } to create a new Tree from the old one using just file["t"] = uproot.newtree(branchdict) ?

That convenience function doesn't exist, but it's a good idea. The closest thing Uproot has is an interpretation on each branch; only asdtype and asjagged interpretations can be written, and asdtype.todtype gives you the dtype that you can put into the dict and asjagged.content.todtype does the same for the jagged case.

sznajder commented 4 years ago

Hi Jim, I'm having some difficulty figuring out how to extract the types of the tree.arrays variables using asdtype.todtype. Could you show me an example? Thanks, Andre

jpivarski commented 4 years ago

Say you have a tree like this:

>>> import uproot
>>> tree = uproot.open("tests/samples/HZZ.root")["events"]
>>> tree.show()
NJet                       (no streamer)              asdtype('>i4')
Jet_Px                     (no streamer)              asjagged(asdtype('>f4'))
Jet_Py                     (no streamer)              asjagged(asdtype('>f4'))
Jet_Pz                     (no streamer)              asjagged(asdtype('>f4'))
Jet_E                      (no streamer)              asjagged(asdtype('>f4'))
Jet_btag                   (no streamer)              asjagged(asdtype('>f4'))
Jet_ID                     (no streamer)              asjagged(asdtype('bool'))
NMuon                      (no streamer)              asdtype('>i4')
Muon_Px                    (no streamer)              asjagged(asdtype('>f4'))
Muon_Py                    (no streamer)              asjagged(asdtype('>f4'))
Muon_Pz                    (no streamer)              asjagged(asdtype('>f4'))
Muon_E                     (no streamer)              asjagged(asdtype('>f4'))
Muon_Charge                (no streamer)              asjagged(asdtype('>i4'))
Muon_Iso                   (no streamer)              asjagged(asdtype('>f4'))
...

You can get the interpretation from a branch, and from that, the dtype:

>>> tree["Muon_Px"].interpretation
asjagged(asdtype('>f4'))
>>> tree["Muon_Px"].interpretation.content
asdtype('>f4')
>>> tree["Muon_Px"].interpretation.content.todtype
dtype('float32')

Since you can select sets of branches with keys and items,

>>> tree.keys(filtername=lambda x: x.startswith(b"Muon_"))
[b'Muon_Px', b'Muon_Py', b'Muon_Pz', b'Muon_E', b'Muon_Charge', b'Muon_Iso']

you can build what you need to pass to the writing interface with a list comprehension:

>>> [(name, branch.interpretation.content.todtype) for name, branch in tree.items(filtername=lambda x: x.startswith(b"Muon_"))]
[(b'Muon_Px', dtype('float32')), (b'Muon_Py', dtype('float32')), (b'Muon_Pz', dtype('float32')), (b'Muon_E', dtype('float32')), (b'Muon_Charge', dtype('int32')), (b'Muon_Iso', dtype('float32'))]

(Or if the writing interface takes types, such as np.int32, rather than dtypes, such as np.dtype(np.int32), you can add a .type to extract this from the dtype. These filtering operations are a little easier to work with in Uproot4; I had forgotten that the Uproot3 version requires a callable function instead of just glob patterns and regexes.)
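As a rough sketch of the next step, the (name, dtype) pairs from such a list comprehension can be turned into a branch specification that also contains the new variable. The helper `branch_spec_with_extra` is hypothetical, not part of Uproot, and the dtype strings are placeholders for the NumPy dtypes shown above. Names are left exactly as given, so the new branch name has to be of the same kind (bytes or str) as the existing ones.

```python
def branch_spec_with_extra(pairs, extra):
    """Combine (name, dtype) pairs from an existing tree with new branches."""
    spec = dict(pairs)
    for name, dtype in extra.items():
        # Refuse to silently overwrite a branch that is already present.
        if name in spec:
            raise ValueError("branch already exists: {!r}".format(name))
        spec[name] = dtype
    return spec

# Placeholder strings stand in for the NumPy dtypes of the real branches.
pairs = [(b"Muon_Px", "float32"), (b"Muon_Charge", "int32")]
branchdict = branch_spec_with_extra(pairs, {b"NN_VBF": "float32"})
```

The resulting dict keeps the original branch order and appends the new variable at the end.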

sznajder commented 4 years ago

Hi Jim, I'm now getting the error below. I don't understand it, because when I call show() on the new tree, the new variable NN_VBF is there! But when I fill the tree with the code below, I get the following error:

  # Evaluate the NN on the events
  y = model.predict(x)
  print("y=",y)

  # Fill the new Tree with the dictionary of arrays
  newfile[treename].show()
  branches = list( branchdict.keys() )
  print("branches=",branches)
  for arrays in tree.iterate(branches, entrysteps=10):   # number of entries in each step
    arrays.update({'NN_VBF':y})
    newfile[treename].extend(arrays)

Cheers, Andre

f_lept1_pt     (no streamer)  asdtype('>f4')
f_lept1_eta    (no streamer)  asdtype('>f4')
f_lept1_phi    (no streamer)  asdtype('>f4')
f_lept1_pdgid  (no streamer)  asdtype('>i4')
f_lept2_pt     (no streamer)  asdtype('>f4')
f_lept2_eta    (no streamer)  asdtype('>f4')
f_lept2_phi    (no streamer)  asdtype('>f4')
f_lept2_pdgid  (no streamer)  asdtype('>i4')
f_lept3_pt     (no streamer)  asdtype('>f4')
f_lept3_eta    (no streamer)  asdtype('>f4')
f_lept3_phi    (no streamer)  asdtype('>f4')
f_lept3_pdgid  (no streamer)  asdtype('>i4')
f_lept4_pt     (no streamer)  asdtype('>f4')
f_lept4_eta    (no streamer)  asdtype('>f4')
f_lept4_phi    (no streamer)  asdtype('>f4')
f_lept4_pdgid  (no streamer)  asdtype('>i4')
f_jet1_pt      (no streamer)  asdtype('>f4')
f_jet1_eta     (no streamer)  asdtype('>f4')
f_jet1_phi     (no streamer)  asdtype('>f4')
f_jet2_pt      (no streamer)  asdtype('>f4')
f_jet2_eta     (no streamer)  asdtype('>f4')
f_jet2_phi     (no streamer)  asdtype('>f4')
NN_VBF         (no streamer)  asdtype('>f4')
y= [[0.] [0.] [0.] ... [0.] [0.] [0.]]
branches= [b'f_lept1_pt', b'f_lept1_eta', b'f_lept1_phi', b'f_lept1_pdgid', b'f_lept2_pt', b'f_lept2_eta', b'f_lept2_phi', b'f_lept2_pdgid', b'f_lept3_pt', b'f_lept3_eta', b'f_lept3_phi', b'f_lept3_pdgid', b'f_lept4_pt', b'f_lept4_eta', b'f_lept4_phi', b'f_lept4_pdgid', b'f_jet1_pt', b'f_jet1_eta', b'f_jet1_phi', b'f_jet2_pt', b'f_jet2_eta', b'f_jet2_phi', b'NN_VBF']
Traceback (most recent call last):
  File "/Users/sznajder/cernbox/Work/PythonVirtualenv/env38/lib/python3.8/site-packages/uproot/tree.py", line 389, in get
    return self._branchlookup[name]
KeyError: b'NN_VBF'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "addTreeVariableUproot.py", line 112, in <module>
    for arrays in tree.iterate(branches, entrysteps=10):   # number of entries in each step
  File "/Users/sznajder/cernbox/Work/PythonVirtualenv/env38/lib/python3.8/site-packages/uproot/tree.py", line 658, in iterate
    branches = list(self._normalize_branches(branches, awkward))
  File "/Users/sznajder/cernbox/Work/PythonVirtualenv/env38/lib/python3.8/site-packages/uproot/tree.py", line 858, in _normalize_branches
    branch = self.get(word, aliases=aliases)
  File "/Users/sznajder/cernbox/Work/PythonVirtualenv/env38/lib/python3.8/site-packages/uproot/tree.py", line 395, in get
    raise uproot.rootio._KeyError("not found: {0}\n in file: {1}".format(repr(name), self._context.sourcepath))
KeyError: not found: b'NN_VBF'
 in file: /Users/sznajder/cernbox/Work/Data/2018ReducedTrees/histos2e2mu_25ns/output_VBF_HToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8.root

jpivarski commented 4 years ago

This looks like a Python 3 bytestring (bytes) versus regular string (str): b'NN_VBF' versus 'NN_VBF'. Is that it? If so, you can use Python's encode and decode to switch between them.

(The choice to present ROOT strings as bytestrings was a mistake on my part: it's being fixed in the new version.)
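A small sketch of the bytes-versus-str point, using only built-in `bytes.decode`: the helper `decode_keys` and the sample dict are hypothetical. It normalizes a dict whose keys are a mix of bytestrings and regular strings so that every key is a `str`.

```python
def decode_keys(arrays, encoding="utf-8"):
    """Return a copy of `arrays` whose bytes keys are decoded to str."""
    decoded = {}
    for key, value in arrays.items():
        name = key.decode(encoding) if isinstance(key, bytes) else key
        # b"x" and "x" are different keys until decoded, so guard against
        # two entries collapsing into one.
        if name in decoded:
            raise KeyError("duplicate branch name after decoding: {!r}".format(name))
        decoded[name] = value
    return decoded

# Hypothetical chunk: one key as read from the file, one added by hand.
mixed = {b"f_jet1_pt": [30.5, 42.0], "NN_VBF": [0.1, 0.9]}
clean = decode_keys(mixed)
```

The reverse direction is `"NN_VBF".encode("utf-8")`. Either convention can work, as long as it is applied to every key.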

sznajder commented 4 years ago

Hi Jim, I have tried both the bytestring and regular string options in the code below, and I keep getting a KeyError on newfile[treename].extend(arrays). Is there a special encoding I should use for ROOT tree branches/leaves?

  # Fill the new Tree using the old Tree
  branches = list( branchdict.keys() )[:-1] # excluding the new variable from list of branches
  print("branches=",branches)
  for arrays in tree.iterate(branches):   # number of entries in each step
    arrays.update({'NN_VBF'.encode('ascii'):y})
    newfile[treename].extend(arrays)

Traceback (most recent call last):
  File "addTreeVariableUproot.py", line 113, in <module>
    newfile[treename].extend(arrays)
  File "/Users/sznajder/cernbox/Work/PythonVirtualenv/env38/lib/python3.8/site-packages/uproot/write/objects/TTree.py", line 138, in extend
    self._branches[key].newbasket(value[i], i + 1)
  File "/Users/sznajder/cernbox/Work/PythonVirtualenv/env38/lib/python3.8/site-packages/uproot/write/objects/TTree.py", line 323, in newbasket
    self = tree.branches[self.revertstring(self._branch.name)]
KeyError: 'NN_VBF'

Cheers, Andre

jpivarski commented 4 years ago

You're not showing the part where you create the writable TTree with newtree; do you include ‘NN_VBF’ in the list of expected branches for the output tree? The dict passed to extend has to have exactly the same keys as the dict passed to newtree.

If you made the dict for newtree with a list comprehension from the old tree without adding ‘NN_VBF’, then the new tree won't be expecting it. When I was describing how to make the newtree specification using a list comprehension, I meant to also add the new variable to that specification.
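The requirement that both dicts have exactly the same keys can be checked before writing. The helper below, `check_branch_keys`, is a hypothetical pre-flight check and not an Uproot function. It compares the key sets and reports what is missing or unexpected, which also exposes a bytes-versus-str mismatch.

```python
def check_branch_keys(expected, arrays):
    """Raise KeyError unless `arrays` has exactly the keys in `expected`."""
    expected_keys, given_keys = set(expected), set(arrays)
    # Sort by repr so that mixed bytes and str keys can be ordered together.
    missing = sorted(expected_keys - given_keys, key=repr)
    unexpected = sorted(given_keys - expected_keys, key=repr)
    if missing or unexpected:
        raise KeyError("missing: {}, unexpected: {}".format(missing, unexpected))

# Hypothetical specification, with placeholder strings for the dtypes.
branchdict = {"f_jet1_pt": "float32", "NN_VBF": "float32"}

# Same keys: the check passes silently.
check_branch_keys(branchdict, {"f_jet1_pt": [30.5], "NN_VBF": [0.1]})

# One key given as bytes: it is reported as both missing and unexpected.
try:
    check_branch_keys(branchdict, {"f_jet1_pt": [30.5], b"NN_VBF": [0.1]})
except KeyError as error:
    message = error.args[0]
```

Here `message` lists `'NN_VBF'` as missing and `b'NN_VBF'` as unexpected, which makes the mixed encoding visible at a glance.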

sznajder commented 4 years ago

Yes, I am including the new variable when I create the new tree, as you can see in the printout below from running the macro. I am sending the code as an attachment. Cheers, Andre

f_lept1_pt     (no streamer)  asdtype('>f4')
f_lept1_eta    (no streamer)  asdtype('>f4')
f_lept1_phi    (no streamer)  asdtype('>f4')
f_lept1_pdgid  (no streamer)  asdtype('>i4')
f_lept2_pt     (no streamer)  asdtype('>f4')
f_lept2_eta    (no streamer)  asdtype('>f4')
f_lept2_phi    (no streamer)  asdtype('>f4')
f_lept2_pdgid  (no streamer)  asdtype('>i4')
f_lept3_pt     (no streamer)  asdtype('>f4')
f_lept3_eta    (no streamer)  asdtype('>f4')
f_lept3_phi    (no streamer)  asdtype('>f4')
f_lept3_pdgid  (no streamer)  asdtype('>i4')
f_lept4_pt     (no streamer)  asdtype('>f4')
f_lept4_eta    (no streamer)  asdtype('>f4')
f_lept4_phi    (no streamer)  asdtype('>f4')
f_lept4_pdgid  (no streamer)  asdtype('>i4')
f_jet1_pt      (no streamer)  asdtype('>f4')
f_jet1_eta     (no streamer)  asdtype('>f4')
f_jet1_phi     (no streamer)  asdtype('>f4')
f_jet2_pt      (no streamer)  asdtype('>f4')
f_jet2_eta     (no streamer)  asdtype('>f4')
f_jet2_phi     (no streamer)  asdtype('>f4')
NN_VBF         (no streamer)  asdtype('>f4')

sznajder commented 4 years ago

Hi Jim, If you could implement a convenience function that returns a dictionary from tree.arrays() in the format branchdict = { var1: type1 , var2: type2 , … } it would be great ! We often need to clone a Root tree and add some new variables… Cheers, Andre

sznajder commented 4 years ago

Hi Jim, I found the bug in my code and now it works. I was mixing encoded and unencoded variables ... Thanks, Andre

jpivarski commented 4 years ago

Okay, good. I think you might have attached a file in your email, but since it goes through GitHub Issues, the attachment was dropped. I was going to pick up on this on Monday, but I'm glad you figured it out in the meantime.