not able to create pandas dataframe

scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.

BSD 3-Clause "New" or "Revised" License


I am trying to create a pandas DataFrame, but the issue is that it gives me a tuple every time.

```
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_557/1180174254.py in
      2 import pandas as pd
      3
----> 4 dfall.columns

AttributeError: 'tuple' object has no attribute 'columns'
```

Here is the code I run to make the pandas DataFrame:

```python
import uproot

filename = "/eos/cms/store/group/phys_higgs/cmshww/amassiro/HWWNano/Summer20UL18_106x_nAODv9_Full2018v9/MCl1loose2018v9__MCCorr2018v9NoJERInHorn__MCCombJJLNu2018/nanoLatino_GluGluToWWToQQ_Sig_private__part9.root"
file = uproot.open(filename)
# show what is inside the root file loaded from uproot
print(file.classnames())
print(file.keys())
tree = file["Events"]  # select the TTree inside the root file
tree.show()  # show all the branches inside the TTree
dfall = tree.arrays(library="pd")  # convert uproot TTree into pandas dataframe
#dfall.columns
print("type of dfall", type(dfall))
print("============================================")
print("File loaded with ", len(dfall), " events ")
```

Thanks, Sadhana

Uproot 4.x tries to "explode" ragged data: an array with a variable number of particles per event is turned into a DataFrame with one number in each cell and MultiIndex rows that encode the nesting, similar to ak.to_dataframe.
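To make the "exploded" layout concrete, here is a minimal pandas-only sketch. The muon pT values and the helper name explode are hypothetical, and this is not Uproot's implementation; it just builds one row per particle with an (entry, subentry) MultiIndex, so an event with no particles contributes no rows.

```python
import pandas as pd

# Hypothetical ragged data: muon pT for three events
# (event 0 has two muons, event 1 has none, event 2 has one).
muon_pt = [[45.2, 30.1], [], [22.7]]


def explode(values_per_event, name):
    """Hypothetical helper: one row per particle, indexed by (entry, subentry)."""
    entries, subentries, flat = [], [], []
    for entry, values in enumerate(values_per_event):
        for subentry, value in enumerate(values):
            entries.append(entry)        # which event the particle belongs to
            subentries.append(subentry)  # position of the particle within that event
            flat.append(value)
    index = pd.MultiIndex.from_arrays([entries, subentries], names=["entry", "subentry"])
    return pd.DataFrame({name: flat}, index=index)


df = explode(muon_pt, "Muon_pt")
print(df)
```

Event 1 has no muons, so it simply does not appear among the rows.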

But this isn't always possible. If you are trying to read, for instance, both muons and electrons, the numbers of particles in these two collections are not in general (or even usually) equal to each other, so there's no single MultiIndex that they can both expand to. In that case, Uproot 4.x produces a tuple of DataFrames, one for each particle type.
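The following sketch, again pandas-only with hypothetical muon and electron values, shows why two collections cannot share one MultiIndex. It also shows why the result becomes a tuple, which is why dfall.columns fails. Bundling the two frames into a tuple only imitates Uproot 4.x's behavior in spirit. The last lines show one defensive way to handle code that may receive either a single DataFrame or a tuple.

```python
import pandas as pd


def explode(values_per_event, name):
    """Hypothetical helper: one row per particle, indexed by (entry, subentry)."""
    entries, subentries, flat = [], [], []
    for entry, values in enumerate(values_per_event):
        for subentry, value in enumerate(values):
            entries.append(entry)
            subentries.append(subentry)
            flat.append(value)
    index = pd.MultiIndex.from_arrays([entries, subentries], names=["entry", "subentry"])
    return pd.DataFrame({name: flat}, index=index)


# Hypothetical per-event pT values: muon and electron counts differ per event.
muon_pt = [[45.2, 30.1], [], [22.7]]
electron_pt = [[51.0], [38.4, 12.9, 8.8], []]

muons = explode(muon_pt, "Muon_pt")
electrons = explode(electron_pt, "Electron_pt")

# The two indexes describe different (entry, subentry) sets,
# so no single MultiIndex fits both collections.
print(muons.index.equals(electrons.index))

# One DataFrame per collection, bundled in a tuple.
result = (muons, electrons)

# Code expecting a single DataFrame has to unpack the tuple first.
frames = result if isinstance(result, tuple) else (result,)
for frame in frames:
    print(list(frame.columns))
```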

Uproot 5.x, however, uses Akimbo to put lists of numbers into each cell (instead of individual numbers), with a normal index. That's because dataframe libraries are starting to store such nested data in Arrow format rather than as Python lists, so doing this is not a big performance loss.
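For comparison, here is a rough sketch of the "list per cell" layout with an ordinary index. It uses plain Python lists in object columns purely as a stand-in. It does not use Akimbo or Arrow, and the values are hypothetical. Because every event is one row, collections with different particle counts can sit side by side in one DataFrame.

```python
import pandas as pd

# Hypothetical per-event pT values, as in the previous example.
muon_pt = [[45.2, 30.1], [], [22.7]]
electron_pt = [[51.0], [38.4, 12.9, 8.8], []]

# One row per event with a plain RangeIndex; each cell holds that event's list.
# (Python lists are a stand-in here for the Arrow-backed storage described above.)
df = pd.DataFrame({"Muon_pt": muon_pt, "Electron_pt": electron_pt})

print(df)
print(df["Muon_pt"].map(len).tolist())  # number of muons in each event
```

With pyarrow installed, pandas can also hold such columns in Arrow format, for example with pd.ArrowDtype(pa.list_(pa.float64())). That is closer to the storage the paragraph describes.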

So you have two options:

1. Without installing any new packages, select a single type of particle from the ROOT file (with expressions or filter_name). You can make additional calls to get the other particle types into other DataFrames.
2. Upgrade to the latest version of Uproot.
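For the first option, each call would select one collection, for instance tree.arrays(filter_name="Muon_*", library="pd") and then a second call with filter_name="Electron_*". The filter_name argument accepts glob-style patterns. The sketch below only illustrates how such patterns split branch names into per-collection groups. The branch list is a hypothetical NanoAOD-style example, and fnmatchcase stands in for Uproot's own internal matching.

```python
from fnmatch import fnmatchcase

# Hypothetical branch names in a NanoAOD-style "Events" tree.
branches = ["nMuon", "Muon_pt", "Muon_eta", "nElectron", "Electron_pt", "Electron_eta", "MET_pt"]


def select(branch_names, pattern):
    """Return the branch names matching a case-sensitive glob pattern such as "Muon_*"."""
    return [name for name in branch_names if fnmatchcase(name, pattern)]


muon_branches = select(branches, "Muon_*")
electron_branches = select(branches, "Electron_*")
print(muon_branches)      # ['Muon_pt', 'Muon_eta']
print(electron_branches)  # ['Electron_pt', 'Electron_eta']
```

A list like muon_branches could also be passed directly as the expressions argument, e.g. tree.arrays(muon_branches, library="pd"). That way each call covers a single jagged structure and returns one DataFrame.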

not able to create pandas dataframe #1329