sznajder closed this issue 5 years ago.
A single Pandas DataFrame cannot represent flattened data with different numbers of values in each event. If you use flatten=True, you'll have to create one DataFrame for electrons, one DataFrame for muons, etc. It is normal to work with multiple DataFrames; there are many merging options.
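A minimal sketch of the "one DataFrame per particle type" approach. The two tables are built by hand with hypothetical branch names and values; they are shaped the way a flatten=True conversion is assumed to lay them out (one row per particle, with a two-level (entry, subentry) index). Each table is reduced to one row per event, and the results are then joined on the event number. The last loop shows how to step through events one at a time if that is really needed, for example to make one plot per event.

```python
import pandas as pd

# Hypothetical flattened tables: one row per particle, indexed by
# (entry, subentry) = (event number, particle number within the event).
# Branch names and values are made up for illustration.
electrons = pd.DataFrame(
    {"Electron_pt": [30.0, 12.0, 45.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (2, 0)], names=["entry", "subentry"]
    ),
)
muons = pd.DataFrame(
    {"Muon_pt": [25.0, 50.0, 8.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (1, 0), (1, 1)], names=["entry", "subentry"]
    ),
)

# Reduce each particle table to one value per event (here: the leading pT).
lead_e = electrons.groupby(level="entry")["Electron_pt"].max().rename("lead_electron_pt")
lead_mu = muons.groupby(level="entry")["Muon_pt"].max().rename("lead_muon_pt")

# Outer join on the event number: events missing one particle type get NaN.
per_event = lead_e.to_frame().join(lead_mu, how="outer")
print(per_event)

# Per-event loop over one particle table (each group is that event's rows).
for entry, event_electrons in electrons.groupby(level="entry"):
    print(entry, event_electrons["Electron_pt"].tolist())
```

The reduce-then-join order keeps each table flat until the very end; merging the raw particle tables on entry instead would pair every electron with every muon in the same event, which is sometimes what you want (e.g. for pair masses) but produces many more rows.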
You could set flatten=False to get a Python list of values in each cell. Then a single DataFrame could hold data from different particles, because Python lists can have different lengths. The DataFrame method for applying a function to each row is apply (with axis=1).
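A minimal sketch of that flatten=False layout, built by hand with hypothetical branch names and values rather than read from a file: each cell holds a Python list, so electrons and muons with different multiplicities fit in one DataFrame. DataFrame.apply with axis=1 calls a function once per row (event); the explicit itertuples loop below does the same kind of per-event work and is just as slow.

```python
import pandas as pd

# Hypothetical flatten=False layout: one row per event, one Python list per cell.
# Lists in the same row may have different lengths.
df = pd.DataFrame({
    "Electron_pt": [[30.0, 12.0], [], [45.0]],
    "Muon_pt": [[25.0], [50.0, 8.0], []],
})

def n_leptons(row):
    # row is a Series for one event; each entry is a plain Python list
    return len(row["Electron_pt"]) + len(row["Muon_pt"])

# axis=1 calls n_leptons once per row (event); the default axis=0 would go column by column.
df["n_leptons"] = df.apply(n_leptons, axis=1)

# The same kind of per-event loop written out explicitly with itertuples.
lead_pt = []
for event in df.itertuples(index=False):
    all_pt = event.Electron_pt + event.Muon_pt  # concatenate the two lists
    lead_pt.append(max(all_pt) if all_pt else None)
```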
However, if you set flatten=False or use Pandas apply, you're just doing a Python for-loop: you gain nothing from compiled functions or vectorization. If you're okay with that (speed is not an issue), you could cut out the middleman and just do a for-loop over the jagged array:
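for outer in jagged_array:      # one entry per event
    for inner in outer:         # one value per particle in that event
        f(inner)                # f is whatever per-value function you need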
or similarly with indexes:
for i in range(len(jagged_array)):
    for j in range(len(jagged_array[i])):
        f(jagged_array[i][j])
or you could get out of awkward-array entirely with jagged_array.tolist(), which turns it into lists of lists. Looping over plain Python lists is quite a bit faster than looping directly over the jagged array, because each element lookup is simpler (less code runs per access).
If performance is an issue, you shouldn't use flatten=False or DataFrame.apply. Columnar analysis code follows a different strategy from rowwise code. The best version of my tutorials on these techniques is here.
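The columnar idea can be sketched in plain NumPy, without the awkward-array API: a jagged array is kept as one flat array of values plus a count of values per event, and per-event results come from whole-array operations instead of a Python loop over events. The names content and counts, and the numbers, are hypothetical; this mimics the layout idea only and is not awkward-array's actual interface.

```python
import numpy as np

# Hypothetical jagged data for three events, stored columnar-style:
# all values in one flat array plus the number of values in each event.
content = np.array([30.0, 12.0, 45.0, 22.0])  # electron pT, all events concatenated
counts = np.array([2, 0, 2])                  # electrons per event (event 1 has none)

n_events = len(counts)
# Event number for every flat value: [0, 0, 2, 2]
event_index = np.repeat(np.arange(n_events), counts)

# Whole-array operations, no Python loop over events:
passes = content > 20.0  # cut applied to every electron at once
n_pass = np.bincount(event_index[passes], minlength=n_events)           # passing electrons per event
sum_pt = np.bincount(event_index, weights=content, minlength=n_events)  # scalar pT sum per event
```

np.bincount with minlength is used here instead of np.add.reduceat because it gives the correct result (zero) for empty events, which reduceat does not.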
I have a ROOT tree containing several branches with different dimensions. I need to make plots of variables in different branches on an event-by-event basis. I am opening my tree with uproot and converting it directly into a Pandas DataFrame, and I am facing two problems:
1) If I use the flatten=True option, I get an error because the branches have different dimensions. How can I solve this problem?
2) I need to make the plot per event. How do I loop over events in a Pandas DataFrame?
Thanks, Andre