Pandas DataFrame and jagged arrays in different branches

A single Pandas DataFrame cannot represent flattened data from branches that have different numbers of values in each event, because flattening gives each branch a different number of rows. If you use flatten=True, you'll have to create one DataFrame for electrons, one DataFrame for muons, etc. It is normal to work with multiple DataFrames; Pandas has many options for merging them.
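As a rough sketch of the multiple-DataFrame approach, the example below builds two small flattened tables by hand and merges them per event. The column names, the values, and the (entry, subentry) index layout are assumptions made for illustration; they are not read from a real file.

```python
import pandas as pd

# Hypothetical flattened data: one row per particle, indexed by (entry, subentry).
electrons = pd.DataFrame(
    {"electron_pt": [30.1, 12.5, 45.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (2, 0)], names=["entry", "subentry"]
    ),
)
muons = pd.DataFrame(
    {"muon_pt": [22.0, 18.3, 9.7]},
    index=pd.MultiIndex.from_tuples(
        [(1, 0), (2, 0), (2, 1)], names=["entry", "subentry"]
    ),
)

# Reduce each collection to one row per event (here: the highest pt),
# then join the two per-event tables on the event number.
leading_electron = electrons.groupby(level="entry").max()
leading_muon = muons.groupby(level="entry").max()
per_event = leading_electron.join(leading_muon, how="outer")
```

Events that have no electrons or no muons end up with NaN in the corresponding column, which is one reasonable way to keep the two collections aligned after merging.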

You could instead set flatten=False to get a Python list of values in each cell. Then a single DataFrame could hold data from different particles, because Python lists can have different lengths. To apply a function to each row, use the DataFrame method apply with axis=1.
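A minimal sketch of what this looks like, with a hand-built DataFrame standing in for the flatten=False output. The column names, the values, and the count_leptons helper are hypothetical.

```python
import pandas as pd

# Hypothetical events: each cell holds a Python list of variable length.
df = pd.DataFrame({
    "electron_pt": [[30.1, 12.5], [], [45.0]],
    "muon_pt": [[], [22.0], [18.3, 9.7]],
})

def count_leptons(row):
    # Called once per row; this is an ordinary Python function call per event.
    return len(row["electron_pt"]) + len(row["muon_pt"])

# axis=1 passes each row (one event) to the function.
df["n_leptons"] = df.apply(count_leptons, axis=1)
```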

However, if you set flatten=False or do a Pandas apply, you're just doing a Python for-loop: you gain nothing from compiled functions or vectorization. If you're okay with that (speed is not an issue), you could cut out the middleman and just do a for-loop over the jagged array:

for outer in jagged_array:
    for inner in outer:
        f(inner)

or similarly with indexes:

for i in range(len(jagged_array)):
    for j in range(len(jagged_array[i])):
        f(jagged_array[i][j])

or you could leave Awkward Array entirely with jagged_array.tolist(), which converts the jagged array into a list of lists. Iterating over plain Python lists is quite a bit faster than running for-loops directly on the jagged array, because each element lookup is simpler and executes less code.

If performance is an issue, you shouldn't use flatten=False or DataFrame.apply. Columnar analysis code follows a different strategy from row-wise code: it operates on whole arrays at once instead of one event at a time. The best version of my tutorials on these techniques is here.
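To give a flavor of the columnar strategy, here is a small sketch that computes a per-event sum with NumPy alone. It assumes the jagged data is available as a flat array of values plus a count of values per event; the names content and counts, and the numbers in them, are made up for this example.

```python
import numpy as np

# Hypothetical jagged data in columnar form: all values in one flat array,
# plus the number of values that belong to each event.
content = np.array([30.1, 12.5, 45.0, 22.0, 18.3, 9.7])
counts = np.array([2, 0, 1, 3])

# Label every value with the index of the event it belongs to.
event_index = np.repeat(np.arange(len(counts)), counts)

# Per-event sum without a Python loop; events with no values get 0.0.
sums = np.bincount(event_index, weights=content, minlength=len(counts))
```

The loop over events happens inside compiled NumPy routines, which is where the speed advantage over a row-wise apply comes from.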

scikit-hep / uproot3

Pandas Dataframe and jagged arrays in different branches #322