scikit-hep / root_pandas

A Python module for conveniently loading/saving ROOT files as pandas DataFrames
MIT License

specifying multi-index during read_root #41

Closed naodell closed 1 year ago

naodell commented 7 years ago

This is an awesome tool!

Considering that HEP data typically has a variable number of objects per event and a natural division of the data (the event), it would be nice if this could be taken into account when converting a ROOT file. I think this would be addressed (as suggested in the title) by being able to specify a multi-index when calling read_root. For instance, I have event-by-event data in a ROOT file, and each event has several vectors that are consistently sized within an event. I would like to specify something like:

df = read_root('my_data.root', columns=['track_pt', 'track_eta', 'track_phi'], index=['event', '__array_index'], flatten=True)

I was able to do this in two steps with just track_pt:

In [26]: df = read_root('data/mydata.root', columns = ['event', 'track_pt'], flatten = True)
In [27]: df.index = [df.event, df.__array_index]
In [28]: df[:10]
                     event   track_pt  __array_index
event __array_index
3701  0               3701   2.806184              0
      1               3701   2.099216              1
      2               3701   1.563220              2
      3               3701  11.620861              3
      4               3701  -1.000000              4
      5               3701   0.338156              5
      6               3701  -2.725569              6
      7               3701  -0.955589              7
      8               3701   2.592065              8
      9               3701   1.000000              9
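
For comparison, the same two-level index can be built in one call with pandas' set_index, which also keeps any additional columns. This is only a sketch: the frame below is synthetic (made-up values, and track_eta is included purely for illustration), standing in for the flattened output of read_root, and it assumes that output contains both the event and __array_index columns as in the session above.

```python
import pandas as pd

# Synthetic stand-in for a flattened frame; values are invented for illustration
# and are NOT read from a ROOT file.
df = pd.DataFrame({
    "event": [3701, 3701, 3701, 3702, 3702],
    "__array_index": [0, 1, 2, 0, 1],
    "track_pt": [2.8, 2.1, 1.6, 5.0, 0.9],
    "track_eta": [0.1, -1.2, 2.0, 0.4, -0.3],
})

# set_index moves both columns into a two-level MultiIndex and removes them
# from the data columns; the remaining columns are carried along unchanged.
indexed = df.set_index(["event", "__array_index"])

# Select all tracks belonging to a single event via the outer index level.
tracks_3701 = indexed.loc[3701]
```

Unlike assigning df.index directly, set_index drops the event and __array_index columns from the data by default, so they are not duplicated in the output.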

At some level this is a quality-of-life request, but the two-step approach did not work when I specified additional variables. Additionally, it would be nice if you could somehow read by number of events instead of chunk size, though maybe that's tricky to implement.
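
To illustrate what reading "by number of events" could mean, here is a rough sketch in plain Python. It is not part of root_pandas: batches_of_events is a hypothetical helper, the rows are made up, and it assumes that rows belonging to the same event are contiguous, as they would be in a flattened tree.

```python
from itertools import groupby, islice


def batches_of_events(rows, n_events, key="event"):
    """Yield lists of rows, each covering at most n_events whole events.

    Hypothetical helper; assumes rows of the same event are contiguous.
    """
    # Materialize each event's rows as soon as groupby produces the group.
    events = (list(group) for _, group in groupby(rows, key=lambda r: r[key]))
    while True:
        batch = list(islice(events, n_events))
        if not batch:
            return
        # Flatten the per-event lists back into one list of rows.
        yield [row for event_rows in batch for row in event_rows]


# Made-up flattened rows: events 1 and 3 have two tracks, event 2 has one.
rows = [
    {"event": 1, "track_pt": 2.8},
    {"event": 1, "track_pt": 2.1},
    {"event": 2, "track_pt": 5.0},
    {"event": 3, "track_pt": 0.9},
    {"event": 3, "track_pt": 1.3},
]
batches = list(batches_of_events(rows, n_events=2))
```

Each batch ends on an event boundary, so no event is split across two batches, which is the property a fixed row-count chunk size cannot guarantee.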

Thanks, Nate

eduardo-rodrigues commented 1 year ago

As has been stated explicitly in the README for a while, root_pandas, and root_numpy on which it depends, have been deprecated and effectively unmaintained for quite some time. We decided to close anything outstanding as "won't do" and archive the package at this point.