scikit-hep / uproot3

ROOT I/O in pure Python and NumPy.
BSD 3-Clause "New" or "Revised" License

Issue with vector branches generated using RDataFrame #544

Open ChristofSauer opened 2 years ago

ChristofSauer commented 2 years ago

Dear All,

I got access to a ROOT file that was generated using ROOT's RDataFrame. The file contains the constituents of each jet in an event, i.e., for each jet there is an associated vector. It appears that RDataFrame converts the std::vector<std::vector<float> > type (which I'd expect for this kind of data) to a custom type, vector<float,ROOT::Detail::VecOps::RAdoptAllocator<float> >, which apparently is not understood by uproot. When I try to read one of those branches, I receive the following error message:

ValueError: cannot interpret branch 'tjetSortClusNormByPt_pt' as a Python type

The corresponding information in the tree is:

*............................................................................*
*Br   40 :tjetSortClusNormByPt_pt : vector<float,ROOT::Detail::VecOps:       *
*         | :RAdoptAllocator<float> >                                        *
*Entries :  1082892 : Total  Size=  216012422 bytes  File Size  =  195765107 *
*Baskets :      211 : Basket Size=    1632256 bytes  Compression=   1.10     *
*............................................................................*

Is there some way to read this data using uproot?

jpivarski commented 2 years ago

The first thing is to try reading it in Uproot 4 (check your uproot.__version__; I'm guessing you're using Uproot 3 because you posted the issue here). The deserialization code was made a lot more general, and—fingers crossed—it might just work. If not, it will require some investigation...
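
As a rough sketch of that check (not a confirmed fix), the snippet below parses the major version from a version string and then tries to read the branch with the Uproot 4 interface. The file name `jets.root` and tree name `tree` are placeholders, since neither is given in this thread, and the helper names are made up for illustration. Whether Uproot 4 can actually interpret the RAdoptAllocator type is exactly what this would test.

```python
def uproot_major_version(version_string):
    """Return the major version from a string such as '3.13.1'."""
    return int(version_string.split(".")[0])


def read_branch(path, tree_name, branch_name):
    """Try to read one branch with Uproot 4 or later.

    Hypothetical helper: path and tree_name must be supplied by the caller.
    """
    # Imported here so the version helper above works without uproot installed.
    import uproot

    if uproot_major_version(uproot.__version__) < 4:
        raise RuntimeError(
            "found Uproot %s; install Uproot 4 or later" % uproot.__version__
        )

    # uproot.open returns a directory-like object that can be used
    # as a context manager; indexing it by name gives the tree.
    with uproot.open(path) as root_file:
        return root_file[tree_name][branch_name].array()


# Example call (placeholder file and tree names, branch name from the report):
# pt = read_branch("jets.root", "tree", "tjetSortClusNormByPt_pt")
# print(pt[0])
```

If the read still fails, the full traceback from Uproot 4 would be the useful thing to post next.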