key4hep / EDM4hep

Generic event data model for HEP collider experiments
https://cern.ch/edm4hep
Apache License 2.0
25 stars 35 forks source link

Edm4hep files contaning vertex collections incompatible with root RDataFrame.Describe() #262

Open Zehvogel opened 7 months ago

Zehvogel commented 7 months ago

root's RDataFrame .Describe() feature fails with edm4hep files containing vertex collections with the following error message:

cppyy.gbl.std.runtime_error: ROOT::RDF::RDFDescription ROOT::RDF::RInterfaceBase::Describe() =>
    runtime_error: TTree leaf BuildUpVertices.covMatrix[6] has both a leaf count and a static length. This is not supported.

It is maybe not our fault but I wanted to submit the issue here first

tmadlener commented 7 months ago

This looks like a root issue, where the minimal reproducer should be to effectively write a branch with an std::array. It could also be that it fails only if the std::array is part of a struct as that is what it is in our case. (And then the branch is actually a vector and not just single elments)

Zehvogel commented 7 months ago

Looks like a reproducer is not so easy to produce...

I made a struct

struct test {
    std::array<float, 6> arr;
    ROOT::VecOps::RVec<float> vec;
    ROOT::VecOps::RVec<ROOT::VecOps::RVec<float>> vecvec;
    std::array<std::array<float, 6>,1> arrarr;
    ROOT::VecOps::RVec<std::array<float,6>> vecarr;
}

and use it like this:

import ROOT

df = ROOT.RDataFrame(10)

code = """
struct test {
    std::array<float, 6> arr;
    ROOT::VecOps::RVec<float> vec;
    ROOT::VecOps::RVec<ROOT::VecOps::RVec<float>> vecvec;
    std::array<std::array<float, 6>,1> arrarr;
    //ROOT::VecOps::RVec<std::array<float,6>> vecarr;
};
"""
ROOT.gInterpreter.Declare(code)

df = df.Define("test", "test()")
print(df.Describe())
df.Snapshot("test_tree", "test_file.root")
df2 = ROOT.RDataFrame("test_tree", "test_file.root")
print(df2.Describe())

returning:

[---]
Column  Type    Origin
------  ----    ------
test    test    Define
[---]
Column                  Type                                            Origin
------                  ----                                            ------
arr[6]                  array<float,6>                                  Dataset
arrarr[1][6]            array<array<float,6>,1>                         Dataset
test                    test                                            Dataset
test.arr[6]             ROOT::VecOps::RVec<Float_t>                     Dataset
test.arrarr[1][6]       ROOT::VecOps::RVec<Float_t>                     Dataset
test.vec                ROOT::VecOps::RVec<float>                       Dataset
test.vecvec             ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Dataset
vec                     ROOT::VecOps::RVec<float>                       Dataset
vecvec                  ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Dataset
[---]

uncommenting the vecarr member fails when writing to disk complaining about not having a dictionary:

Error in <TStreamerInfo::Build>: The class "test" is interpreted and for its data member "vecarr" we do not have a dictionary for the collection "ROOT::VecOps::RVec<array<float,6> >". Because of this, we will not be able to read or write this data member.
tmadlener commented 7 months ago

Ah, sorry. That was a bit ambiguous from my side. What we write is more like this

struct test {
  std::array<float, 6> values;
};

std::vector<test> data;

So probably something along the lines of

import ROOT
ROOT.gInterpreter.Declare("""
#include <array>
#include <vector>

struct test {
  std::array<float, 6> values{};
};

std::vector<test> makeTestData() {
  return std::vector<test>(10);
}
"""

# all the rest

If you want to make it slightly more complicated, this would also be allowed in a podio generated EDM

struct ArrayStruct {
  std::array<int, 42> arr;
};

struct SomeData {
  ArrayStruct a{};
};
Zehvogel commented 7 months ago

Ok, thx. With your construct (the simpler one) it still complains about not having a dictionary but in another way.

Error in <TTree::Branch>: The class requested (vector<test>) for the branch "test_col" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vector<test>) to avoid to write corrupted data.
tmadlener commented 7 months ago

Ah right, you will probably have to create a dictionary then for at least test then. I am not sure if there is an easy way to get that done via the python bindings, or whether it is necessary to bring in some heavier (CMake) machinery.

Zehvogel commented 7 months ago

I have no idea either, guess I will have to postpone this for now as there is more important stuff to do...

wdconinc commented 6 months ago

@Zehvogel I'll file an issue at root-project/root and link here.

wdconinc commented 6 months ago

Back reference: https://github.com/root-project/root/issues/14790

jmcarcell commented 6 months ago

I've had a look at this and it seems it's only an issue with .Describe(). Working with std::array works fine, accessing different indexes, creating new variables, filtering... So a storage type change is not needed.

wdconinc commented 6 months ago

This affects more than just .Describe(). It prevents also for example df.AsNumpy(['BuildUpVertices.covMatrix']). In fact, I think it prevents any access to values stored in a std::array from python.