scikit-hep / uproot3

ROOT I/O in pure Python and NumPy.
BSD 3-Clause "New" or "Revised" License

Cannot interpret custom type in a std::vector<> #403

Closed chernals closed 4 years ago

chernals commented 4 years ago

I have a branch storing a class defined with, among other things, the following attribute:

std::vector<BDSOutputROOTEventCollimatorInfo>

Where BDSOutputROOTEventCollimatorInfo is defined with these:

  std::string componentName;
  std::string componentType;
  double      length;
  double      tilt;
  double      offsetX;
  double      offsetY;
  std::string material;
  double      xSizeIn;
  double      ySizeIn;
  double      xSizeOut;
  double      ySizeOut;

As this doesn't involve pointers or deeply nested custom types, I thought this should already work.

The error is

ValueError: cannot interpret branch b'Model.collimatorInfo' as a Python type

The interpretation is None:

m['Model.collimatorInfo'].interpretation is None  # True

I can provide an example root file.

jpivarski commented 4 years ago

Please do provide a file. I suspect the missing piece is std::string, and this type probably has a simple encoding (length + bytes), but I'd have to see it to be sure.

chernals commented 4 years ago

Thanks!

This is then probably related to issue #404 .

https://www.dropbox.com/s/1r2wpwacs52566z/output_collimator.root?dl=0

jpivarski commented 4 years ago

I got confused and thought that #404 was from you, the continuation of this issue. (I didn't realize that I was talking to two different people.) I'll check into yours as well, make a combined PR with their solutions, though I don't know just yet that they're the same issue. It is the case that std::string has been implemented; your problem is not that. However, #404 has something to do with walking up to a base class, and I don't see a base class (::) in your branch's name.

I'll take a look, though.

jpivarski commented 4 years ago

Quite a similar file! I guess you're using the same framework. However, your problem is different—even before putting in the correction for @rtesse, your m['Model.collimatorInfo'].interpretation was not giving me None—I think you had an old version of uproot, from before some other bug was fixed.

To be clear, I'm looking at the TTree named "Model" (@rtesse's TTree was named "Beam").

I find that all of these branches have non-None interpretations, but "Model.collimatorInfo" and "Model.collimatorIndicesByName" seem to be wrong because they don't successfully deserialize the data. I'm looking into that, now.

jpivarski commented 4 years ago

(It's unrelated to #404, despite the files being so similar, because #404 was about not finding an interpretation due to a base class in the Beam TTree; the Model TTree doesn't have a base class in the way.)

jpivarski commented 4 years ago

I don't know how to interpret the Model.collimatorInfo branch. I suspect that it's a custom streamer, not strictly ROOT I/O. We've encountered this once before, in issue #373.

Here's why I don't think it's in ROOT format: we can look at the raw bytes of one entry of the branch by using the uproot.asdebug interpretation.

>>> t["Model.collimatorInfo"].array(uproot.asdebug)[0]
array([ 64,   0,   2, 155,  64,   9,   0,   1,   0,   0,   0,   7,   0,
         1,   0,   0,   0,   0,   2,   0,   0,   0,   0,   1,   0,   0,
         0,   0,   2,   0,   0,   0,   0,   1,   0,   0,   0,   0,   2,
         0,   0,   0,   0,   1,   0,   0,   0,   0,   2,   0,   0,   0,
         0,   1,   0,   0,   0,   0,   2,   0,   0,   0,   0,   1,   0,
         0,   0,   0,   2,   0,   0,   0,   0,   1,   0,   0,   0,   0,
         2,   0,   0,   0,  64,   0,   0,  48,   0,   9,   4,  83,  76,
        49,  69,  11,  87,  73,  78,  68,  79,  87,  95,  69,  88,  84,
        82,   9,  87,  73,  78,  68,  79,  87,  95,  73,  78,   3,  67,
        79,  76,   4,  83,  76,  49,  71,   4,  83,  76,  50,  71,   4,
        83,  76,  51,  71,  64,   0,   0,  37,   0,   9,   4, 114,  99,
       111, 108,   4, 114,  99, 111, 108,   4, 114,  99, 111, 108,   4,
       101,  99, 111, 108,   4, 114,  99, 111, 108,   4, 114,  99, 111,
       108,   4, 114,  99, 111, 108,  63, 174, 184,  81, 235, 133,  30,
       184,  63,  26,  54, 226, 235,  28,  67,  45,  63,  26,  54, 226,
       235,  28,  67,  45,  63, 172,  40, 245, 194, 143,  92,  41,  63,
       174, 184,  81, 235, 133,  30, 184,  63, 174, 184,  81, 235, 133,
        30, 184,  63, 174, 184,  81, 235, 133,  30, 184,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,  64,   0,   0,  44,
         0,   9,   5,  71,  52,  95,  78, 105,   5,  71,  52,  95,  84,
       105,   5,  71,  52,  95,  84, 105,   5,  71,  52,  95,  78, 105,
         5,  71,  52,  95,  78, 105,   5,  71,  52,  95,  78, 105,   5,
        71,  52,  95,  78, 105,  63, 147, 220, 103,  10, 107,  48, 232,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,  63, 116, 122, 225,  71, 174,  20, 123,  63, 156,
        40, 242, 103,  19, 219,  69,  63, 156,  40, 245, 194, 143,  92,
        41,  63, 151,  10,  61,  96, 218, 111, 141,  63, 156,  40, 245,
       194, 143,  92,  41,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,  63, 116, 122, 225,  71, 174,
        20, 123,  63, 156,  40, 245, 194, 143,  92,  41,  63, 137, 110,
       148, 243,  42,  93, 124,  63, 156,  40, 245, 194, 143,  92,  41,
        63, 147, 220, 103,  10, 107,  48, 232,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  63, 116,
       122, 225,  71, 174,  20, 123,  63, 156,  40, 242, 103,  19, 219,
        69,  63, 156,  40, 245, 194, 143,  92,  41,  63, 151,  10,  61,
        96, 218, 111, 141,  63, 156,  40, 245, 194, 143,  92,  41,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,  63, 116, 122, 225,  71, 174,  20, 123,  63, 156,  40,
       245, 194, 143,  92,  41,  63, 137, 110, 148, 243,  42,  93, 124,
        63, 156,  40, 245, 194, 143,  92,  41], dtype=uint8)

Starting from the end and working backward (because I have the most uncertainty in headers; there are never any footers), I find that the last 28 (7 × 4) values are reasonable doubles.

>>> t["Model.collimatorInfo"].array(uproot.asdebug)[0][-7*4*8 : ].view(">f8")
array([0.01939546, 0.        , 0.        , 0.005     , 0.02749995,
       0.0275    , 0.0225    , 0.0275    , 0.        , 0.        ,
       0.005     , 0.0275    , 0.01241795, 0.0275    , 0.01939546,
       0.        , 0.        , 0.005     , 0.02749995, 0.0275    ,
       0.0225    , 0.0275    , 0.        , 0.        , 0.005     ,
       0.0275    , 0.01241795, 0.0275    ])

All of these numbers are the same order of magnitude, though they have different exponents—that doesn't happen by accident. They are 28 contiguous doubles.
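As a small self-contained check of the reinterpretation used above, the sketch below takes two 8-byte groups copied from the tail of the dump and views them as big-endian doubles. Only NumPy is assumed; the variable names are illustrative.

```python
import numpy as np

# Sixteen bytes copied from near the end of the dump above.
raw = np.array([63, 116, 122, 225, 71, 174, 20, 123,
                63, 156, 40, 245, 194, 143, 92, 41], dtype=np.uint8)

# ">f8" is a big-endian 64-bit float; view() reinterprets the same
# memory without copying, so 16 bytes become 2 doubles.
values = raw.view(">f8")
print(values)
```

The two results should match the 0.005 and 0.0275 entries that appear in the array printed above.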

Right before that, there are 7 strings, which happen to each have 5 characters plus a size byte, so 6 bytes each.

>>> t["Model.collimatorInfo"].array(uproot.asdebug)[0][-7*4*8 - 7*6 : -7*4*8].tobytes()
b'\x05G4_Ni\x05G4_Ti\x05G4_Ti\x05G4_Ni\x05G4_Ni\x05G4_Ni\x05G4_Ni'
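The "size byte plus characters" layout can be decoded with a few lines of Python. This is a minimal sketch, assuming every string is shorter than 255 bytes (ROOT switches to a longer length field when the first byte is 255, which this sketch does not handle). The function name `read_short_strings` is hypothetical.

```python
def read_short_strings(buf):
    """Split a buffer of 1-byte-length-prefixed strings into a list of bytes."""
    out = []
    pos = 0
    while pos < len(buf):
        n = buf[pos]  # the size byte
        if n == 255:
            # Marker for the long-string form; out of scope for this sketch.
            raise NotImplementedError("long strings are not handled here")
        out.append(bytes(buf[pos + 1 : pos + 1 + n]))
        pos += 1 + n
    return out

# The 42 bytes shown above: 7 strings of 6 bytes each.
block = b'\x05G4_Ni\x05G4_Ti\x05G4_Ti\x05G4_Ni\x05G4_Ni\x05G4_Ni\x05G4_Ni'
materials = read_short_strings(block)
print(materials)
```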

The data up to this point are:

>>> t["Model.collimatorInfo"].array(uproot.asdebug)[0][: -7*4*8 - 7*6]
array([ 64,   0,   2, 155,  64,   9,   0,   1,   0,   0,   0,   7,   0,
         1,   0,   0,   0,   0,   2,   0,   0,   0,   0,   1,   0,   0,
         0,   0,   2,   0,   0,   0,   0,   1,   0,   0,   0,   0,   2,
         0,   0,   0,   0,   1,   0,   0,   0,   0,   2,   0,   0,   0,
         0,   1,   0,   0,   0,   0,   2,   0,   0,   0,   0,   1,   0,
         0,   0,   0,   2,   0,   0,   0,   0,   1,   0,   0,   0,   0,
         2,   0,   0,   0,  64,   0,   0,  48,   0,   9,   4,  83,  76,
        49,  69,  11,  87,  73,  78,  68,  79,  87,  95,  69,  88,  84,
        82,   9,  87,  73,  78,  68,  79,  87,  95,  73,  78,   3,  67,
        79,  76,   4,  83,  76,  49,  71,   4,  83,  76,  50,  71,   4,
        83,  76,  51,  71,  64,   0,   0,  37,   0,   9,   4, 114,  99,
       111, 108,   4, 114,  99, 111, 108,   4, 114,  99, 111, 108,   4,
       101,  99, 111, 108,   4, 114,  99, 111, 108,   4, 114,  99, 111,
       108,   4, 114,  99, 111, 108,  63, 174, 184,  81, 235, 133,  30,
       184,  63,  26,  54, 226, 235,  28,  67,  45,  63,  26,  54, 226,
       235,  28,  67,  45,  63, 172,  40, 245, 194, 143,  92,  41,  63,
       174, 184,  81, 235, 133,  30, 184,  63, 174, 184,  81, 235, 133,
        30, 184,  63, 174, 184,  81, 235, 133,  30, 184,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,  64,   0,   0,  44,
         0,   9], dtype=uint8)

The last 6 bytes, 64, 0, 0, 44, 0, 9, are clearly the header for a block. 6 bytes is very common for a header; it's usually a 4-byte byte count (with the 0x40000000 flag bit set) followed by a 2-byte class version.
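One way to sanity-check those six bytes is to read them as a big-endian 4-byte word plus a 2-byte short. The sketch below assumes the common ROOT convention that the word is a byte count carrying a 0x40000000 flag bit and that the short is a class version; `parse_header` is a hypothetical helper, not an uproot function.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000  # assumed flag bit on a byte-count word

def parse_header(six_bytes):
    """Return (byte_count, version) from a 6-byte big-endian header."""
    word, version = struct.unpack(">IH", bytes(six_bytes))
    if not word & K_BYTE_COUNT_MASK:
        raise ValueError("flag bit not set; probably not a byte count")
    return word & ~K_BYTE_COUNT_MASK, version

byte_count, version = parse_header([64, 0, 0, 44, 0, 9])

# Under this reading the count covers the 2 version bytes plus the payload.
payload = byte_count - 2
print(byte_count, version, payload)
```

Under that assumption the payload comes out as 42 bytes, which agrees with the 7 strings of 6 bytes each that follow the header.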

Okay, so far so good. There's a block of stuff, a bunch of zeros, and then another block of stuff, and the second block is clearly header + 7 strings, each naming a material (G4_Ni or G4_Ti), + 7 × 4 doubles.

The streamer for this type is:

>>> f._context.streamerinfosmap[b"BDSOutputROOTEventCollimatorInfo"].show()
StreamerInfo for class: BDSOutputROOTEventCollimatorInfo, version=1, checksum=0x87ac4c05
  TObject         BASE            offset=  0 type=66 Basic ROOT object
  componentName   string          offset=  0 type=500 
  componentType   string          offset=  0 type=500 
  length          double          offset=  0 type= 8 
  tilt            double          offset=  0 type= 8 
  offsetX         double          offset=  0 type= 8 
  offsetY         double          offset=  0 type= 8 
  material        string          offset=  0 type=500 
  xSizeIn         double          offset=  0 type= 8 
  ySizeIn         double          offset=  0 type= 8 
  xSizeOut        double          offset=  0 type= 8 
  ySizeOut        double          offset=  0 type= 8 

and here's the problem: there should be 8 doubles, and the material string should be between the first 4 and the last 4. I'm seeing all of those values in there, but they're not in the order the streamer says they should be. This happened in issue #373 because that file (produced by FairRoot) used Boost serialization inside the ROOT TTree. That Boost serialization put numbers and strings contiguously, but ROOT serialization of a non-split branch does not do that. It goes in the order of the streamer, and the streamer is our only guide for what order to expect things in.
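The difference between the two orderings can be illustrated with a toy serializer. This is only a sketch: it uses a reduced, hypothetical field set (length, material, xSizeIn) with made-up values, and it omits all headers and the TObject base, so it is not the actual ROOT byte format.

```python
import struct

# Two toy records: (length, material, xSizeIn). Values are illustrative.
records = [(0.06, b"G4_Ni", 0.0194), (0.0001, b"G4_Ti", 0.0)]

def pack_string(s):
    # One size byte followed by the characters.
    return bytes([len(s)]) + s

def pack_double(x):
    # Big-endian 64-bit float.
    return struct.pack(">d", x)

def object_by_object(recs):
    """Streamer order: all fields of record 0, then all fields of record 1."""
    out = b""
    for length, material, x_size_in in recs:
        out += pack_double(length) + pack_string(material) + pack_double(x_size_in)
    return out

def field_by_field(recs):
    """Each field stored contiguously across all records."""
    lengths = b"".join(pack_double(r[0]) for r in recs)
    materials = b"".join(pack_string(r[1]) for r in recs)
    sizes = b"".join(pack_double(r[2]) for r in recs)
    return lengths + materials + sizes

a = object_by_object(records)
b = field_by_field(records)
print(a[8:14])   # the material of record 0 sits between its two doubles
print(b[16:28])  # both materials sit together after both lengths
```

Both buffers hold the same bytes in a different order. The dump above resembles the second layout: the strings of all seven elements are adjacent, as are the doubles.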

If it's not following the streamer, then the C++ code that wrote this out is probably using a "custom streamer," possibly based on Boost but not necessarily. (A "custom streamer" is when C++ code writes its objects any way it wants, in a custom function, and the streamer object saved in the file is not meaningful. The only way to decode it is to run the custom C++ code.)

If your framework uses custom streamers, I can't help you for that branch. Sorry!

chernals commented 4 years ago

Wow, interesting!

Issue #404 was not from me, but its author is in my group; we are all using BDSIM. Sorry for the confusion.

Regarding the custom streamer or the use of Boost, I will investigate, but so far I haven't found anything specific to that class.

jpivarski commented 4 years ago

@chernals Go ahead and reopen this issue if there's some new information I should consider. I'm just trying to get my head around which of these are still active, and as I last understood this one, we're lacking correct guidance from the streamer about how this branch is actually serialized.

chernals commented 4 years ago

At this stage I cannot say more. I looked in the code, but I don't see why the streamer would be wrong in this case. However, this is very low priority for us, so I suggest we let it go for now.

jpivarski commented 4 years ago

As it turns out, the error above is because I was unaware of ROOT's "memberwise splitting," and (if I said anything to the contrary above) it has nothing to do with Boost serialization. This same error came up in 6 different issues, so further discussion on it will be consolidated into scikit-hep/uproot4#38. (This comment is a form message I'm writing on all 6 issues.)
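With memberwise storage in mind, the 28 trailing doubles printed earlier in this thread can be regrouped into one column per data member. The sketch below assumes they are 4 members × 7 vector elements in streamer order (xSizeIn, ySizeIn, xSizeOut, ySizeOut); that grouping is inferred from the dump, not confirmed by a working deserializer.

```python
import numpy as np

# The 28 doubles as printed above (rounded to the displayed precision).
tail = np.array([
    0.01939546, 0.0, 0.0, 0.005, 0.02749995, 0.0275, 0.0225,
    0.0275, 0.0, 0.0, 0.005, 0.0275, 0.01241795, 0.0275,
    0.01939546, 0.0, 0.0, 0.005, 0.02749995, 0.0275, 0.0225,
    0.0275, 0.0, 0.0, 0.005, 0.0275, 0.01241795, 0.0275,
])

# Assumed member order, taken from the last four streamer entries.
members = ["xSizeIn", "ySizeIn", "xSizeOut", "ySizeOut"]

# Memberwise: each member's 7 values are contiguous, so each row is a member.
by_member = tail.reshape(len(members), -1)
columns = dict(zip(members, by_member))

# Transposing gives one row per collimator, i.e. the objectwise view.
per_object = by_member.T
print(per_object[0])
```

Read this way, the xSizeOut column repeats xSizeIn and ySizeOut repeats ySizeIn, which is a plausible pattern for collimators with equal entrance and exit apertures.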

As of PR scikit-hep/uproot4#87, we can now detect such cases, so at least we'll raise a NotImplementedError instead of letting the deserializer fail in mysterious ways. Someday, it will actually be implemented (watch scikit-hep/uproot4#38), but in the meantime, the thing you can do is write your data "objectwise," not "memberwise." (See this comment for ideas on how to do that, and if you manage to do it, you can help a lot of people out by sharing a recipe.)
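A rough sketch of how such a case might be spotted from the raw bytes: it assumes that, as in ROOT's TBufferFile, bit 14 (0x4000) of the 2-byte version field marks memberwise streaming. The helper `read_version_flags` is hypothetical and is not claimed to be how the PR implements the check.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000   # assumed flag bit on the byte-count word
K_STREAMED_MEMBERWISE = 1 << 14  # assumed memberwise flag in the version field

def read_version_flags(header):
    """Return (is_memberwise, version) from a 6-byte big-endian header."""
    word, version = struct.unpack(">IH", bytes(header))
    if not word & K_BYTE_COUNT_MASK:
        raise ValueError("flag bit not set; probably not a byte count")
    memberwise = bool(version & K_STREAMED_MEMBERWISE)
    return memberwise, version & ~K_STREAMED_MEMBERWISE

# First six bytes of the entry dumped earlier in this thread.
outer = read_version_flags([64, 0, 2, 155, 64, 9])
# Header of the inner block that precedes the material strings.
inner = read_version_flags([64, 0, 0, 44, 0, 9])
print(outer, inner)
```

Under these assumptions the outer header of the entry carries the memberwise flag while the inner string block does not, which is consistent with the explanation in this comment.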