Please do provide a file. I suspect the missing piece is std::string, and this type probably has a simple encoding (length + bytes), but I'd have to see it to be sure.
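For reference, ROOT streams a std::string as a length prefix followed by the raw bytes. Strings shorter than 255 bytes get a single length byte. Longer strings get the byte 255 followed by a 4-byte big-endian length. The sketch below reads one such string. read_root_string is a hypothetical helper, not an uproot function, and decoding the bytes as UTF-8 is an assumption, since ROOT stores only bytes.

```python
import struct


def read_root_string(buf, pos=0):
    """Read one length-prefixed string from buf starting at pos.

    Returns (decoded_string, position_just_after_the_string).
    """
    n = buf[pos]  # first byte: length, or 255 as a "long form" marker
    pos += 1
    if n == 255:
        # Long form: the marker is followed by a 4-byte big-endian length.
        (n,) = struct.unpack_from(">i", buf, pos)
        pos += 4
    data = bytes(buf[pos:pos + n])
    # UTF-8 is an assumption; ROOT itself does not record an encoding.
    return data.decode("utf-8", errors="replace"), pos + n


# Short form: one size byte, then five characters.
print(read_root_string(b"\x05G4_Ni"))

# Long form, built by hand: marker 255, 4-byte length 300, then 300 bytes.
long_buf = b"\xff" + struct.pack(">i", 300) + b"x" * 300
text, end = read_root_string(long_buf)
print(len(text), end)
```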
Thanks!
This is then probably related to issue #404 .
https://www.dropbox.com/s/1r2wpwacs52566z/output_collimator.root?dl=0
I got confused and thought that #404 was from you, the continuation of this issue. (I didn't realize that I was talking to two different people.) I'll look into yours as well and make a combined PR with both solutions, though I don't know yet whether they're the same issue. std::string has been implemented, so your problem is not that. However, #404 has something to do with walking up to a base class, and I don't see a base class (::) in your branch's name.
I'll take a look, though.
Quite a similar file! I guess you're using the same framework. However, your problem is different: even before putting in the correction for @rtesse, your m['Model.collimatorInfo'].interpretation was not giving me None. I think you had an old version of uproot, from before some other bug was fixed.
To be clear, I'm looking at the TTree named "Model" (@rtesse's TTree was named "Beam").
I find that all of these branches have non-None interpretations, but "Model.collimatorInfo" and "Model.collimatorIndicesByName" seem to be wrong because they don't successfully deserialize the data. I'm looking into that now.
(It's unrelated to #404, despite the files being so similar, because #404 was about not finding an interpretation due to a base class in the Beam TTree; the Model TTree doesn't have a base class in the way.)
I don't know how to interpret the Model.collimatorInfo branch. I suspect that it uses a custom streamer, not strictly ROOT I/O. We've encountered this once before, in issue #373.
Here's why I don't think it's in ROOT format: we can look at the raw bytes of one entry of the branch by using the uproot.asdebug interpretation.
>>> t["Model.collimatorInfo"].array(uproot.asdebug)[0]
array([ 64, 0, 2, 155, 64, 9, 0, 1, 0, 0, 0, 7, 0,
1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0,
0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2,
0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0,
0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0,
0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0,
2, 0, 0, 0, 64, 0, 0, 48, 0, 9, 4, 83, 76,
49, 69, 11, 87, 73, 78, 68, 79, 87, 95, 69, 88, 84,
82, 9, 87, 73, 78, 68, 79, 87, 95, 73, 78, 3, 67,
79, 76, 4, 83, 76, 49, 71, 4, 83, 76, 50, 71, 4,
83, 76, 51, 71, 64, 0, 0, 37, 0, 9, 4, 114, 99,
111, 108, 4, 114, 99, 111, 108, 4, 114, 99, 111, 108, 4,
101, 99, 111, 108, 4, 114, 99, 111, 108, 4, 114, 99, 111,
108, 4, 114, 99, 111, 108, 63, 174, 184, 81, 235, 133, 30,
184, 63, 26, 54, 226, 235, 28, 67, 45, 63, 26, 54, 226,
235, 28, 67, 45, 63, 172, 40, 245, 194, 143, 92, 41, 63,
174, 184, 81, 235, 133, 30, 184, 63, 174, 184, 81, 235, 133,
30, 184, 63, 174, 184, 81, 235, 133, 30, 184, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 44,
0, 9, 5, 71, 52, 95, 78, 105, 5, 71, 52, 95, 84,
105, 5, 71, 52, 95, 84, 105, 5, 71, 52, 95, 78, 105,
5, 71, 52, 95, 78, 105, 5, 71, 52, 95, 78, 105, 5,
71, 52, 95, 78, 105, 63, 147, 220, 103, 10, 107, 48, 232,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 63, 116, 122, 225, 71, 174, 20, 123, 63, 156,
40, 242, 103, 19, 219, 69, 63, 156, 40, 245, 194, 143, 92,
41, 63, 151, 10, 61, 96, 218, 111, 141, 63, 156, 40, 245,
194, 143, 92, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 63, 116, 122, 225, 71, 174,
20, 123, 63, 156, 40, 245, 194, 143, 92, 41, 63, 137, 110,
148, 243, 42, 93, 124, 63, 156, 40, 245, 194, 143, 92, 41,
63, 147, 220, 103, 10, 107, 48, 232, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 116,
122, 225, 71, 174, 20, 123, 63, 156, 40, 242, 103, 19, 219,
69, 63, 156, 40, 245, 194, 143, 92, 41, 63, 151, 10, 61,
96, 218, 111, 141, 63, 156, 40, 245, 194, 143, 92, 41, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 63, 116, 122, 225, 71, 174, 20, 123, 63, 156, 40,
245, 194, 143, 92, 41, 63, 137, 110, 148, 243, 42, 93, 124,
63, 156, 40, 245, 194, 143, 92, 41], dtype=uint8)
Starting from the end and working backward (because I have the most uncertainty in headers; there are never any footers), I find that the last 28 (7 × 4) values are reasonable doubles.
>>> t["Model.collimatorInfo"].array(uproot.asdebug)[0][-7*4*8 : ].view(">f8")
array([0.01939546, 0. , 0. , 0.005 , 0.02749995,
0.0275 , 0.0225 , 0.0275 , 0. , 0. ,
0.005 , 0.0275 , 0.01241795, 0.0275 , 0.01939546,
0. , 0. , 0.005 , 0.02749995, 0.0275 ,
0.0225 , 0.0275 , 0. , 0. , 0.005 ,
0.0275 , 0.01241795, 0.0275 ])
All of the nonzero numbers are of the same order of magnitude, even though their binary exponents differ; that doesn't happen by accident. They are 28 contiguous doubles.
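The "same order of magnitude" check can be made explicit outside of uproot. This sketch assumes the big-endian byte order ROOT uses for numbers. It decodes 8-byte groups with the standard struct module and reads their binary exponents with math.frexp. The byte values are copied from the debug dump above: two nonzero groups and one all-zero group.

```python
import math
import struct

# Two 8-byte groups copied from the debug dump above, plus eight zero bytes.
raw = bytes([63, 156, 40, 245, 194, 143, 92, 41,
             63, 116, 122, 225, 71, 174, 20, 123,
             0, 0, 0, 0, 0, 0, 0, 0])

# ROOT writes numbers big-endian, hence the ">" in the format string.
values = struct.unpack(">%dd" % (len(raw) // 8), raw)
print(values)

# Binary exponents of the nonzero values. They sit close together, which is
# what real physical quantities look like. Arbitrary bytes read as doubles
# would usually give widely scattered exponents.
exponents = [math.frexp(v)[1] for v in values if v != 0.0]
print(exponents)
```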
Right before that, there are 7 strings, which happen to each have 5 characters plus a size byte, so 6 bytes each.
>>> t["Model.collimatorInfo"].array(uproot.asdebug)[0][-7*4*8 - 7*6 : -7*4*8].tostring()
b'\x05G4_Ni\x05G4_Ti\x05G4_Ti\x05G4_Ni\x05G4_Ni\x05G4_Ni\x05G4_Ni'
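The 42 bytes above can be split by walking the size bytes, one string at a time. The following is a minimal sketch that handles only the one-byte length form seen here. split_short_strings is a hypothetical name. The result shows that the materials are not all identical: two of the seven are G4_Ti.

```python
# The 42 bytes printed above: seven strings, each a 1-byte size + 5 characters.
raw = b"\x05G4_Ni\x05G4_Ti\x05G4_Ti\x05G4_Ni\x05G4_Ni\x05G4_Ni\x05G4_Ni"


def split_short_strings(buf):
    """Split consecutive strings that each have a 1-byte length prefix."""
    out, pos = [], 0
    while pos < len(buf):
        n = buf[pos]
        if n == 255:
            # 255 marks the long form (4-byte length), not handled here.
            raise ValueError("long-form length prefix not handled in this sketch")
        out.append(buf[pos + 1:pos + 1 + n].decode("ascii"))
        pos += 1 + n
    return out


materials = split_short_strings(raw)
print(materials)
```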
The data up to this point are:
>>> t["Model.collimatorInfo"].array(uproot.asdebug)[0][: -7*4*8 - 7*6]
array([ 64, 0, 2, 155, 64, 9, 0, 1, 0, 0, 0, 7, 0,
1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0,
0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2,
0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0,
0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0,
0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0,
2, 0, 0, 0, 64, 0, 0, 48, 0, 9, 4, 83, 76,
49, 69, 11, 87, 73, 78, 68, 79, 87, 95, 69, 88, 84,
82, 9, 87, 73, 78, 68, 79, 87, 95, 73, 78, 3, 67,
79, 76, 4, 83, 76, 49, 71, 4, 83, 76, 50, 71, 4,
83, 76, 51, 71, 64, 0, 0, 37, 0, 9, 4, 114, 99,
111, 108, 4, 114, 99, 111, 108, 4, 114, 99, 111, 108, 4,
101, 99, 111, 108, 4, 114, 99, 111, 108, 4, 114, 99, 111,
108, 4, 114, 99, 111, 108, 63, 174, 184, 81, 235, 133, 30,
184, 63, 26, 54, 226, 235, 28, 67, 45, 63, 26, 54, 226,
235, 28, 67, 45, 63, 172, 40, 245, 194, 143, 92, 41, 63,
174, 184, 81, 235, 133, 30, 184, 63, 174, 184, 81, 235, 133,
30, 184, 63, 174, 184, 81, 235, 133, 30, 184, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 44,
0, 9], dtype=uint8)
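The last 6 bytes, 64, 0, 0, 44, 0, 9, are clearly the header for a block. 6 bytes is very common for a header; it's usually an int and a short: a 4-byte byte count (flagged with the 0x40000000 bit, so 64, 0, 0, 44 means that 44 bytes follow) and a 2-byte class version (here 9).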
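Those 6 bytes can be decoded with Python's struct module. This sketch assumes ROOT's usual layout for such a header: a 4-byte big-endian byte count with the 0x40000000 flag bit set, then a 2-byte class version. parse_header is a hypothetical helper that handles only the flagged form. Here the byte count, 44, equals the 2 version bytes plus the 7 × 6 = 42 string bytes that follow.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000  # flag bit ROOT sets on a byte count


def parse_header(buf):
    """Return (byte_count, version) from a 6-byte ROOT object header."""
    raw_count, version = struct.unpack(">IH", bytes(buf[:6]))
    if not raw_count & K_BYTE_COUNT_MASK:
        # Unflagged (older) headers are laid out differently; not handled here.
        raise ValueError("no byte-count flag in this header")
    return raw_count & ~K_BYTE_COUNT_MASK, version


count, version = parse_header([64, 0, 0, 44, 0, 9])
print(count, version)

# The count covers everything after itself: the 2 version bytes
# plus 7 strings of 6 bytes each.
assert count == 2 + 7 * 6
```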
Okay, so far so good. There's a block of stuff, a bunch of zeros, and then another block of stuff, and the second block is clearly a header + 7 material strings (G4_Ni or G4_Ti) + 7 × 4 doubles.
The streamer for this type is:
>>> f._context.streamerinfosmap[b"BDSOutputROOTEventCollimatorInfo"].show()
StreamerInfo for class: BDSOutputROOTEventCollimatorInfo, version=1, checksum=0x87ac4c05
TObject BASE offset= 0 type=66 Basic ROOT object
componentName string offset= 0 type=500
componentType string offset= 0 type=500
length double offset= 0 type= 8
tilt double offset= 0 type= 8
offsetX double offset= 0 type= 8
offsetY double offset= 0 type= 8
material string offset= 0 type=500
xSizeIn double offset= 0 type= 8
ySizeIn double offset= 0 type= 8
xSizeOut double offset= 0 type= 8
ySizeOut double offset= 0 type= 8
and here's the problem: there should be 8 doubles per object, and the material string should be between the first 4 and the last 4. I'm seeing all of those values in there, but they're not in the order the streamer says they should be. This happened in issue #373 because that file (produced by FAIRoot) used Boost serialization inside the ROOT TTree. That Boost serialization put numbers and strings contiguously, but ROOT serialization of a non-split branch does not do that. It goes in the order of the streamer, and the streamer is our only guide for what order to expect things in.
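To make the ordering argument concrete, here is a toy model in plain Python, not ROOT code, of the order in which member values of a 7-element collection could be written. In object-by-object order, each material sits between the fourth and fifth double of its own object. If the values are grouped by member instead, the stream ends with seven adjacent materials followed by 4 × 7 = 28 doubles, which matches the tail of the dump. The member list is copied from the StreamerInfo above, with the TObject base omitted. The function names are hypothetical.

```python
# Member order taken from the StreamerInfo shown above (TObject base omitted).
MEMBERS = ["componentName", "componentType", "length", "tilt",
           "offsetX", "offsetY", "material",
           "xSizeIn", "ySizeIn", "xSizeOut", "ySizeOut"]


def objectwise_order(members, n_objects):
    """All members of object 0, then all members of object 1, and so on."""
    return [(m, i) for i in range(n_objects) for m in members]


def memberwise_order(members, n_objects):
    """One member for every object, then the next member for every object."""
    return [(m, i) for m in members for i in range(n_objects)]


n = 7
ow = objectwise_order(MEMBERS, n)
mw = memberwise_order(MEMBERS, n)

# Object-by-object: the stream ends with the last object's four size doubles.
print(ow[-4:])
# Grouped by member: seven materials, then the four size members for all objects.
print([m for m, _ in mw[-35:-28]])
print(len(mw[-28:]))
```

Later in this thread, the grouped-by-member layout is identified as ROOT's "memberwise" mode.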
If it's not following the streamer, then the C++ code that wrote this out is probably using a "custom streamer," possibly based on Boost but not necessarily. (A "custom streamer" is when C++ code writes its objects any way it wants, in a custom function, and the streamer object saved in the file is not meaningful. The only way to decode it is to run the custom C++ code.)
If your framework uses custom streamers, I can't help you for that branch. Sorry!
Wow, interesting!
Issue #404 was not from me, but he's in my group; we are all using BDSIM. Sorry for the confusion.
Regarding the custom streamer or the use of Boost, I will investigate, but so far I haven't found anything specific to that class.
@chernals Go ahead and reopen this issue if there's some new information I should consider. I'm just trying to get my head around which of these are still active, and as I last understood this one, we're lacking correct guidance from the streamer about how this branch is actually serialized.
At this stage I cannot say more. I looked in the code, but I don't see why the streamer for this case would be wrong. However, this is very low priority for us, so I suggest letting it go for now.
As it turns out, the error above arose because I was unaware of ROOT's "memberwise splitting," and (despite anything I may have said to the contrary above) it has nothing to do with Boost serialization. This same error came up in 6 different issues, so further discussion on it will be consolidated into scikit-hep/uproot4#38. (This comment is a form message I'm writing on all 6 issues.)
As of PR scikit-hep/uproot4#87, we can now detect such cases, so at least we'll raise a NotImplementedError instead of letting the deserializer fail in mysterious ways. Someday, it will actually be implemented (watch scikit-hep/uproot4#38), but in the meantime, what you can do is write your data "objectwise," not "memberwise." (See this comment for ideas on how to do that, and if you manage to do it, you can help a lot of people out by sharing a recipe.)
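Detecting this case amounts to checking one bit of the class version. This sketch assumes ROOT's convention that bit 14 (0x4000, named kStreamedMemberWise in ROOT's sources) of the 2-byte version marks memberwise-streamed data. require_objectwise is a hypothetical helper, not uproot's actual code. In the entry dumped earlier in this thread, the first 6 bytes are 64, 0, 2, 155, 64, 9. That is a byte count of 0x029B = 667, which plus its own 4 bytes gives the full 671-byte entry, followed by the version 0x4009, meaning version 9 with the memberwise bit set.

```python
K_STREAMED_MEMBERWISE = 0x4000  # bit 14 of a ROOT class version


def require_objectwise(header):
    """Raise NotImplementedError if a 6-byte ROOT header marks memberwise data.

    Bytes 4-5 of the header hold the 2-byte big-endian class version.
    """
    version = int.from_bytes(bytes(header[4:6]), "big")
    if version & K_STREAMED_MEMBERWISE:
        raise NotImplementedError(
            "memberwise-streamed data (class version %d) is not supported"
            % (version & ~K_STREAMED_MEMBERWISE))
    return version


# First six bytes of the entry dumped above: version 0x4009 = 9 | 0x4000.
try:
    require_objectwise([64, 0, 2, 155, 64, 9])
except NotImplementedError as err:
    print(err)

# A header without the flag passes through and returns its version.
print(require_objectwise([64, 0, 0, 44, 0, 9]))
```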
I have a branch storing a class defined with, among other things, the following attribute:
Where BDSOutputROOTEventCollimatorInfo is defined with these:
As this doesn't involve pointers or deeply nested custom types, I thought this should already work.
The error is
The interpretation is None:
I can provide an example ROOT file.