scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.
https://uproot.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Handle ROOT's memberwise splitting #38

Open jpivarski opened 4 years ago

jpivarski commented 4 years ago

Consider the weird serialization in scikit-hep/uproot#373, scikit-hep/uproot#374, scikit-hep/uproot#403, scikit-hep/uproot#475, and scikit-hep/uproot#495. It's field-at-a-time inside of each entry. I had thought it was Boost-inside-ROOT, but not for most of the above. It may be some ROOT serialization mode that I'm unaware of.

jpivarski commented 4 years ago

Add to that one more: I think scikit-hep/uproot#510 is another example of that. Look at the second file, uproot-issue510b.root:

>>> import uproot4, skhep_testdata
>>> t = uproot4.open(skhep_testdata.data_path("uproot-issue510b.root"))["EDepSimEvents"]
>>> b = t["Event"]["Trajectories.Points"]
>>> b.debug(0)
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64   0 101 102  64   9   0   1   0   0   0   2   0   1   0   0   0   0   2   0
  @ ---   e   f   @ --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   1   0   0   0   0   2   0   0   0  64   0   0  60   0   4   0   1
--- --- --- --- --- --- --- --- --- --- --- ---   @ --- ---   < --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   2   0   0   0  64   0   0  36   0   3   0   1   0   0   0   0
--- --- --- --- --- --- --- ---   @ --- ---   $ --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  2   0   0   0  64 103 237  14  20  44 204 192  64  99 170 169 116  55  10  48
--- --- --- ---   @   g --- --- ---   , --- ---   @   c --- ---   t   7 ---   0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64 192  63 130 249   6 230 103  63 240   0   0   0   0   0   0  64   0   0  60
  @ ---   ? --- --- --- ---   g   ? --- --- --- --- --- --- ---   @ --- ---   <
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   4   0   1   0   0   0   0   2   0   0   0  64   0   0  36   0   3   0   1
--- --- --- --- --- --- --- --- --- --- --- ---   @ --- ---   $ --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   2   0   0   0  64 104 149  31 100 192  97 100  64  98 140 241
--- --- --- --- --- --- --- ---   @   h --- ---   d ---   a   d   @   b --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
140  93 110   7  64 192  67  18 202 151 200 123  63 240 171 196  70 133 147  27
---   ]   n ---   @ ---   C --- --- --- ---   {   ? --- --- ---   F --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64   0   0  36   0   3   0   1   0   0   0   0   2   0   0   0  64   4 167 135
  @ --- ---   $ --- --- --- --- --- --- --- --- --- --- --- ---   @ --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
146 180 211 210 192  17 142 116  33 185 246 118  64  12   3 158  90 174 184  82
--- --- --- --- --- --- ---   t   ! --- ---   v   @ --- --- ---   Z --- ---   R
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64   0   0  36   0   3   0   1   0   0   0   0   2   0   0   0   0   0   0   0
  @ --- ---   $ --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0 128   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   2
--- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+

This is a collection of std::vector<TG4TrajectoryPoint>, where TG4TrajectoryPoint is

>>> t.file.streamer_named("TG4TrajectoryPoint").show()
TG4TrajectoryPoint (v1): TObject (v1)
    Position: TLorentzVector (TStreamerObject)
    Momentum: TVector3 (TStreamerObject)
    Process: int (TStreamerBasicType)
    Subprocess: int (TStreamerBasicType)

The first 6 bytes are the usual header: 64 0 101 102 64 9. (That's the right num_bytes for the entry.)

Next, we're looking at a split std::vector header:

| 0   1 | 0   0   0   2 | 0   1 | 0   0   0   0 | 2   0   0   0 | 0   1 | 0   0   0   0 | 2   0   0   0 |
|       |  two objects  |            bits for #1                |            bits for #2                |
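The 26 bytes in the table above can be unpacked with the standard `struct` module. This is only a sketch of the layout as read off the dump: the names `leading_u2`, `count`, and the three per-object fields are hypothetical labels, and treating each 10-byte group as (2-byte version, 4-byte ID, 4-byte bits) is an assumption based on how a TObject is usually serialized.

```python
import struct

# Bytes copied from the dump, right after the 6-byte entry header.
payload = bytes([
    0, 1,                            # leading 2-byte value (meaning assumed)
    0, 0, 0, 2,                      # number of objects
    0, 1, 0, 0, 0, 0, 2, 0, 0, 0,    # 10 bytes for object #1
    0, 1, 0, 0, 0, 0, 2, 0, 0, 0,    # 10 bytes for object #2
])

# ROOT buffers are big-endian, hence the ">" prefix (which also disables padding).
leading_u2, count = struct.unpack_from(">HI", payload, 0)
offset = struct.calcsize(">HI")

# Assumed per-object layout: u2 version, u4 unique ID, u4 bits.
per_object = [
    struct.unpack_from(">HII", payload, offset + 10 * i) for i in range(count)
]

print(leading_u2, count)
print(per_object)
```

Both 10-byte groups are identical, which is consistent with two freshly constructed objects carrying the same TObject bits.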

Then follow two TLorentzVectors:

[191.4079686045643, 157.33318529844246, 8319.023224699498, 1.0]
[196.6600822217398, 148.4044858765412, 8326.14680764474, 1.0419352297551032]

and two TVector3:

[2.5818015538629675, -4.389114882445588, 3.5017668804712594]
[0.0, -0.0, 0.0]

and two integers, 0 and 2.
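As a check on the numbers above, the 32 bytes that follow the first pair of nested headers in the dump can be decoded as four big-endian doubles. This is a small sketch using only the standard library; the byte values are copied from the dump, and the component order (x, y, z, then the fourth component) is assumed from the TLorentzVector layout.

```python
import struct

# 4 x 8 bytes copied from the debug output, after the
# "64 0 0 60 ..." and "64 0 0 36 ..." headers.
raw = bytes([
    64, 103, 237, 14, 20, 44, 204, 192,
    64, 99, 170, 169, 116, 55, 10, 48,
    64, 192, 63, 130, 249, 6, 230, 103,
    63, 240, 0, 0, 0, 0, 0, 0,
])

# ">4d" means four big-endian float64 values.
values = struct.unpack(">4d", raw)
print(list(values))
```

The result should agree with the first TLorentzVector listed above, and the same bytes reappear where the TLorentzVectors "start up again" later in the entry.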

Following that is a header for 0 objects and then a header for 32 objects:

--+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0  14   0   0   0  32
--- --- --- --- --- --- --- --- --- --- ---    
--+---+---+---+---+---+---+---+---+---+---+---+-

and, indeed, there are 32 ten-byte std::vector headers:

--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   2   0   0   0   0   1   0   0   0   0   2   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-

Right after that, the TLorentzVectors start up again:

--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64   0   0  60   0   4   0   1   0   0   0   0   2   0   0   0  64   0   0  36
  @ --- ---   < --- --- --- --- --- --- --- --- --- --- --- ---   @ --- ---   $
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   3   0   1   0   0   0   0   2   0   0   0  64 103 237  14  20  44 204 192
--- --- --- --- --- --- --- --- --- --- --- ---   @   g --- --- ---   , --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64  99 170 169 116  55  10  48  64 192  63 130 249   6 230 103  63 240   0   0
  @   c --- ---   t   7 ---   0   @ ---   ? --- --- --- ---   g   ? --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0
--- --- --- ---
--+---+---+---+

This one is

[191.4079686045643, 157.33318529844246, 8319.023224699498, 1.0]

Similarly, there's also a "name" field that claims to have type std::string:

>>> t["Event"]["Trajectories.Name"].streamer
<TStreamerSTLstring at 0x7f33475eaf10>

but it's clearly a collection of strings (53 of them):

>>> t["Event"]["Trajectories.Name"].debug(0)
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64   0   0 214   0   9   5 103  97 109 109  97   3 109 117  45   6 112 114 111
  @ --- --- --- --- --- ---   g   a   m   m   a ---   m   u   - ---   p   r   o
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
116 111 110   6 112 114 111 116 111 110   6 112 114 111 116 111 110   6 112 114
  t   o   n ---   p   r   o   t   o   n ---   p   r   o   t   o   n ---   p   r
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
111 116 111 110   6 112 114 111 116 111 110   6 112 114 111 116 111 110   7 110
  o   t   o   n ---   p   r   o   t   o   n ---   p   r   o   t   o   n ---   n
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
101 117 116 114 111 110   7 110 101 117 116 114 111 110   7 110 101 117 116 114
  e   u   t   r   o   n ---   n   e   u   t   r   o   n ---   n   e   u   t   r
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
111 110   7 110 101 117 116 114 111 110   2 101  45   2 101  45   2 101  45   2
  o   n ---   n   e   u   t   r   o   n ---   e   - ---   e   - ---   e   - ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
101  45   2 101  45   2 101  45   2 101  45   2 101  45   2 101  45   2 101  45
  e   - ---   e   - ---   e   - ---   e   - ---   e   - ---   e   - ---   e   -
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  2 101  45   2 101  45   2 101  45   2 101  45   2 101  45   2 101  45   2 101
---   e   - ---   e   - ---   e   - ---   e   - ---   e   - ---   e   - ---   e
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 45   2 101  45   2 101  45   2 101  45   2 101  45   2 101  45   2 101  45   2
  - ---   e   - ---   e   - ---   e   - ---   e   - ---   e   - ---   e   - ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
101  45   2 101  45   2 101  45   2 101  45   2 101  45   2 101  45   2 101  45
  e   - ---   e   - ---   e   - ---   e   - ---   e   - ---   e   - ---   e   -
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  2 101  45   2 101  45   2 101  45   2 101  45   2 101  43   7 110 101 117 116
---   e   - ---   e   - ---   e   - ---   e   - ---   e   + ---   n   e   u   t
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
114 111 110   2 101  45   2 101  45   2 101  45   2 101  45   2 101  45
  r   o   n ---   e   - ---   e   - ---   e   - ---   e   - ---   e   -

The key: I think these are both TClonesArrays! They have a non-empty fTClonesName member.
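The Trajectories.Name dump above looks like a 6-byte header followed by strings that each carry a 1-byte length prefix. A minimal sketch of reading that layout follows; `read_short_strings` is a hypothetical helper, and it deliberately ignores ROOT's convention of using a length byte of 255 to signal a longer 4-byte length, since no string here is that long.

```python
def read_short_strings(buf, start=0):
    """Read consecutive strings, each prefixed by a 1-byte length."""
    out = []
    pos = start
    while pos < len(buf):
        n = buf[pos]                                   # length prefix
        out.append(buf[pos + 1 : pos + 1 + n].decode("ascii"))
        pos += 1 + n
    return out

# First few payload bytes from the dump, after the 6-byte header.
sample = bytes([
    5, 103, 97, 109, 109, 97,          # "gamma"
    3, 109, 117, 45,                   # "mu-"
    6, 112, 114, 111, 116, 111, 110,   # "proton"
])

names = read_short_strings(sample)
print(names)
```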

jpivarski commented 4 years ago

@tamasgal, I know that you're busy with Unroot.jl, but if you're ever interested in solving a mystery, we now have 6 issues that are due to cases in which ROOT writes the subentries 1, 2, 3 of structs with fields a b as

a1 a2 a3 b1 b2 b3

instead of

a1 b1 a2 b2 a3 b3

It's a lot like branch-splitting, but this happens inside of each entry. I've been able to reverse engineer the fact that this is happening, but not why it's happening: what information in the TBranch(Element), its parent branches, maybe the TTree itself, and associated streamers might tell us that we should deserialize it this way instead of the normal way. If you find anything that might shed some light on it (or even what this mode is named!), I'd be grateful.
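To make the two orderings concrete, here is a toy sketch that writes the same three records both ways and reads them back. It is not ROOT's format: the record type (an int32 `a` and a float64 `b`) and all function names are invented for illustration, and real ROOT data carries headers that are omitted here.

```python
import struct

# Hypothetical records with two fields: a (int32) and b (float64).
records = [(1, 1.5), (2, 2.5), (3, 3.5)]

def write_objectwise(recs):
    # a1 b1 a2 b2 a3 b3
    return b"".join(struct.pack(">id", a, b) for a, b in recs)

def write_memberwise(recs):
    # a1 a2 a3 b1 b2 b3
    n = len(recs)
    a_part = struct.pack(f">{n}i", *(a for a, _ in recs))
    b_part = struct.pack(f">{n}d", *(b for _, b in recs))
    return a_part + b_part

def read_objectwise(buf, n):
    size = struct.calcsize(">id")
    return [struct.unpack_from(">id", buf, i * size) for i in range(n)]

def read_memberwise(buf, n):
    a = struct.unpack_from(f">{n}i", buf, 0)
    b = struct.unpack_from(f">{n}d", buf, struct.calcsize(f">{n}i"))
    return list(zip(a, b))

objectwise = write_objectwise(records)
memberwise = write_memberwise(records)
print(read_objectwise(objectwise, 3) == read_memberwise(memberwise, 3))
```

The two buffers have the same length but different byte order, so a reader that assumes the wrong mode decodes garbage rather than failing loudly. That is why some flag in the data has to say which mode was used.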

The cases in this function:

https://github.com/scikit-hep/uproot4/blob/ccf790f9bb4c25363fdc9fabe455dd99c85d69d8/uproot4/interpretation/identify.py#L1180-L1186

are guesses, based on a few examples, so don't copy them without care!

Also, if you know of anyone else who's inclined to dig into these details, let me know. I'm looking for help!

tamasgal commented 4 years ago

Oh, that seems to be a tough one. I bookmarked it and will try to see if I can find something new! Currently I am busier with my PhD than anything else, but I am certainly interested :sweat_smile:

I think I might have seen something similar in KM3NeT data. I have to dig through my notes; I hope I find it, since for that I could also provide the source code...

jpivarski commented 4 years ago

I just learned from Philippe that it's called "memberwise streaming" (as opposed to "objectwise streaming") and the ROOT code for deserializing these objects is here:

https://github.com/root-project/root/blob/e87a6311278f859ca749b491af4e9a2caed39161/io/io/src/TStreamerInfoReadBuffer.cxx#L1220-L1277

The way to identify that a particular object is serialized this way is by checking for bit 14 (2**14 == 16384) in the instance version, TBufferFile::kStreamedMemberWise. Also, that bit has to be removed from the instance version before comparing it with the class version (second line of the quoted code above).

Indeed, in the uproot-issue510b.root file I investigated above, the version number does have bit 14 set:

>>> import numpy, uproot4, skhep_testdata
>>> t = uproot4.open(skhep_testdata.data_path("uproot-issue510b.root"))["EDepSimEvents"]
>>> b = t["Event"]["Trajectories.Points"]
>>> b.debug(0, limit_bytes=6)
--+---+---+---+---+---+
 64   0 101 102  64   9
  @ ---   e   f   @ ---
--+---+---+---+---+---+

The first four bytes are the size of this entry (with the kByteCountMask bit removed),

>>> numpy.array([64, 0, 101, 102], "u1").view(">u4") & ~(2**30)
array([25958])

and the next two bytes are the version number with the kStreamedMemberWise bit set,

>>> numpy.array([64, 9], "u1").view(">u2") & ~(2**14)
array([9], dtype=int32)

So these things are identified one object at a time (even though a branch is likely to consist entirely of one type of serialization or the other).
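The two masking steps above can be combined into one small helper. This is a sketch using only the standard library; `parse_header` and the constant names are hypothetical (they mirror, but are not, ROOT's or uproot's identifiers), and the mask values are the ones stated above (bit 30 for the byte count, bit 14 for memberwise).

```python
import struct

K_BYTE_COUNT_MASK = 1 << 30       # flag bit on the 4-byte size
K_STREAMED_MEMBERWISE = 1 << 14   # flag bit on the 2-byte version

def parse_header(buf, offset=0):
    """Return (num_bytes, version, is_memberwise) from a 6-byte header."""
    raw_count, raw_version = struct.unpack_from(">IH", buf, offset)
    num_bytes = raw_count & ~K_BYTE_COUNT_MASK
    is_memberwise = bool(raw_version & K_STREAMED_MEMBERWISE)
    # Strip the flag before comparing with the class version.
    version = raw_version & ~K_STREAMED_MEMBERWISE
    return num_bytes, version, is_memberwise

# Header of Trajectories.Points, entry 0 (memberwise).
print(parse_header(bytes([64, 0, 101, 102, 64, 9])))
# Header of Trajectories.Name, entry 0 (flag not set).
print(parse_header(bytes([64, 0, 0, 214, 0, 9])))
```

Both headers carry version 9 once the flag is stripped, but only the first has bit 14 set.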

For making tests, I think the way a class can be put into this mode is by calling TClass::SetCanSplit(true) on its TClass object (TClass::GetClass("class name")). I'm not 100% certain whether this controls memberwise/objectwise splitting, ordinary branch splitting, or both. But it would be nice to see the same class written as memberwise and as objectwise, for confidence that we're doing it right.

root/test/bench.cxx might make examples with and without memberwise splitting, but this is part of ROOT's benchmark tests and relies on other code that I haven't followed to its definitions. It might be possible to simply run this benchmark to generate files with memberwise and objectwise serialization.

TVirtualStreamerInfo has a SetStreamMemberWise(bool) method, but I don't know if that means we can directly use it to make tests.

I'm just writing these things here as notes, so that this information does not get lost.

jpivarski commented 4 years ago

This should be considered a bug at least until we have a "not implemented" error message for this case, but fully implementing it is a feature. I think I'll put in one PR to add the "not implemented" message and then remove the "bug" label from this issue.

tamasgal commented 4 years ago

Oh wow, I didn't even have a chance 😅

Nice to hear that the mystery is mostly solved.

You have most certainly also found this thread (also from Philippe): https://root-forum.cern.ch/t/splitability-of-classes-with-custom-streamer/32974

For me, this statement from Philippe in particular was quite new:

If a class has a custom Streamer we have to assume that it is for a good reason :). When splitting is used, the custom Streamer is not used at all and thus we are (silently) not doing what the user (likely) intended often leading to corrupted results.

Btw. I searched our codebase for SetCanSplit but have not found any use of that. I also have not found my notes which were about a strange split structure just like you described, only some sketches of the split-branch strategy which is well-known.

jpivarski commented 4 years ago

I just asked him about it at the ROOT I/O meeting, which is every Friday:

https://indico.cern.ch/category/526/

tamasgal commented 4 years ago

Oh nice, it seems to be open for externals, at least I was able to join the video room ;)

kratsg commented 3 years ago

So nicely, #209 has a TEfficiency which looks like a good vector of attack (pun intended) for memberwise serialization. This can be easily made like so

import ROOT

fp = ROOT.TFile.Open("test-efficiency.root", "RECREATE")

nbins = 11

h_den = ROOT.TH1F('h_den', 'h_den', nbins, 0, 100)
h_num = ROOT.TH1F('h_num', 'h_num', nbins, 0, 100)

for i in range(1, nbins):
    h_num.SetBinContent(i, 2**i)
    h_den.SetBinContent(i, 2**(i+1))

eff = ROOT.TEfficiency(h_num, h_den)
eff.SetName('TEfficiencyName')
eff.SetTitle('TEfficiencyTitle')

h_den.Write()
h_num.Write()
eff.Write()
fp.Close()

to get a small ROOT file to play around with. Reading this TEfficiency currently fails when doing

import uproot

with uproot.open('test-efficiency.root') as fp:
    eff = fp['TEfficiencyName']

like so

Traceback (most recent call last):
  File "run.py", line 12, in <module>
    tree = fp['TEfficiencyName']
  File "/Users/kratsg/uproot4/uproot/reading.py", line 1979, in __getitem__
    return self.key(where).get()
  File "/Users/kratsg/uproot4/uproot/reading.py", line 2364, in get
    out = cls.read(chunk, cursor, context, self._file, selffile, parent)
  File "/Users/kratsg/uproot4/uproot/model.py", line 1181, in read
    versioned_cls.read(
  File "/Users/kratsg/uproot4/uproot/model.py", line 800, in read
    self.read_members(chunk, cursor, context, file)
  File "<dynamic>", line 12, in read_members
  File "/Users/kratsg/uproot4/uproot/containers.py", line 798, in read
    raise NotImplementedError(
NotImplementedError: memberwise serialization of AsVector
in file test-efficiency.root

so we can start here. This is also quick to iterate and make new ROOT files with different values to determine that we have the right offsets.

The reason for the SetName and SetTitle is to match the file in #209 that has the issue. So at least it's just trying to match the structure there as much as possible.

kratsg commented 3 years ago

Here's the full sequence of that TEfficiency with the following histograms stored:

$ cat tefficiency.py 
import ROOT

#fp = ROOT.TFile.Open('uproot4-issue209.root')
fp = ROOT.TFile.Open('test-efficiency.root')
eff = fp.TEfficiencyName

num = eff.GetPassedHistogram()
den = eff.GetTotalHistogram()

print([num.GetBinContent(i) for i in range(len(num)+1)])
print([den.GetBinContent(i) for i in range(len(den)+1)])

$ roopython3 tefficiency.py 
[0.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0, 1024.0, 0.0, 0.0, 0.0]
[0.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0, 1024.0, 2048.0, 0.0, 0.0, 0.0]

as below.
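One hedged way to locate these values in the dump that follows: the passed histogram's contents appear as a 4-byte count followed by big-endian float32 values. The bytes below are copied from that dump (the run starting at `0 0 0 13`). Reading them as a count plus 13 floats is an interpretation, not something the file format description here confirms.

```python
import struct

# Count (13) followed by 13 float32 values, as they appear in the dump.
raw = bytes([
    0, 0, 0, 13,
    0, 0, 0, 0,     64, 0, 0, 0,    64, 128, 0, 0,  65, 0, 0, 0,
    65, 128, 0, 0,  66, 0, 0, 0,    66, 128, 0, 0,  67, 0, 0, 0,
    67, 128, 0, 0,  68, 0, 0, 0,    68, 128, 0, 0,
    0, 0, 0, 0,     0, 0, 0, 0,
])

count = struct.unpack_from(">i", raw, 0)[0]
values = struct.unpack_from(f">{count}f", raw, 4)
print(count, list(values))
```

The decoded values are the underflow bin, the ten filled bins (2, 4, ..., 1024), the empty eleventh bin, and the overflow bin, which matches the first printed list apart from its extra trailing element.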

Full byte content of TEfficiency:

AsVector::num_bytes= 16
AsVector::instance_version= 9
AsVector::is_memberwise= 0b100_0000_0000_0000 (16384)
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0 215 190 210   0   0   0   0  63 229 216 151 162  65 163 245  64   0
--- --- --- --- --- --- --- --- --- ---   ? --- --- --- ---   A --- ---   @ ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0  17   0   5   0   1   0   0   0   0   3   0   0   0   0   0   0   0   0  64
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---   @
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   2 120 255 255 255 255  84  72  49  70   0  64   0   2 107   0   3  64   0
--- ---   x --- --- --- ---   T   H   1   F ---   @ --- ---   k --- ---   @ ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  2  45   0   8  64   0   0  61   0   1   0   1   0   0   0   0   3   0   0   8
---   - --- ---   @ --- ---   = --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 22  84  69 102 102 105  99 105 101 110  99 121  78  97 109 101  95 112  97 115
---   T   E   f   f   i   c   i   e   n   c   y   N   a   m   e   _   p   a   s
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
115 101 100  25  84  69 102 102 105  99 105 101 110  99 121  84 105 116 108 101
  s   e   d ---   T   E   f   f   i   c   i   e   n   c   y   T   i   t   l   e
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 32  40 112  97 115 115 101 100  41  64   0   0   8   0   2   2  90   0   1   0
      (   p   a   s   s   e   d   )   @ --- --- --- --- --- ---   Z --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  1  64   0   0   6   0   2   0   0   3 233  64   0   0  10   0   2   0   1   0
---   @ --- --- --- --- --- --- --- --- ---   @ --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  1  63 128   0   0   0   0   0  13  64   0   0 109   0  10  64   0   0  19   0
---   ? --- --- --- --- --- --- ---   @ --- ---   m --- ---   @ --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  1   0   1   0   0   0   0   3   0   0   0   5 120  97 120 105 115   0  64   0
--- --- --- --- --- --- --- --- --- --- --- ---   x   a   x   i   s ---   @ ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0  36   0   4   0   0   1 254   0   1   0   1   0  42  59 163 215  10  61  15
---   $ --- --- --- --- --- --- --- --- --- --- ---   *   ; --- --- ---   = ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 92  41  60 245 194 143  63 128   0   0  61  15  92  41   0   1   0  42   0   0
  \   )   < --- --- ---   ? --- --- ---   = ---   \   ) --- --- ---   * --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0  11   0   0   0   0   0   0   0   0  64  89   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- ---   @   Y --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0  64   0   0 109   0  10  64   0   0  19   0   1   0   1   0   0   0   0
--- ---   @ --- ---   m --- ---   @ --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  3   0   0   0   5 121  97 120 105 115   0  64   0   0  36   0   4   0   0   1
--- --- --- --- ---   y   a   x   i   s ---   @ --- ---   $ --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
254   0   1   0   1   0  42  59 163 215  10  61  15  92  41  60 245 194 143   0
--- --- --- --- --- ---   *   ; --- --- ---   = ---   \   )   < --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0  61  15  92  41   0   1   0  42   0   0   0   1   0   0   0   0   0
--- --- ---   = ---   \   ) --- --- ---   * --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0  63 240   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- ---   ? --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  64   0   0 109   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- ---   @ --- ---   m ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 10  64   0   0  19   0   1   0   1   0   0   0   0   3   0   0   0   5 122  97
---   @ --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---   z   a
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
120 105 115   0  64   0   0  36   0   4   0   0   1 254   0   1   0   1   0  42
  x   i   s ---   @ --- ---   $ --- --- --- --- --- --- --- --- --- --- ---   *
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 59 163 215  10  61  15  92  41  60 245 194 143  63 128   0   0  61  15  92  41
  ; --- --- ---   = ---   \   )   < --- --- ---   ? --- --- ---   = ---   \   )
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0  42   0   0   0   1   0   0   0   0   0   0   0   0  63 240   0   0
--- --- ---   * --- --- --- --- --- --- --- --- --- --- --- ---   ? --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   3 232  64  36   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- ---   @   $ --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0 192 145  92   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- ---   \ --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
192 145  92   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- ---   \ --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0  64   0   0  17   0   5   0   1   0   0   0   0   3   1   0
--- --- --- --- ---   @ --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0  13   0   0   0   0  64   0   0   0  64 128   0   0  65   0   0   0  65
--- --- --- --- --- --- ---   @ --- --- ---   @ --- --- ---   A --- --- ---   A
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
128   0   0  66   0   0   0  66 128   0   0  67   0   0   0  67 128   0   0  68
--- --- ---   B --- --- ---   B --- --- ---   C --- --- ---   C --- --- ---   D
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0  68 128   0   0   0   0   0   0   0   0   0   0   0   0   0   0  64
--- --- ---   D --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---   @
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   2 113 128   0   0 229  64   0   2 105   0   3  64   0   2  43   0   8  64
--- ---   q --- --- --- ---   @ --- ---   i --- ---   @ --- ---   + --- ---   @
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0  59   0   1   0   1   0   0   0   0   3   0   0   8  21  84  69 102 102
--- ---   ; --- --- --- --- --- --- --- --- --- --- --- --- ---   T   E   f   f
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
105  99 105 101 110  99 121  78  97 109 101  95 116 111 116  97 108  24  84  69
  i   c   i   e   n   c   y   N   a   m   e   _   t   o   t   a   l ---   T   E
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
102 102 105  99 105 101 110  99 121  84 105 116 108 101  32  40 116 111 116  97
  f   f   i   c   i   e   n   c   y   T   i   t   l   e       (   t   o   t   a
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
108  41  64   0   0   8   0   2   2  90   0   1   0   1  64   0   0   6   0   2
  l   )   @ --- --- --- --- --- ---   Z --- --- --- ---   @ --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   3 233  64   0   0  10   0   2   0   1   0   1  63 128   0   0   0   0
--- --- --- ---   @ --- --- --- --- --- --- --- --- ---   ? --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0  13  64   0   0 109   0  10  64   0   0  19   0   1   0   1   0   0   0   0
--- ---   @ --- ---   m --- ---   @ --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  3   0   0   0   5 120  97 120 105 115   0  64   0   0  36   0   4   0   0   1
--- --- --- --- ---   x   a   x   i   s ---   @ --- ---   $ --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
254   0   1   0   1   0  42  59 163 215  10  61  15  92  41  60 245 194 143  63
--- --- --- --- --- ---   *   ; --- --- ---   = ---   \   )   < --- --- ---   ?
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
128   0   0  61  15  92  41   0   1   0  42   0   0   0  11   0   0   0   0   0
--- --- ---   = ---   \   ) --- --- ---   * --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0  64  89   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- ---   @   Y --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  64   0   0 109   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- ---   @ --- ---   m ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 10  64   0   0  19   0   1   0   1   0   0   0   0   3   0   0   0   5 121  97
---   @ --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---   y   a
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
120 105 115   0  64   0   0  36   0   4   0   0   1 254   0   1   0   1   0  42
  x   i   s ---   @ --- ---   $ --- --- --- --- --- --- --- --- --- --- ---   *
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 59 163 215  10  61  15  92  41  60 245 194 143   0   0   0   0  61  15  92  41
  ; --- --- ---   = ---   \   )   < --- --- --- --- --- --- ---   = ---   \   )
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0  42   0   0   0   1   0   0   0   0   0   0   0   0  63 240   0   0
--- --- ---   * --- --- --- --- --- --- --- --- --- --- --- ---   ? --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0  64   0   0 109   0  10  64   0   0  19   0   1
--- --- --- --- --- --- --- ---   @ --- ---   m --- ---   @ --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   1   0   0   0   0   3   0   0   0   5 122  97 120 105 115   0  64   0   0
--- --- --- --- --- --- --- --- --- --- ---   z   a   x   i   s ---   @ --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 36   0   4   0   0   1 254   0   1   0   1   0  42  59 163 215  10  61  15  92
  $ --- --- --- --- --- --- --- --- --- --- ---   *   ; --- --- ---   = ---   \
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 41  60 245 194 143  63 128   0   0  61  15  92  41   0   1   0  42   0   0   0
  )   < --- --- ---   ? --- --- ---   = ---   \   ) --- --- ---   * --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  1   0   0   0   0   0   0   0   0  63 240   0   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- ---   ? --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   3 232  64  36   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- --- --- ---   @   $ --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0 192 145  92   0   0   0   0   0 192 145  92   0   0   0   0
--- --- --- --- --- --- ---   \ --- --- --- --- --- --- ---   \ --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  64   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---   @ ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0  17   0   5   0   1   0   0   0   0   3   1   0   0   0   0   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   2   0   0   0  13   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64 128   0   0  65   0   0   0  65 128   0   0  66   0   0   0  66 128   0   0
  @ --- --- ---   A --- --- ---   A --- --- ---   B --- --- ---   B --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 67   0   0   0  67 128   0   0  68   0   0   0  68 128   0   0  69   0   0   0
  C --- --- ---   C --- --- ---   D --- --- ---   D --- --- ---   E --- --- ---
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0  63 240   0   0   0   0   0   0
--- --- --- --- --- --- ---
--- ? --- --- --- --- --- --- --- ```

the numerator is at 603 and then +52 bytes for the full array contents

# offset=623, dtype=">f4"

--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0  13   0   0   0   0  64   0   0   0  64 128   0   0  65   0   0   0  65
--- --- --- --- --- --- ---   @ --- --- ---   @ --- --- ---   A --- --- ---   A
                        0.0             2.0             4.0             8.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
128   0   0  66   0   0   0  66 128   0   0  67   0   0   0  67 128   0   0  68
--- --- ---   B --- --- ---   B --- --- ---   C --- --- ---   C --- --- ---   D
       16.0            32.0            64.0           128.0           256.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0  68 128   0   0   0   0   0   0   0   0   0   0   0   0   0   0  64
--- --- ---   D --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---   @
      512.0          1024.0             0.0             0.0             0.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-

and the denominator is at 1256 and then +52 bytes for the full array contents

# offset=1256, dtype=">f4"

--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0   0   0   0   2   0   0   0  13   0   0   0   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                                            0.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 64 128   0   0  65   0   0   0  65 128   0   0  66   0   0   0  66 128   0   0
  @ --- --- ---   A --- --- ---   A --- --- ---   B --- --- ---   B --- --- ---
            4.0             8.0            16.0            32.0            64.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 67   0   0   0  67 128   0   0  68   0   0   0  68 128   0   0  69   0   0   0
  C --- --- ---   C --- --- ---   D --- --- ---   D --- --- ---   E --- --- ---
          128.0           256.0           512.0          1024.0          2048.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0   0   0  63 240   0   0   0   0   0   0
--- --- --- --- --- --- --- ---   ? --- --- --- --- --- --- ---
            0.0             0.0           1.875             0.0
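
The float annotations in these dumps depend entirely on the dtype passed to the debug call. As a small standard-library check (not part of the original session), the sketch below decodes a few byte groups copied from the dump above as big-endian values. The helper name `decode` is made up for illustration.

```python
import struct

def decode(byte_values, fmt):
    """Interpret a list of integer byte values with a struct format string."""
    return struct.unpack(fmt, bytes(byte_values))

# Four bytes from the array body, read as one big-endian float32 (">f4").
four = decode([64, 128, 0, 0], ">f")[0]

# The trailing eight bytes of the dump, read two ways.
tail = [63, 240, 0, 0, 0, 0, 0, 0]
as_two_floats = decode(tail, ">ff")    # what dtype=">f4" shows
as_one_double = decode(tail, ">d")[0]  # what dtype=">f8" would show

print(four, as_two_floats, as_one_double)
```

Read as `>f4`, the last eight bytes give 1.875 and 0.0, matching the annotation; read as `>f8`, they give 1.0, which suggests those bytes may belong to a double-precision member rather than to the float array.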

kratsg commented 3 years ago

Continuing with this Python script, it seems that the memberwise error comes from the AsVector::read call. If we look at the streamer for TEfficiency here and the corresponding class code

>>> fp.file.streamer_named('TEfficiency').class_code()

the read_members has this line:

self._members['fBeta_bin_params'] = self._stl_container0.read(chunk, cursor, context, file, self._file, self._concrete)
```python
def read_members(self, chunk, cursor, context, file):
    if self.is_memberwise:
        raise NotImplementedError(
            "memberwise serialization of {0}\nin file {1}".format(type(self).__name__, self.file.file_path)
        )
    self._bases.append(c('TNamed', 1).read(chunk, cursor, context, file, self._file, self._parent, concrete=self._concrete))
    self._bases.append(c('TAttLine', 2).read(chunk, cursor, context, file, self._file, self._parent, concrete=self._concrete))
    self._bases.append(c('TAttFill', 2).read(chunk, cursor, context, file, self._file, self._parent, concrete=self._concrete))
    self._bases.append(c('TAttMarker', 2).read(chunk, cursor, context, file, self._file, self._parent, concrete=self._concrete))
    self._members['fBeta_alpha'], self._members['fBeta_beta'] = cursor.fields(chunk, self._format0, context)
    self._members['fBeta_bin_params'] = self._stl_container0.read(chunk, cursor, context, file, self._file, self._concrete)
    self._members['fConfLevel'] = cursor.field(chunk, self._format1, context)
    self._members['fFunctions'] = c('TList').read(chunk, cursor, context, file, self._file, self._concrete)
    self._members['fPassedHistogram'] = read_object_any(chunk, cursor, context, file, self._file, self)
    self._members['fStatisticOption'] = cursor.field(chunk, self._format2, context)
    self._members['fTotalHistogram'] = read_object_any(chunk, cursor, context, file, self._file, self)
    self._members['fWeight'] = cursor.field(chunk, self._format3, context)
```

which indicates that the memberwise read is failing for fBeta_bin_params. Going back to the Python code that makes the TEfficiency, we do the following

eff.SetBetaBinParameters(0, -1.0, -2.0)
for i in range(1, nbins):
    eff.SetBetaBinParameters(i, 2**i, 2**(i+1))

eff.SetBetaBinParameters(nbins, -1.0, -2.0)

which bookends the alpha parameters by -1.0 and the beta parameters by -2.0 to make them easier to identify. Dumping out the chunk again and playing with the offset a bit (finding double-precision values for these parameters, using >f8), we find them:

(Pdb) cursor.debug(chunk, dtype=">f8", offset=2, limit_bytes=240)
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0 215 190 210   0   0   0  13 191 240   0   0   0   0   0   0  64   0
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---   @ ---
                1.3525824167906353e-304                            -1.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0  64  16   0   0   0   0   0   0  64  32   0   0   0   0
--- --- --- --- --- ---   @ --- --- --- --- --- --- ---   @     --- --- --- ---
                    2.0                             4.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0  64  48   0   0   0   0   0   0  64  64   0   0   0   0   0   0  64  80
--- ---   @   0 --- --- --- --- --- ---   @   @ --- --- --- --- --- ---   @   P
    8.0                            16.0                            32.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0  64  96   0   0   0   0   0   0  64 112   0   0   0   0
--- --- --- --- --- ---   @   ` --- --- --- --- --- ---   @   p --- --- --- ---
                   64.0                           128.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0  64 128   0   0   0   0   0   0  64 144   0   0   0   0   0   0 191 240
--- ---   @ --- --- --- --- --- --- ---   @ --- --- --- --- --- --- --- --- ---
  256.0                           512.0                          1024.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0  63 240   0   0   0   0   0   0 192   0   0   0   0   0
--- --- --- --- --- ---   ? --- --- --- --- --- --- --- --- --- --- --- --- ---
                   -1.0                             1.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0  64  16   0   0   0   0   0   0  64  32   0   0   0   0   0   0  64  48
--- ---   @ --- --- --- --- --- --- ---   @     --- --- --- --- --- ---   @   0
   -2.0                             4.0                             8.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0  64  64   0   0   0   0   0   0  64  80   0   0   0   0
--- --- --- --- --- ---   @   @ --- --- --- --- --- ---   @   P --- --- --- ---
                   16.0                            32.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0  64  96   0   0   0   0   0   0  64 112   0   0   0   0   0   0  64 128
--- ---   @   ` --- --- --- --- --- ---   @   p --- --- --- --- --- ---   @ ---
   64.0                           128.0                           256.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0   0   0   0   0  64 144   0   0   0   0   0   0  64 160   0   0   0   0
--- --- --- --- --- ---   @ --- --- --- --- --- --- ---   @ --- --- --- --- ---
                  512.0                          1024.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
  0   0 192   0   0   0   0   0   0   0  63 240   0   0   0   0   0   0  63 229
--- --- --- --- --- --- --- --- --- ---   ? --- --- --- --- --- --- ---   ? ---
 2048.0                            -2.0                             1.0
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
216 151 162  65 163 245  64   0   0  17   0   5   0   1   0   0   0   0   3   0
--- --- ---   A --- ---   @ --- --- --- --- --- --- --- --- --- --- --- --- ---
         0.682689492137              2.0000324250722774

So to summarize so far, we have (for the chunk + cursor):

kratsg commented 3 years ago

OK, getting further: there is a slight issue. I think uproot might need to be refactored, since memberwise streaming seems to break the current Model.read assumptions. Let me step through what I see happening.

  1. Cursor(113, origin=-71) -> _something_else = cursor.field(chunk, struct.Struct(">H"), context)
  2. Cursor(117, origin=-71) -> length = cursor.field(chunk, _stl_container_size, context)
  3. Cursor(119, origin=-71) -> values = _read_nested(
  4. Cursor(123, origin=-71)

with this initial "preprocessing" sequence, we have:

So far, so good. length here refers to the length of the std::vector (which we always want as we allocate that many values when reading). The _num_memberwise_bytes is interesting, as it seems to be off-by-one perhaps (will remake this file but with more entries in the std::vector to see...)
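
The cursor positions listed above (113, 117, 119, 123) advance by 4, 2, and 4 bytes. A toy version of that sequence is sketched below. The ten header bytes are copied from the start of the `>f8` dump earlier in this thread; the field boundaries (a 4-byte count, a 2-byte version-like field, a 4-byte length, all big-endian) are an assumption inferred from those cursor steps, and `ToyCursor` is invented for illustration rather than being uproot's `Cursor`.

```python
import struct

class ToyCursor:
    """Hypothetical stand-in for a read cursor: an index into a byte buffer."""

    def __init__(self, index=0):
        self.index = index

    def field(self, chunk, fmt):
        # Unpack one value at the current index, then advance past it.
        (value,) = fmt.unpack_from(chunk, self.index)
        self.index += fmt.size
        return value

# First ten bytes of the >f8 dump; how they split into fields is assumed.
chunk = bytes([0, 0, 0, 215, 190, 210, 0, 0, 0, 13])

cursor = ToyCursor()
steps = [cursor.index]
num_bytes = cursor.field(chunk, struct.Struct(">I"))       # assumed byte count
steps.append(cursor.index)
something_else = cursor.field(chunk, struct.Struct(">H"))  # assumed version-like field
steps.append(cursor.index)
length = cursor.field(chunk, struct.Struct(">I"))          # assumed vector length
steps.append(cursor.index)

print(steps, num_bytes, something_else, length)
```

Under this reading the length comes out as 13, which matches the 13-element vector, and the first field is 215, one more than 2 + 4 + 2 × 13 × 8 = 214, which may be the off-by-one mentioned above.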

Now, the problem is the following. At this point, we make a call to _read_nested:

        values = _read_nested(
            self._values, length, chunk, cursor, context, file, selffile, parent
        )

which goes into this loop

            for i in uproot._util.range(length):
                values[i] = model.read(chunk, cursor, context, file, selffile, parent)
                print(cursor, values[i])

where values is a 13-element array, allocated correctly. However, the values coming out of this loop are nonsensical, even though the data in the chunk are clearly fine. In fact, model.read is shifting the cursor another 2 bytes to read a version. Here's a portion of model.read:

        self.hook_before_read(chunk=chunk, cursor=cursor, context=context, file=file)

        self.read_numbytes_version(chunk, cursor, context)

where self.read_numbytes_version will read in 2 bytes to grab the version. Problematically, as you can see, we have what might be a version number before the length instead of after. This makes things difficult. So my idea is one of two ways:

I found the hook to be too hard to figure out (seems that it's something that ROOT might have to hook into before or after creating) so I ended up implementing the second option using rollback_nbytes like so:

def _read_nested(
    model, length, chunk, cursor, context, file, selffile, parent, header=True, rollback_nbytes=0
):
    if isinstance(model, numpy.dtype):
        return cursor.array(chunk, length, model, context)

    else:
        values = numpy.empty(length, dtype=_stl_object_type)
        if isinstance(model, AsContainer):
            for i in uproot._util.range(length):
                cursor._index = cursor._index - rollback_nbytes
                values[i] = model.read(
                    chunk, cursor, context, file, selffile, parent, header=header
                )    
        else:
            for i in uproot._util.range(length):
                cursor._index = cursor._index - rollback_nbytes
                values[i] = model.read(chunk, cursor, context, file, selffile, parent)
                print(cursor, values[i])
        return values

which is CLEARLY hacky but I'm ok with this for right now. I'm able to read out the memberwise object without an error, but I need to now teach the Cursor to read in a memberwise fashion (jumping around the cursor for me instead).
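
As a rough picture of what reading "in a memberwise fashion" could mean for this vector of pairs: the `>f8` dump above looks like all first members followed by all second members, rather than alternating pairs. Below is a minimal sketch under that assumption, with no per-element headers. `read_pairs_memberwise` is a hypothetical helper, not an uproot function, and the buffer is synthetic.

```python
import struct

def read_pairs_memberwise(buf, offset, length):
    """Read `length` (first, second) pairs stored as two consecutive
    big-endian float64 runs: all firsts, then all seconds."""
    flat = struct.unpack_from(">%dd" % (2 * length), buf, offset)
    firsts, seconds = flat[:length], flat[length:]
    return list(zip(firsts, seconds))

# Synthetic data mimicking the bookended pattern, with only 3 pairs.
alphas = [-1.0, 2.0, -1.0]
betas = [-2.0, 4.0, -2.0]
buf = struct.pack(">3d", *alphas) + struct.pack(">3d", *betas)

pairs = read_pairs_memberwise(buf, 0, 3)
print(pairs)
```

An object-wise reader would instead expect the same six doubles in the order first, second, first, second, and so on, which is why a per-element `model.read` loop cannot simply be pointed at this data.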

kratsg commented 2 years ago

Adding this in here to make sure we don't lose useful information: https://root-forum.cern.ch/t/how-to-enable-tbufferfile-kstreamedmemberwise-for-specific-branches-in-a-ttree/43788/6 .

kreczko commented 2 years ago

I stumbled across this issue today and I am more confused than ever.

TL;DR version: I can read in TEfficiency if I open skhep_testdata.data_path("uproot-issue38c.root") before my file.

does not work

import uproot
import skhep_testdata

with uproot.open(skhep_testdata.data_path("uproot-issue209.root")) as fp:
    eff = fp["TEfficiencyName"]
    print(eff)

but this works

import uproot
import skhep_testdata

with uproot.open(skhep_testdata.data_path("uproot-issue38c.root")) as fp:
    hist = fp["TEfficiencyName"] # need to load the TEfficiency

with uproot.open(skhep_testdata.data_path("uproot-issue209.root")) as fp:
    eff = fp["TEfficiencyName"]
    print(eff)

What kind of black magic is included in "uproot-issue38c.root" and how can I add it to my files?

jpivarski commented 2 years ago

There is a global state change when you open a file (i.e. the black magic). There's a global uproot.classes dict with Python Models for C++ class name-version pairs, such as TEfficiency version XYZ. When trying to read an instance of the class from a file, it first uses the Model in the global uproot.classes, which defines some deserialization procedure. If that deserialization procedure fails, it then tries reading the specific file's TStreamerInfo, which encodes deserialization procedures for each class name-version in the file (maybe—some TStreamerInfos are missing some classes). If the second try fails, you get an error message. New class name-version combinations are added to the uproot.classes dict when they're learned.
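
The try, try-again lookup described above can be pictured with a toy registry. This is only a sketch of the idea: `builtin_classes`, `read_object`, and the reader functions are hypothetical names, not uproot's API, and a `ValueError` stands in for whatever signals a failed deserialization.

```python
# Hypothetical stand-ins: none of these names are part of uproot's API.
builtin_classes = {}  # plays the role of a global (name, version) -> reader table

def read_object(name, version, data, file_streamers):
    """Try the globally known reader first, then fall back to the file's own."""
    key = (name, version)
    reader = builtin_classes.get(key)
    if reader is not None:
        try:
            return reader(data)
        except ValueError:
            pass  # the global reader did not fit this file; try the file's streamer
    reader = file_streamers.get(key)
    if reader is None:
        raise KeyError("no deserialization procedure for %s v%d" % key)
    result = reader(data)
    builtin_classes[key] = reader  # learned: later reads see this reader first
    return result

def read_two(data):
    # Toy reader that only accepts exactly two values.
    if len(data) != 2:
        raise ValueError("unexpected layout")
    return tuple(data)

file_streamers = {("Thing", 7): read_two}
known_before = ("Thing", 7) in builtin_classes
result = read_object("Thing", 7, [1, 2], file_streamers)
known_after = ("Thing", 7) in builtin_classes
print(known_before, result, known_after)
```

The point of the sketch is the side effect on the last line of `read_object`: reading one file can change which reader is tried first for every file opened afterwards in the same process.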

Although you'd think that a particular class name-version pair would always have the same deserialization procedure, maybe the file was made with a custom-compiled version of ROOT, which has new C++ members added to a class without a new version number, or maybe the file was hadd'ed with another that does, etc. That's why we have a try, try-again procedure, and even that might fail if it's weird enough and doesn't declare its weirdness.

To get more insight into what's going on in this case, you can look at

fp.file.show_streamers("TEfficiency")

(uproot.ReadOnlyFile.show_streamers) to see if there are different versions of some class (maybe one of the classes TEfficiency inherits from or contains) or if they have the same version but nevertheless different deserialization procedures (described as a sequence of member data and their types).

kreczko commented 2 years ago

Hi Jim,

Thank you for the explanation. I've been trying to figure out the differences between the two files since I wanted to rewrite my old files with newer ROOT to add the streamers (if there is a way).

fp.file.show_streamers("TEfficiency")

Unfortunately, the output of that line is identical for both skhep_testdata.data_path("uproot-issue209.root") and skhep_testdata.data_path("uproot-issue38c.root"). There must be another difference between these two files. According to the uproot test_0038-memberwise-serialization.py, uproot-issue209.root should not contain any streamers (at least it also fails without reset_classes), yet fp.file.show_streamers("TEfficiency") reports them.

output of fp.file.show_streamers("TEfficiency") for uproot-issue209.root
THashList (v0): TList (v5)

TAttAxis (v4)
    fNdivisions: int (TStreamerBasicType)
    fAxisColor: short (TStreamerBasicType)
    fLabelColor: short (TStreamerBasicType)
    fLabelFont: short (TStreamerBasicType)
    fLabelOffset: float (TStreamerBasicType)
    fLabelSize: float (TStreamerBasicType)
    fTickLength: float (TStreamerBasicType)
    fTitleOffset: float (TStreamerBasicType)
    fTitleSize: float (TStreamerBasicType)
    fTitleColor: short (TStreamerBasicType)
    fTitleFont: short (TStreamerBasicType)

TAxis (v10): TNamed (v1), TAttAxis (v4)
    fNbins: int (TStreamerBasicType)
    fXmin: double (TStreamerBasicType)
    fXmax: double (TStreamerBasicType)
    fXbins: TArrayD (TStreamerObjectAny)
    fFirst: int (TStreamerBasicType)
    fLast: int (TStreamerBasicType)
    fBits2: unsigned short (TStreamerBasicType)
    fTimeDisplay: bool (TStreamerBasicType)
    fTimeFormat: TString (TStreamerString)
    fLabels: THashList* (TStreamerObjectPointer)
    fModLabs: TList* (TStreamerObjectPointer)

TH1 (v8): TNamed (v1), TAttLine (v2), TAttFill (v2), TAttMarker (v2)
    fNcells: int (TStreamerBasicType)
    fXaxis: TAxis (TStreamerObject)
    fYaxis: TAxis (TStreamerObject)
    fZaxis: TAxis (TStreamerObject)
    fBarOffset: short (TStreamerBasicType)
    fBarWidth: short (TStreamerBasicType)
    fEntries: double (TStreamerBasicType)
    fTsumw: double (TStreamerBasicType)
    fTsumw2: double (TStreamerBasicType)
    fTsumwx: double (TStreamerBasicType)
    fTsumwx2: double (TStreamerBasicType)
    fMaximum: double (TStreamerBasicType)
    fMinimum: double (TStreamerBasicType)
    fNormFactor: double (TStreamerBasicType)
    fContour: TArrayD (TStreamerObjectAny)
    fSumw2: TArrayD (TStreamerObjectAny)
    fOption: TString (TStreamerString)
    fFunctions: TList* (TStreamerObjectPointer)
    fBufferSize: int (TStreamerBasicType)
    fBuffer: double* (TStreamerBasicPointer)
    fBinStatErrOpt: TH1::EBinErrorOpt (TStreamerBasicType)
    fStatOverflows: TH1::EStatOverflows (TStreamerBasicType)

TCollection (v3): TObject (v1)
    fName: TString (TStreamerString)
    fSize: int (TStreamerBasicType)

TSeqCollection (v0): TCollection (v3)

TList (v5): TSeqCollection (v0)

TAttMarker (v2)
    fMarkerColor: short (TStreamerBasicType)
    fMarkerStyle: short (TStreamerBasicType)
    fMarkerSize: float (TStreamerBasicType)

TAttFill (v2)
    fFillColor: short (TStreamerBasicType)
    fFillStyle: short (TStreamerBasicType)

TAttLine (v2)
    fLineColor: short (TStreamerBasicType)
    fLineStyle: short (TStreamerBasicType)
    fLineWidth: short (TStreamerBasicType)

TString (v2)

TObject (v1)
    fUniqueID: unsigned int (TStreamerBasicType)
    fBits: unsigned int (TStreamerBasicType)

TNamed (v1): TObject (v1)
    fName: TString (TStreamerString)
    fTitle: TString (TStreamerString)

TEfficiency (v2): TNamed (v1), TAttLine (v2), TAttFill (v2), TAttMarker (v2)
    fBeta_alpha: double (TStreamerBasicType)
    fBeta_beta: double (TStreamerBasicType)
    fBeta_bin_params: vector<pair<double,double> > (TStreamerSTL)
    fConfLevel: double (TStreamerBasicType)
    fFunctions: TList* (TStreamerObjectPointer)
    fPassedHistogram: TH1* (TStreamerObjectPointer)
    fStatisticOption: TEfficiency::EStatOption (TStreamerBasicType)
    fTotalHistogram: TH1* (TStreamerObjectPointer)
    fWeight: double (TStreamerBasicType)

> different deserialization procedures (described as a sequence of member data and their types).

But then I would expect the deserialization to return garbage of sorts (e.g. interpreting data for the wrong slots). However, reading my old hists with ROOT and comparing them to uproot (with the workaround of loading uproot-issue38c.root first), the only difference I see are the under- and overflow bins, which is just the difference between np.array(root_hist) vs uproot_hist.to_numpy()

And my own files show a slight difference (older version of TH1):

29c29
< TH1 (v8): TNamed (v1), TAttLine (v2), TAttFill (v2), TAttMarker (v2)
---
> TH1 (v7): TNamed (v1), TAttLine (v2), TAttFill (v2), TAttMarker (v2)
51d50
<     fStatOverflows: TH1::EStatOverflows (TStreamerBasicType)
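
A comparison like the one above can also be done in Python with the standard-library `difflib`, given the two streamer listings as text. The strings below are shortened excerpts from the listings in this thread; how the text is captured from each file is left open.

```python
import difflib

# Shortened excerpts of the two TH1 listings discussed in this thread.
newer = """TH1 (v8): TNamed (v1), TAttLine (v2), TAttFill (v2), TAttMarker (v2)
    fBinStatErrOpt: TH1::EBinErrorOpt (TStreamerBasicType)
    fStatOverflows: TH1::EStatOverflows (TStreamerBasicType)
"""
older = """TH1 (v7): TNamed (v1), TAttLine (v2), TAttFill (v2), TAttMarker (v2)
    fBinStatErrOpt: TH1::EBinErrorOpt (TStreamerBasicType)
"""

# Keep only removed ("- ") and added ("+ ") lines; drop context and hint lines.
changed = [
    line
    for line in difflib.ndiff(newer.splitlines(), older.splitlines())
    if line[:2] in ("- ", "+ ")
]
for line in changed:
    print(line)
```

Comparing whole listings this way makes it easier to spot a version bump in a base or member class, which the listing for the top-level class alone may not reveal.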

jpivarski commented 2 years ago

Uproot has built-in Models for TH1 (v8), but not for TH1 (v7).

https://github.com/scikit-hep/uproot5/blob/70db4e1a6eaf6697e0f17d25e4dc619cf098670e/src/uproot/models/TH.py#L1021-L1026

The purpose of this is to avoid having to read TStreamerInfo for the most common/most up-to-date files, but fall back on reading TStreamerInfo if necessary. Reading the file with the TH1 (v7) in it will change the global state of the uproot.classes dict, but reading the file with the TH1 (v8) in it will only change it if it finds that the presumed data layout (the built-in Model) is wrong.

jpivarski commented 2 years ago

> I wanted to rewrite my old files with newer ROOT to add the streamers (if there is a way).

Since both files produce output with fp.file.show_streamers(), they both have streamers.

I just checked Uproot's built-in streamer for TH1 (v8), and it's the same as the TH1 (v8) in your file:

TH1 (v8): TNamed (v1), TAttLine (v2), TAttFill (v2), TAttMarker (v2)
    fNcells: int (TStreamerBasicType)
    fXaxis: TAxis (TStreamerObject)
    fYaxis: TAxis (TStreamerObject)
    fZaxis: TAxis (TStreamerObject)
    fBarOffset: short (TStreamerBasicType)
    fBarWidth: short (TStreamerBasicType)
    fEntries: double (TStreamerBasicType)
    fTsumw: double (TStreamerBasicType)
    fTsumw2: double (TStreamerBasicType)
    fTsumwx: double (TStreamerBasicType)
    fTsumwx2: double (TStreamerBasicType)
    fMaximum: double (TStreamerBasicType)
    fMinimum: double (TStreamerBasicType)
    fNormFactor: double (TStreamerBasicType)
    fContour: TArrayD (TStreamerObjectAny)
    fSumw2: TArrayD (TStreamerObjectAny)
    fOption: TString (TStreamerString)
    fFunctions: TList* (TStreamerObjectPointer)
    fBufferSize: int (TStreamerBasicType)
    fBuffer: double* (TStreamerBasicPointer)
    fBinStatErrOpt: TH1::EBinErrorOpt (TStreamerBasicType)
    fStatOverflows: TH1::EStatOverflows (TStreamerBasicType)

So whether it tries to read your file with the TH1 (v8) in it first or last, it does not change global state because the Model in uproot.classes before reading the file agrees with the TStreamerInfo in the file.

But reading the file with the TH1 (v7) in it does change uproot.classes, since it doesn't know about v7, so it has to look in the file's TStreamerInfo and update uproot.classes.

nickwp commented 1 year ago

Finding this issue from here. Is it still the case that uproot is unable to read a branch of TH1Ds? Are there any workarounds that do not involve reading the files with ROOT to extract the TH1D information (which defeats the point of using uproot in the first place)?

jpivarski commented 1 year ago

I don't think it's TH1D specifically; one of our unit tests reads a TTree TBranch of TH1F.

I don't know what determines whether ROOT writes the objects with memberwise splitting or not, but we can support one and not the other. (Memberwise splitting is very different and will require another deep dive into reverse-engineering the binary format.)

nickwp commented 1 year ago

Okay, so I guess for some unknown reason the branch of TH1D is being written with memberwise splitting in the files I'm using. For now will have to work with ROOT directly then, at least as far as converting / rewriting into a format uproot can read.

jpivarski commented 8 months ago

Issue #1190 has an example of TH1Fs whose TAxis members are memberwise split and TH2Fs whose TAxis members are not memberwise split.

That will be useful to anyone who is willing to try to implement memberwise splitting.