ROOT binary format documentation?

tamasgal commented 4 years ago

Are there any plans to consolidate all the immensely valuable information Jim and all the other people have collected about the ROOT binary dataformat and put it into a format description document? I am very well aware of the fact that this is really something the ROOT developers should provide, but as far as I know, there is no publicly available information other than the source code of ROOT. Please(!) correct me if I am wrong!

I am asking because I am thinking about a Julia "uproot" project as well, and such a document would be tremendously helpful: uproot fully utilises Python language and numpy features, so the code is very tailored and it's not so straightforward to get the big picture.

I am also willing to help to maintain such a document.

sbinet commented 4 years ago

uproot started as a "conversion" of groot (at the time called rootio) from Go to Python. groot has a bit of documentation about the ROOT file format:

https://godoc.org/go-hep.org/x/hep/groot#hdr-File_layout

extracted from ROOT's "extensive" one.

[image: dir tree]

that's, of course, nowhere close to a very detailed and robust specification, and, unfortunately, the ROOT devs explicitly said they didn't want to commit to that kind of thing.

tamasgal commented 4 years ago

Thanks Sebastien, I already knew about that graph but I totally missed that there is a go-hep/groot thing, that's awesome!

jpivarski commented 4 years ago

At the moment, the ROOT format can be triangulated from its implementations: the official ROOT (C++), groot (Go), uproot (Python), and Laurelin (Java). I highly recommend go-hep/rootio (which I've spent a lot of time reading) and its successor, groot (which I haven't) for a learning curve—the code is very clear and compilation takes only a few minutes, so it's also tinkerable.

But on the more general problem of documentation, I've been kicking around this idea, and I brought it up in a few private conversations at CHEP, to write a jupyter-book on the format. It would be a book-length collection of Jupyter notebooks that pedagogically describes how the format works. I was actually thinking of writing it in Julia as a way to learn the language, but I've been persuaded that Python is a better choice for accessibility. Such a document could help future implementations, but the primary goal would be to encode this information for safe-keeping. One of the key assets of this book would be the test samples that people have sent me to fix various errors—the exposition would focus on details that matter for common use-cases. It could be argued that a static document would only capture the state of ROOT I/O at a particular time, but in my experience, ROOT I/O changes very slowly, and backward-incompatible changes are extremely rare.

However, don't hold your breath: I haven't fully committed to this project and the earliest I could get to it is summer 2020. The financial justification would be to consider it part of "uproot maintenance," making it easier to transfer development of uproot to another programmer.

sbinet commented 4 years ago

for completeness, there's also a Rust implementation (by @cbourjau):

jpivarski commented 4 years ago

Oh, good—I didn't know that was still in development. If there are others, certainly bring them up!

cbourjau commented 4 years ago

Thanks for including root-io in this discussion. Last I checked, the Rust implementation was the only parser using a parser-combinator library, which makes it easier to reason about the binary format IMHO. Not at all a documentation of the binary format but it might help to look at the root-io source if one is stuck. Also note that Rust has good, practically zero-cost interoperability with C and can even generate C-bindings automatically. I don't know about Julia in particular, but I would be surprised if it were to not have a good FFI to C since most languages do. In other words, root-io (Rust) + C-bindings might be a good starting point for somebody wanting to access ROOT files in an unsupported language.

As a side note: The Rust implementation is alive and I am currently refactoring it to be completely async and wasm compatible. The idea is to use it as a proof of concept for distributed scientific computing in wasm.

jpivarski commented 4 years ago

@cbourjau Good to hear from you and sorry that I left you out of the previous list!

At first, I was hoping to use a parser-combinator library, but I got stuck at the state-changing effects of _readobjany (TBufferFile::ReadObjectAny in ROOT) and the dynamic rule-creation of creating classes with streamers.

Do you have solutions for these problems? (You can skip streamer dynamism by writing version-dependent branches in the code, or by having a library of combinators for each streamer—in a sense, a streamer is a combinator, but provided in the ROOT file itself. However, I don't see a way around the state-changing effects of _readobjany. The mutable state is not even a stack: a Turing-complete language is required, and the combinators that I looked at weren't powerful enough, by design.)
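To illustrate why back-references force mutable state on the reader, here is a minimal sketch in Python. It uses a made-up toy encoding (a tag byte, then either a length-prefixed class name or a 4-byte position of an earlier definition); this is an assumption for illustration only and is not the tag layout that ROOT or _readobjany actually uses. All names are hypothetical.

```python
import struct

NEW_CLASS = 0        # toy tag: a class name follows
BACK_REFERENCE = 1   # toy tag: a 4-byte position of an earlier definition follows


def read_class_tags(buffer):
    """Resolve every record in the toy stream to a class name."""
    seen = {}    # position of a definition -> class name (the mutable state)
    names = []
    pos = 0
    while pos < len(buffer):
        tag_pos = pos
        tag = buffer[pos]
        pos += 1
        if tag == NEW_CLASS:
            length = buffer[pos]
            pos += 1
            name = buffer[pos:pos + length].decode("ascii")
            pos += length
            seen[tag_pos] = name          # remember where it was defined
        elif tag == BACK_REFERENCE:
            (target,) = struct.unpack_from(">I", buffer, pos)
            pos += 4
            name = seen[target]           # only resolvable from earlier reads
        else:
            raise ValueError("unknown tag %d at position %d" % (tag, tag_pos))
        names.append(name)
    return names


# Two definitions (at positions 0 and 6), then two references back to them.
buffer = (
    bytes([NEW_CLASS, 4]) + b"TFoo"
    + bytes([NEW_CLASS, 4]) + b"TBar"
    + bytes([BACK_REFERENCE]) + struct.pack(">I", 0)
    + bytes([BACK_REFERENCE]) + struct.pack(">I", 6)
)
names = read_class_tags(buffer)
print(names)
```

The lookup table is keyed by arbitrary earlier positions, so it is neither a stack nor something a context-free combinator can carry along without extra machinery.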

cbourjau commented 4 years ago

@jpivarski I am not sure I completely get your question and it has been quite a while since I wrote that code. Maybe this is what you are looking for: https://github.com/cbourjau/alice-rs/blob/38a324cca527c02e140215e34a02daa994ff2a91/root-io/src/core/parsers.rs#L202:L266

Especially the second function class_name_and_buffer. It figures out the type which ought to be read and also returns the associated bytes of that data. Since the data might be somewhere else in the file, this function also takes a context argument which allows it to seek more data elsewhere. To be fair, root-io is primarily geared to read data from TTrees. So it might not be applicable to uproot.

jpivarski commented 4 years ago

@cbourjau I see: you didn't try to express the whole ROOT format in a combinator, but used combinators throughout a conventional codebase. That makes sense: it's like parsing a complex text corpus with regexes sprinkled throughout conventional code instead of trying to recognize the whole corpus with a single regex (which is impossible for certain types of languages, and this part of the ROOT format makes it one of those languages).

Is this a complete and ready-to-use product? It has ALICE in the name, but isn't connected with the ALICE software stack. I personally haven't heard anything about this project since it first started—if it's ready-to-use, we should advertise it through HSF and IRIS-HEP or something.

cbourjau commented 4 years ago

@jpivarski Ah, now I get your question! Yes, expressing the entire file format in one big parser would be nice from a grammar-definition point of view but very difficult. Also, I think it might be of limited use in an implementation. What would it mean to "parse" the entire file? Instead, the program flow is more amenable to smaller combinators such as "parse the header of this TKey".

I'm sorry to hear that you did not know about this project! I guess my advertisement is not great :D. The primary goal of the alice-rs project is to make the analysis of the ALICE open data possible for people from outside the ALICE collaboration. However, the individual "crates" (as they are called in Rust-land) such as "root-io" are not tied to anything ALICE related and can be used individually. The project has been used in at least these two publications:

https://arxiv.org/abs/1907.00413 (not sure about the current status of publication)
https://arxiv.org/abs/1812.07449 (Phys. Rev. C)

Which I think also happen to be the only publications on ALICE's open data which used the full data set, instead of the simplified Master Class data. So, yes, I'd say it's ready. (And in my biased view I'd trust it more than the official software stack ;) ) If alice-rs or any of its components were to find a mention at HSF, IRIS-HEP, or at related places I'd be very happy.

sbinet commented 4 years ago

@cbourjau

As a side note: The Rust implementation is alive and I am currently refactoring it to be completely async and wasm compatible

this is wandering a bit off topic, but:

$> GOOS=js GOARCH=wasm go build -o foo.wasm main.go

package main
import _ "go-hep.org/x/hep/groot"

func main() {}

(maybe this deserves a larger blog post on go-hep.org...)

tamasgal commented 4 years ago

I just wanted to let you know that I started to work on the Julia package. It is really a massive undertaking but I have made some tiny progress and am already able to read the streamer data, trees and branches. I have not implemented the dynamic parser code generation yet since I need to understand how everything is structured, so at the moment many of the streamers are simply hardcoded (and quite a few are not included at all yet 😉).

The current challenge for me is to understand how to actually parse the data of a branch, but thanks to uproot, the debugging process is quite nice, although I have trouble debugging dynamic structures (created by the streamer info).

Anyways, just wanted to give some feedback and at the same time I try to get some silent motivation 🙈

Here is the project: https://github.com/tamasgal/ROOTIO.jl

It is not even alpha, very cluttered and there are many inconsistencies in the implementation since I am still trying to find the best way to organise the code, but it's already something.

Many thanks for the resources and also for uproot which is a huge help!

oschulz commented 4 years ago

Congratulations on a good start - this is certainly a very courageous undertaking. If you let me know when it becomes able to read simple root files (e.g. TTree with scalar and vector branches), I probably will be able to find some alpha testers for you.

tamasgal commented 4 years ago

Thanks Oliver, I am almost there 😉 one or two weeks I guess...

cbourjau commented 4 years ago

Congratulations and welcome to the club! Since you mention streamers: I spent quite a bit of time on those for the Rust parser, thinking that somehow they are crucial. I guess they will be very handy for data archeologists in the year 2100 trying to reinterpret the LHC data with nothing else to go on. They can also be handy to bootstrap your project. For example, I used the streamer to create yaml files of the layouts and some Rust templates. However, I eventually had the epiphany that any "run time parser" from these streamers is of rather limited usefulness. Even if it works, you still only end up with a rather convoluted struct without any member functions! If you want to interact with the data in any reasonably convenient way, you need those functions, but they are not stored in the streamers. YMMV, but I'd say you'll likely find yourself hard-coding the layout of all common types (TH1, TTree, ...) anyways. At that point you can just hard-code the parser and probably ignore the streamer section at runtime.

Out of interest: Are you using some kind of parser library or are you hand-rolling? I found a parser-combinator approach to be a good fit.

jpivarski commented 4 years ago

(Hi @cbourjau! Our messages crossed, so I'm partly responding to your comments in here.)

@tamasgal: congratulations and good luck! A few years ago, we tried to access ROOT in Julia using Julia's Cxx.jl library (I could try to find references, if that's helpful). In the end, Julia and ROOT required different versions of LLVM and the shared-object symbols overshadowed each other (they weren't enclosed in a namespace), so it wasn't doable. However, things might have changed since then: it might be possible now.

On the original subject of this thread, documentation, I may be getting started on that months later than I originally thought. On the other hand, I plan to clean up the Uproot code while adapting it for Awkward 1, so perhaps it would become more useful as a substitute for documentation.

Even if it works, you still only end up with a rather convoluted struct without any member functions! If you want to interact with the data in any reasonably convenient way, you need those functions, but they are not stored in the streamers. YMMV, but I'd say you'll likely find yourself hard-coding the layout of all common types (TH1, TTree, ...) anyways. At that point you can just hard-code the parser and probably ignore the streamer section at runtime.

For what it's worth, Uproot (version 2+) does go through the process of parsing the streamers and creating convoluted structs at runtime. Uproot 1 hand-coded the layout of common types, but I think parsing the streamers has been worthwhile. Some of the files presented in Issues show that people have been using it on ROOT files from Geant, which has an independent implementation of file-writing (another member of the club), which produces ancient TTree versions that wouldn't have deserialized if it had been hard-coded.

I'm helped in this by the fact that Python is dynamically typed: generating new classes at runtime is not the problem that it would be in Rust (maybe it's not even possible in Rust?) and I use mixins to define behaviors, to make up for the missing methods. (The convoluted structs have fields with members named _fThis and _fThat, which deserialize differently from different versions of the classes, and the behavioral methods defined in mixins don't care where those fields came from; they just expect to find them in the struct. Defining all of those methods is an open-ended problem, so they're in a separate project: uproot-methods.)
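As a rough illustration of that pattern (not Uproot's actual code), the sketch below builds classes at runtime from a list of field names and attaches behaviour through a mixin. The names HistogramMethods, make_class, TFoo_v1, TFoo_v2, _fValues and _fLabel are all hypothetical.

```python
class HistogramMethods:
    """Hypothetical mixin: behaviour that only assumes certain fields exist."""

    def total(self):
        # Works for any class version, as long as _fValues was deserialized.
        return sum(self._fValues)


def make_class(name, field_names, mixin):
    """Build a class at runtime from field names (as a streamer would supply)."""

    def __init__(self, **fields):
        for field in field_names:
            setattr(self, field, fields[field])

    return type(name, (mixin,), {"__init__": __init__, "_fields": tuple(field_names)})


# Two "versions" of the same class with different layouts share one mixin.
TFoo_v1 = make_class("TFoo_v1", ["_fValues"], HistogramMethods)
TFoo_v2 = make_class("TFoo_v2", ["_fValues", "_fLabel"], HistogramMethods)

old = TFoo_v1(_fValues=[1, 2, 3])
new = TFoo_v2(_fValues=[4, 5], _fLabel="demo")
print(old.total(), new.total())
```

The mixin never asks which version produced the fields; it only expects to find them on the instance.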

I don't see a way to do this in Rust without embedding a compiler, as ROOT does for C++. Fortunately for you, Julia also embeds a compiler. :) Since Julia doesn't have class methods, there's even less to do in the mixin department: all you need are free functions that expect structs with particular fields in them. Whereas I had to use multiple inheritance to do it in Python, you get it for free in Julia.

So, this is a language-dependent thing, but I think you (@tamasgal) might want to parse the streamers for Julia, and I fully understand why you (@cbourjau) didn't in Rust.

tamasgal commented 4 years ago

Thanks for the information Christian! In our files we have a lot of custom classes (Jim could probably write a few songs about it) and they even have different class versions, so it's quite complicated to keep track of everything. My hope is still that I can somehow get this working as a "run time parser", but I see the point that hard-coding a lot of stuff is way more convenient. The problem of course remains with the changing class versions, but as far as I understood, the basic datatypes do not really change.

Regarding the structs and their functions: my biggest hope is Julia's beautiful multiple dispatch system. I think it's actually a nice pattern to not convolute the data type with its behaviour (I like the functional, type-based approach better) but define them separately.

For example, at this moment I am experimenting with implementing the class version right into the type definitions and use that in the dispatch itself. It's a bit hard to explain, but it goes something like this:

julia> struct TFoo{V} end

julia> whatever(::TFoo{1}) = "calling with class version 1"
whatever (generic function with 1 method)

julia> whatever(::TFoo{2}) = "calling with class version 2"
whatever (generic function with 2 methods)

julia> whatever(::TFoo{V}) where V = "calling with unspecialised class version $V"
whatever (generic function with 3 methods)

julia> whatever(TFoo{1}())
"calling with class version 1"

julia> whatever(TFoo{2}())
"calling with class version 2"

julia> whatever(TFoo{42}())
"calling with unspecialised class version 42"

...there are a lot of possibilities, but first I need to gain a much better understanding of the ROOT format. Currently I feel I am only scratching the surface (what is this speedbump, when do I know that I need to do the startcheck/endcheck thing (to count the bytes for consistency), how to control the cursor effectively, what to do with parents etc.).

Out of interest: Are you using some kind of parser library or are you hand-rolling? I found a parser-combinator approach to be a good fit.

At this moment I am hand-rolling it but it's basically a parser-combinator approach. I am defining parsefields!() method for the specific components and then create the structs on the fly, including an unpack() method.

tamasgal commented 4 years ago

@tamasgal: congratulations and good luck!

Thanks ;) I got like 1% working, so looking forward to the hard 99% 🙈

On the original subject of this thread, documentation, I may be getting started on that months later than I originally thought. On the other hand, I plan to clean up the Uproot code while adapting it for Awkward 1, so perhaps it would become more useful as a substitute for documentation.

I am really looking forward to it!

A few years ago, we tried to access ROOT in Julia using Julia's Cxx.jl library (I could try to find references, if that's helpful). In the end, Julia and ROOT required different versions of LLVM and the shared-object symbols overshadowed each other (they weren't enclosed in a namespace), so it wasn't doable. However, things might have changed since then: it might be possible now.

Ah well, I really do not want this library to depend on ROOT and also messing with different LLVM versions (even with a single one) is already a pain, so I guess I'll ditch this 😉 We had similar problems with interfacing Python libraries to Julia which used Numba, due to LLVM incompatibilities.

I'm helped in this by the fact that Python is dynamically typed: generating new classes at runtime is not the problem that it would be in Rust (maybe it's not even possible in Rust?) and I use mixins to define behaviors, to make up for the missing methods.

Indeed, this is really a game-changer in the implementation w.r.t. Rust. Just a remark: I did a lot of performance tests with our thin wrapper to uproot (https://github.com/KM3NeT/km3io) and it clearly shows that the parser part of the library does not have to be highly efficient, and that it easily outperforms the PyROOT implementation and sometimes even the C++ part (e.g. in combination with Numba). All in all, the overall performance of uproot seems to be on the same level as the C++ implementation when it comes to heavy I/O, just as the benchmarks in the README show.

Fortunately for you, Julia also embeds a compiler. :) Since Julia doesn't have class methods, there's even less to do in the mixin department: all you need are free functions that expect structs with particular fields in them. Whereas I had to use multiple inheritance to do it in Python, you get it for free in Julia.

Exactly, that's one of my biggest hopes :)

jpivarski commented 4 years ago

what is this speedbump

It's a word I made up: you won't find it in ROOT documentation. When a C++ class contains a pointer to data, such as an array, the serialized version is prepended by one byte to distinguish nullptr from empty arrays. I think I've seen instances in which this byte is overloaded with additional meanings, but I don't think I've encoded any of those in Uproot. In fact, I also didn't care about missing vs empty, so Uproot just skips over it. This byte only appears inside of split classes, which looks odd when you have an array-centric view of the branches: branches from a split class don't look different in tree.show() but they have this serialization difference. Also, it complicates the use of NumPy to read it out (because we have to skip 1 in random parts of an array of 4 or 8 byte items), so I grumblingly named it "speedbump."
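A minimal sketch of what skipping that byte looks like, using only the standard library. The buffer layout here is a toy assumption (each entry is one marker byte followed by big-endian int32 items, with entry boundaries known from an offsets list); the marker value 0x01 and the function name read_entry are made up for the example.

```python
import struct


def read_entry(buffer, start, stop, speedbump=True):
    """Decode one entry of big-endian int32 items, optionally skipping a leading byte."""
    if speedbump:
        start += 1  # skip the single marker byte in front of the array
    count = (stop - start) // 4
    return list(struct.unpack_from(">" + str(count) + "i", buffer, start))


# Toy buffer: two entries, each a marker byte followed by int32 items.
entry_a = b"\x01" + struct.pack(">3i", 10, 20, 30)
entry_b = b"\x01" + struct.pack(">1i", 40)
buffer = entry_a + entry_b
offsets = [0, len(entry_a), len(entry_a) + len(entry_b)]

entries = [read_entry(buffer, offsets[i], offsets[i + 1]) for i in range(len(offsets) - 1)]
print(entries)
```

Because the extra byte shifts every entry by one, the items are no longer aligned on 4-byte boundaries, which is why a single flat array view over the whole buffer does not work directly.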

when do I know that I need to do the startcheck/endcheck thing (to count the bytes for consistency)

In nearly every nested structure, where inheritance counts as nesting at the beginning of the structure. Any exceptions to that rule do have to be hard-coded (not derived from streamers), but those are mostly ROOT core classes that don't change much over the years. Discovering that you need to/don't need to comes from reading raw bytes from files whose content you do understand, aided by other implementations of ROOT I/O that you can modify. For me, it was go-hep/rootio because it could be compiled in a few minutes, whereas ROOT itself took over an hour to compile. My favorite debugging technique, print statements, is only practical if you have a fast turn-around time.
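The check itself can be sketched as follows. This assumes the common case as I understand it: a 4-byte big-endian count word with the flag bit 0x40000000 set, followed by a 2-byte class version, where the count covers everything after the count word. The function names start_check and end_check are only loosely modelled on the startcheck/endcheck idea, and the case without a byte count is deliberately left out.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000  # flag bit marking a word as a byte count


def start_check(buffer, pos):
    """Read byte count and version; return (expected_end, version, new_pos)."""
    count, version = struct.unpack_from(">IH", buffer, pos)
    if not count & K_BYTE_COUNT_MASK:
        raise ValueError("no byte-count flag at position %d" % pos)
    count &= ~K_BYTE_COUNT_MASK
    # The count covers everything after the 4-byte count word itself.
    expected_end = pos + 4 + count
    return expected_end, version, pos + 6


def end_check(pos, expected_end):
    """Fail loudly if the fields read so far do not add up to the byte count."""
    if pos != expected_end:
        raise ValueError("read up to %d, expected to stop at %d" % (pos, expected_end))


# Toy object: version 3 with a single int32 member (2 + 4 = 6 counted bytes).
payload = struct.pack(">i", 42)
buffer = struct.pack(">IH", (2 + len(payload)) | K_BYTE_COUNT_MASK, 3) + payload

expected_end, version, pos = start_check(buffer, 0)
(value,) = struct.unpack_from(">i", buffer, pos)
pos += 4
end_check(pos, expected_end)
print(version, value)
```

A mismatch at end_check usually means a member was skipped or read with the wrong width, which makes it a cheap consistency test while reverse-engineering a class.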

how to control the cursor effectively

The use of a "Cursor" object is not coupled to ROOT I/O. It was my attempt to separate the problem of finding your way around a file (Cursor) from the problem of serving up bytes from a local or remote file (Source). My primary view of the ROOT file is a memory map, which makes it easier to jump around as you have to without copying parts into memory buffers purely for the sake of bookkeeping. (That backfired in _readobjany, in which seek positions are saved in the file relative to one of these copied memory buffers, so I had to undo my flattening with an offset.)

Although it has nothing to do with ROOT I/O, I recommend the Cursor approach. It made it much easier to implement parallel reads, since the Source is now a stateless object; per-thread stateful Cursors can all point at the same static Source.
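A stripped-down sketch of that separation, not Uproot's actual classes: the Source below only answers range requests and holds no position, while each Cursor carries its own index, so several threads can read from one Source. The method names data and field are assumptions made for this example.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


class Source:
    """Stateless: serves bytes for any (start, stop) range."""

    def __init__(self, data):
        self._data = bytes(data)

    def data(self, start, stop):
        return self._data[start:stop]


class Cursor:
    """Stateful: remembers a position and advances as fields are read."""

    def __init__(self, index):
        self.index = index

    def field(self, source, fmt):
        size = struct.calcsize(fmt)
        (value,) = struct.unpack(fmt, source.data(self.index, self.index + size))
        self.index += size
        return value


source = Source(struct.pack(">4i", 1, 2, 3, 4))


def read_pair(start):
    cursor = Cursor(start)  # one private cursor per task
    return cursor.field(source, ">i"), cursor.field(source, ">i")


with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(read_pair, [0, 8]))
print(results)
```

Since nothing in Source changes after construction, no locking is needed; all mutation lives in the per-task Cursor.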

what to do with parents

Not sure what you mean, unless you're talking about jagged array handling. Since you can write fast for loops in Julia, you probably won't have to follow the vectorized approach that jagged array parents represent.

At this moment I am hand-rolling it but it's basically a parser-combinator approach. I am defining parsefields!() method for the specific components and then create the structs on the fly, including an unpack() method.

That all is fine until you reach _readobjany, which makes the ROOT format dynamic. Parts of the implementation will have to be imperative code.

We used to have a "polyglot ROOT I/O" Google Group, but I can't find it anymore. Maybe some sort of channel could make up for the fact that it will be some time before I can write the documentation I promised at the beginning of this thread.

oschulz commented 4 years ago

@jpivarski: A few years ago, we tried to access ROOT in Julia using Julia's Cxx.jl [...] Julia and ROOT required different versions of LLVM and the shared-object symbols overshadowed each other

We actually used that quite a bit, during Julia v0.6 times (ROOT.jl). With a modified Julia binary that loaded ROOT first and then Julia it worked quite well. However, I never found time to port it to Julia v1.x, and I've been a happy user of uproot (wrapped in UpROOT.jl) for quite a while now.

cbourjau commented 4 years ago

I guess the point is to have fun and experiment :D Interesting that uproot now is 100% on the dynamic route! The dynamic approach is also possible in Rust. You'd essentially end up with a sum type (aka enum in Rust) DynamicFoo which may either be a pointer to a nested DynamicFoo or a primary type such as a MyFloatVariant or a MyByteArrayVariant. It wouldn't be pretty, but certainly much less work than embedding a compiler :D The problem is rather that you would lose a huge amount of compile time guarantees going down that route and you'd be allocating way more than necessary. Furthermore, when writing Rust I feel a strange obsession with correctness. Hoping that hard-coded methods Just Work Correctly with unseen layout versions would spoil the fun :). Concerning versioning: I did notice some versioning hard-coding in other implementations as well and took those over. It sounds like you have seen more dramatic changes than some added fields to TTree like this: https://github.com/cbourjau/alice-rs/blob/master/root-io/src/tree_reader/tree.rs#L148 ?

Ah, and now my memories about readObjectAny and its forsaken context dependency come back as well :D

tamasgal commented 4 years ago

Thanks Jim, I appreciate every single piece of information about this format 😄

Regarding the Cursor, I first thought I would follow the same approach, but then I decided to keep it simple at first in order to understand why it's better. I am starting to grasp the reasons, gradually ;)

what to do with parents

Not sure what you mean, unless you're talking about jagged array handling. Since you can write fast for loops in Julia, you probably won't have to follow the vectorized approach that jagged array parents represents.

Yes indeed...

That all is fine until you reach _readobjany, which makes the ROOT format dynamic. Parts of the implementation will have to be imperative code.

Yes I started to scratch that too. The readobjany is already implemented and used in a few structs. It's a long way to go still...

We used to have a "polyglot ROOT I/O" Google Group, but I can't find it anymore. Maybe some sort of channel could make up for the fact that it will be some time before I can write the documentation I promised at the beginning of this thread.

I think that would be really nice. I certainly could help out with some basic stuff and with more advanced stuff once I get a better feeling for the ROOT format!

tamasgal commented 4 years ago

I am currently really struggling to understand how the branch data is laid out. I feel a bit uncomfortable asking here (since many of you figured it out by yourselves) but this is my best bet to get a hint on how to proceed. I appreciate any kind of help...

The Tree Data Structures (https://github.com/scikit-hep/uproot/issues/401#issuecomment-552788333) is a bit confusing to me.

I managed to read the streamers, the main tree (no nested trees yet but that's ok), its branches etc. but for example for the file (https://github.com/tamasgal/ROOTIO.jl/blob/master/test/samples/tree_with_histos.root), I look at a specific branch (f["t1"]["mynum"]) and don't understand where to find the actual data (fLeaves shows me that it has one entry TLeafI with a range from fMinimum=0 to fMaximum=10):

julia> f["t1"]["mynum"].fLeaves.elements[1]
ROOTIO.TLeafI
  fName: String "mynum"
  fTitle: String "mynum"
  fLen: Int32 1
  fLenType: Int32 4
  fOffset: Int32 0
  fIsRange: Bool true
  fIsUnsigned: Bool false
  fLeafCount: UInt32 0x00000000
  fMinimum: Int32 0
  fMaximum: Int32 10

but the fBaskets is empty (I checked with uproot and also there it's Undefined):

julia> f["t1"]["mynum"].fBaskets
ROOTIO.TObjArray("", 0, Missing[missing, missing])

The variables which describe the baskets are the same as in uproot, so the parsing seems to work:

  fBasketBytes: Array{Int32}((10,)) Int32[116, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  fBasketEntry: Array{Int64}((10,)) [0, 25, 0, 0, 0, 0, 0, 0, 0, 0]
  fBasketSeek: Array{Int64}((10,)) [238, 0, 0, 0, 0, 0, 0, 0, 0, 0]

From here, I'd assume that there is only one basket (only the first one has nonzero bytes in fBasketBytes) and the position of that basket is the first item in fBasketSeek. I however do not understand at which offset. From which offset do I count 238 bytes? I tried to debug this through uproot but I cannot step into generated classes with the debugger. I have not looked at groot yet (no experience with Go yet) but I guess I have to start to triangulate with multiple libraries. I also tried to guess some starting points (TTree-TKey start + fObjLen etc.) but have not found the data yet.

Here is the data parsed with uproot:

In [782]: f['t1']['mynum'].array()
Out[782]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 10, 10, 10, 10, 10, 10,
       10, 10, 10, 10, 10, 10, 10, 10], dtype=int32)

Anyways, as said, I am happy about any input. I am kind of left alone with this "side-project" which I do to procrastinate my PhD write-up 🙈

jpivarski commented 4 years ago

Quick hint: the fBaskets inside of a TTree aren't the baskets you're looking for (unless you're recovering a prematurely closed TFile). The TTree C++ object contains a "current working basket" that it fills with incoming data, then when it's full, the basket is copied and compressed elsewhere in the file, outside of the TTree. Those external baskets are where the majority of the data can be found, and the entirety of the data if the TFile was closed properly (for my definition of "proper").

You can find the real baskets through the fBasketSeek array, which contains global file seek-points for each basket. That array is larger than the real data; there's another member variable that indicates how many real baskets you have. The free-floating basket is very similar to objects that you can find in TDirectories, like TH1F and TTree, except that the baskets aren't listed for users to see. View the raw bytes starting at one of the seek points specified by fBasketSeek and you should find a TKey followed by a basket header, then the physics data themselves.
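Here is a sketch of that lookup using the numbers from the question above (seek point 238, 116 bytes). It works on a fabricated in-memory file, so the header values are invented. The parameter num_baskets stands in for the member variable mentioned above, and I am assuming that fBasketBytes gives the on-disk size of each basket including its key. Only the fixed-size start of the TKey is decoded, following the field order in ROOT's TKey documentation; the seek fields and name strings that follow it are not parsed.

```python
import io
import struct


def read_key_header(raw):
    """Decode the fixed-size start of a TKey (18 bytes, big-endian)."""
    names = ("fNbytes", "fVersion", "fObjlen", "fDatime", "fKeylen", "fCycle")
    return dict(zip(names, struct.unpack_from(">ihiIhh", raw, 0)))


def basket_blobs(fileobj, basket_seek, basket_bytes, num_baskets):
    """Yield the raw bytes of each real basket, located by absolute file position."""
    for seek, nbytes in zip(basket_seek[:num_baskets], basket_bytes[:num_baskets]):
        fileobj.seek(seek)  # counted from the very start of the file
        yield fileobj.read(nbytes)


# Fabricated file: 238 bytes of other content, then one 116-byte basket.
fake_key = struct.pack(">ihiIhh", 116, 4, 100, 0, 70, 1)
fake_basket = fake_key + bytes(116 - len(fake_key))
fileobj = io.BytesIO(bytes(238) + fake_basket)

fBasketSeek = [238, 0, 0, 0, 0, 0, 0, 0, 0, 0]
fBasketBytes = [116, 0, 0, 0, 0, 0, 0, 0, 0, 0]

blobs = list(basket_blobs(fileobj, fBasketSeek, fBasketBytes, num_baskets=1))
header = read_key_header(blobs[0])
print(len(blobs), header["fNbytes"], header["fKeylen"])
```

In a real file the bytes after the key are usually compressed, so they would still need to be decompressed before the entries can be interpreted.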

We really need to create a forum for these questions—that polyglot ROOT I/O Google Group would be very helpful now. Can somebody do that? There are a lot of ROOT I/O implementers watching this thread.

sbinet commented 4 years ago

there?

https://groups.google.com/forum/#!forum/polyglot-root-io

FYI, I have used these diagrams to decipher how the layout of ROOT was done. (they leave a lot to be desired, though) https://github.com/go-hep/hep/issues/361

the new rntuple stuff seems to be much more straightforward to read/write (and faster). can't wait for this to be the de facto "ROOT format".

tamasgal commented 4 years ago

Many thanks Jim for the explanations and your time, I’ll sit down this night and hopefully read my first branch data!

Also thanks Sebastien for the mailing list link, I skimmed through it and it’s a treasure! I’ll post there in future...

cbourjau commented 4 years ago

Don't worry about asking! I asked plenty of questions as well at the time ~~which seem to have been lost to bit-rot~~ (Edit: See link above). Either way, here are my 5 cents from the Rust implementation. I second what @jpivarski also mentioned: You could have some TBaskets "inside" the tree, and some "free-floating" ones elsewhere. I have also never actually seen any (ALICE) root files having data "in-tree". Either way, here is the relevant part of the Rust implementation, IIUC: https://github.com/cbourjau/alice-rs/blob/master/root-io/src/tree_reader/branch.rs#L204. In a not so helpful rage-fit of "this-ROOT-naming-is-terrible!!!1!" I decided to call the TBasket "Container" at the time - a sum type with two variants: OnDisk and InMemory (again, sorry for the naming!). The former is what you want. You can find the layout of the TBasket/Container here: https://github.com/cbourjau/alice-rs/blob/master/root-io/src/tree_reader/container.rs#L42. Hope that helps. If not: Ask away!

jpivarski commented 4 years ago

Yes, let's move this to the existing Google Group; I should still get emails from there.

In Uproot, retrieving baskets from "inside" the TTree is called recover. When I first saw it, I thought they were broken ROOT files, but this is an allowed state, and it's allowed because it lets data be recovered when the process writing the TFile dies (e.g. DAQ).

There are some examples of such files in uproot's tests/samples, contributed by users. They're the ones accessed in test_issues.py by test methods with "recover" in their names.

scikit-hep / uproot3

ROOT binary format documentation? #401