Closed rcross2 closed 5 years ago
Short answer: STL maps and sets have indeed not been implemented. However, they may be doable, particularly since the content types are so simple (ints and longs). Could you post the file?
I will see if I can generate a minimal file using our classes that recreates the error. Thanks for the fast response!
@jpivarski https://send.firefox.com/download/4a96c974b069e1df/#-m-Z7nP80Q3Ca4su4vbK3A
This link is good for 1 download, let me know if you have trouble grabbing it.
It took me a moment to realize that you're trying to find this data in a non-TTree object. If PyROOT will work for you, you'll probably want to use that.
Nevertheless, I looked into it. The thing that is giving me the most trouble isn't in your printed error output; it's the `Channel` (`I3Eval_t::ChannelContainer_t*`) that appears before the `map<long,int>` in the new version of your software, in the file you sent me but not the one you've been working with. It's a case where a new class type is introduced but no data.
After that, I believe that the serialization of the `map<long,int>` is 4 bytes ???, 8 bytes key, 4 bytes value. The keys and values are sorted, and there are a lot of them, like 3870 or so? It's the whole geometry of your detector. (IceCube? Are you looking for supernovae?) To figure out this new type, I'd need some guidance about what to expect: asking questions, like what numbers seem reasonable, etc. That process is doable.
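To make the guessed layout concrete, here is a minimal sketch of decoding it. It assumes (this is not confirmed by the file) that the unknown 4 bytes are a big-endian entry count and that the 8-byte keys and 4-byte values follow as interleaved pairs. The function name `parse_map_long_int` and the sample bytes are made up for illustration.

```python
import struct


def parse_map_long_int(buf, offset=0):
    """Decode a guessed map<long,int> layout from raw bytes.

    ASSUMPTION: a 4-byte big-endian entry count, then `count` pairs of
    (8-byte signed key, 4-byte signed value). This is a hypothesis about
    the layout discussed above, not a documented format.
    """
    (count,) = struct.unpack_from(">i", buf, offset)
    offset += 4
    pairs = {}
    for _ in range(count):
        # ">qi" is big-endian with no padding: 8 + 4 = 12 bytes per entry.
        key, value = struct.unpack_from(">qi", buf, offset)
        pairs[key] = value
        offset += 12
    return pairs, offset


# Synthetic bytes built with the same assumed layout (not real detector data).
sample = struct.pack(">i", 2) + struct.pack(">qi", 101, 7) + struct.pack(">qi", 202, 9)
mapping, end = parse_map_long_int(sample)
```

If the real bytes decode to sorted keys and plausible values under this guess, that is some evidence for the layout; if not, the 4-byte field means something else.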
But before getting into that, and seeing that the new `Channel` is itself an issue, I'd like to ask again, do you really need uproot to read this? Isn't this readable with PyROOT? And even if there are thousands of geometry elements, that's not a very large number, so the performance of PyROOT probably isn't an issue.
If you have control over how this file gets written (to avoid the `Channel`, or at least to put it later in the class, for instance), then you might as well put the data in a TTree, where it would be easy to read. So I'm stopping now to be sure that you really need it.
Thanks!
Yeah this is a touch more confusing than I thought. This is some really legacy data and through the years there have been some mistakes when editing the streamer classes.
So I can access the data with PyROOT, but I have everything else written in uproot, just this one little structure I can't get. A real barrier for people to analyze our data is the installation of ROOT and the custom class libraries. We are trying to move the data to hdf5 at some point, but we're not there yet. I discovered your library and everyone that I've talked to about it is ecstatic that someone has re-implemented ROOT I/O in Python. I just discovered it yesterday and we are all getting a lot of use out of it.
As for the issues with this file (I've re-written this a couple times as I am trying to work out what on earth they were doing when they designed this structure):
- `NumberOfChannels`: just an int
- `ChannelIdMap` `ChannelID`: just a long int array (I think someone was trying to be clever making a map)
- `Deadtime`: just a double array (of size MaxChannels... which is always 5160)
- `Efficiency`: just a double array ^^^
- `I3Eval_t::ChannelContainer_t*`: This doesn't need to be read at all -- it's not meant to be in the ROOT file (someone moved the `//!` in the `.h` file not knowing that it actually changes the generated streamers) (why does root parse esoteric tags in comments :100:)
- `ChannelIDMap`: is defined as `std::map<int64_t, int> ChannelIDMap;`
So in reality I think I can just grab this data if only I could load the `config/detector` class and ignore the pieces I don't need... unless the pieces I don't need affect where and how the other data are packed.
> The keys and values are sorted, and there are a lot of them, like ~~3870~~ 5160 or so? It's the whole geometry of your detector. (IceCube? [yep] Are you looking for supernovae? [yep])
Thanks for the information. The `I3Eval_t::ChannelContainer_t*` is particularly confusing, from the raw bytes end of things. I saw the type name in the data, but it was a zero-terminated string, rather than a size-followed-by-data string. Zero-terminated strings happen in only one place in ROOT I/O (a new class tag), but the `Channels` had at that point already been read in as `nullptr` (legally, too). So that was the first time I've encountered a byte pattern like that. The fact that it happened by mistake, some unintended confusion of streamers, explains a lot.
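The difference between the two string encodings can be sketched in a few lines. The length-prefixed reader follows the usual ROOT string convention (one length byte, with 255 signalling that a 4-byte big-endian length follows); the helper names and the sample bytes are hypothetical and only illustrate the byte patterns, not how any library reads this particular file.

```python
import struct


def read_length_prefixed(buf, offset=0):
    """Read a size-followed-by-data string; return (text, next_offset)."""
    length = buf[offset]
    offset += 1
    if length == 255:
        # Long strings: the single byte 255 is followed by a 4-byte length.
        (length,) = struct.unpack_from(">i", buf, offset)
        offset += 4
    end = offset + length
    return buf[offset:end].decode("ascii"), end


def read_zero_terminated(buf, offset=0):
    """Read a C-style string that ends at the first zero byte."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("ascii"), end + 1


# The same type name encoded both ways (synthetic bytes, not from the file).
name = b"I3Eval_t::ChannelContainer_t*"
prefixed = bytes([len(name)]) + name
terminated = name + b"\x00"
```

A reader that expects one encoding and meets the other will misjudge where the string ends, which is why an unexpected zero-terminated name is so disorienting at the byte level.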
These complications wouldn't be a show-stopper, but the fact that different batches of your data are serialized differently has the potential of becoming a deep rabbit hole. We might get it working on one case, then keep encountering others until we give up later. (Better to give up early! :)
Since you only want some numbers and a mapping from numbers to numbers in config/detector, maybe you could introduce HDF5 files for just the geometry and bring over the measurement data later. Since this object is describing geometry, it's more metadata than data. (Does it even change? If there's only a few thousand values for your entire dataset—heck, it could be JSON. CMS "good run/lumi section" lists are in JSON.)
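As a rough illustration of the JSON option, the sketch below round-trips a small integer-to-integer mapping through Python's standard `json` module. The channel IDs and values are invented; the one real wrinkle it shows is that JSON object keys are always strings, so the integer keys have to be converted back on load.

```python
import json

# Made-up channel IDs and values, standing in for the real geometry map.
channel_id_map = {4611686018427387905: 0, 4611686018427387906: 1}

# JSON object keys must be strings, so convert the integer keys explicitly.
text = json.dumps({str(k): v for k, v in channel_id_map.items()}, sort_keys=True)

# On load, turn the string keys back into integers.
restored = {int(k): v for k, v in json.loads(text).items()}
```

Python integers are arbitrary precision, so 64-bit channel IDs survive this round trip; other JSON consumers that parse numbers as doubles would be another reason to keep the IDs as strings.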
If you do use HDF5 to describe the `map<long, int>`, consider using a sparse array. The COO representation in one dimension (vector, not matrix) is just a sorted (or otherwise indexed) array of channel IDs (`long`) and a corresponding array of the values they map to (`int`). An attempt to access `sparsearray[channel_id]` should do a log-N bisection search (or other smart index lookup) for the `channel_id`, find it, and return the corresponding value. If the lookup value isn't really a channel, it would return `0` because it's sparse.
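Here is a minimal sketch of that lookup in plain Python, using the standard `bisect` module rather than any HDF5 feature. The class name `SparseLookup` and the channel IDs are hypothetical; the point is only the two parallel sorted arrays and the bisection search.

```python
import bisect


class SparseLookup:
    """One-dimensional COO-style lookup: sorted keys plus parallel values."""

    def __init__(self, mapping):
        self.keys = sorted(mapping)                     # channel IDs (long)
        self.values = [mapping[k] for k in self.keys]   # mapped values (int)

    def __getitem__(self, channel_id):
        # Bisection search: O(log N) in the number of stored channels.
        i = bisect.bisect_left(self.keys, channel_id)
        if i < len(self.keys) and self.keys[i] == channel_id:
            return self.values[i]
        return 0  # not a stored channel: sparse arrays default to zero


# Invented channel IDs, deliberately given out of order.
sparsearray = SparseLookup({1005: 3, 1001: 1, 1003: 2})
```

With this, `sparsearray[1003]` finds the stored value and `sparsearray[1004]` falls through to the sparse default of `0`. One design caveat: a real value of `0` is indistinguishable from a missing channel, so a different default (or an exception) may suit some uses better.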
I don't know if this is what HDF5 does, exactly, but HDF5 is big on sparse arrays and it's one of my favorite hacks to reinterpret sparseness as integer lookup.
Good luck finding the next 1987A!
Ah well, you're right, it's not a show-stopper, the structures are still readable with pyROOT. We should be able to deal with it! Thanks for the help :) We will surely spread the word about uproot :+1:
I apologize for the intrusion but I'd be willing to give this a try for Go-HEP and Groot (the other library that reads ROOT files w/o ROOT.)
Would you mind sending me a link to the file?
Good luck, @sbinet! An uproot+Groot (or just Groot) based workflow also solves the problem of installation for new users. I don't know how well Python and Go mix (if the final workflow doesn't use pure Go), but there's probably a good bridge out there somewhere.
As a suggestion, if possible: try to get both versions of the file, with and without the `I3Eval_t::ChannelContainer_t*` field. I hope it works out for you!
And now that I'm thinking about installation difficulty (@rcross2's cited reason against ROOT), note that ROOT can be installed through Conda now, too (in the conda-forge channel).
@jpivarski creating a CPython{2,3} (or PyPy) module from a Go package is relatively easy thanks to go-python/gopy (a SWIG-like code generator command for Go).
The generated Python extension module will only need `libc` and `ctypes`.
see:
(if `gopy` doesn't cut it, there's still SWIG)
but, to reiterate: I'd like to see whether groot performs. could you (@rcross2) send me a link to that file? (or a file that exhibits the same issue.) thanks.
I'm trying to read a ROOT file and I can't seem to access a custom class. I get an `Unimplemented streamer type: TStreamerSTL` error.
I think it's having trouble with the `map<long,int>` element, and if it gets past that I'm sure it will have trouble with the `set<long>`.
I've tried everything I can really think of to access these elements, really digging into the source code, but I don't understand the ROOT I/O format well enough to get anywhere.
Is there any way I can access this? I'm trying to make a script that transcribes this entire file, but this is the last element that I need to port to using `uproot` to remove the dependency on having to install ROOT and our custom class libraries to read our data.
Thank you very much for this package -- it has helped cut down ROOT file read times in Python significantly!