Closed tamasgal closed 4 years ago
My understanding of your problem is that (a) you expect multidimensional arrays for f['E']['Evt']['mc_trks']['mc_trks.usr'] in each event but get a one-dimensional array in each event and (b) the lengths of these arrays are too long.
On (a), I would expect a one-dimensional array in each event. The Evt struct contains a single std::vector<Trk> trks and that Trk is flat: just an int, a Vec (which gets split into .x, .y, .z branches) and a double. I only count one dimension there; maybe I've misunderstood you?
On (b), I can see the problem: the extra values are physically in the ROOT file, but the length of your std::vector is specified to be shorter than what has been written. You can see it with the uproot.asdebug interpretation:
>>> f['E']['Evt']['mc_trks']['mc_trks.usr'].array(uproot.asdebug)[0]
array([ 64, 0, 0, 126, 0, 9, 0, 0, 0, 4, 63, 168, 238,
40, 103, 39, 86, 134, 63, 174, 33, 16, 27, 0, 54, 135,
64, 8, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 64, 79, 158, 226, 235, 28,
67, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
dtype=uint8)
The first 6 bytes (64, 0, 0, 126, 0, 9) are the STL header, the next 4 bytes (0, 0, 0, 4) are the length of the array, and the rest is a mix of real data and not-real data. The interpretation for this branch,
>>> f['E']['Evt']['mc_trks']['mc_trks.usr'].interpretation
asjagged(asdtype('>f8'), 10)
just skips over the first 10 bytes, expecting the rest to be meaningful data. In all the cases I've seen up to this point, it has been. In your case, there's junk padding after the meaningful data.
So ROOT has more degrees of freedom than previously thought, and we have to actually check the 4-byte size in every STL vector jagged array. I'm going to see about adding a parameter for that.
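To make the byte layout above concrete, here is a minimal sketch in plain Python (standard library only, not uproot's actual implementation) that reads the 4-byte length from the debug dump and decodes only that many big-endian doubles. The names `raw`, `HEADER_BYTES`, `length`, `values` and `naive_count` are made up for this illustration; the byte values are copied from the asdebug output above.

```python
import struct

# Bytes copied from the asdebug dump above (130 bytes in total);
# everything after the 54th byte is zero in that dump.
raw = bytes([
    64, 0, 0, 126, 0, 9, 0, 0, 0, 4, 63, 168, 238,
    40, 103, 39, 86, 134, 63, 174, 33, 16, 27, 0, 54, 135,
    64, 8, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 1, 64, 79, 158, 226, 235, 28,
    67, 45,
]) + bytes(76)

HEADER_BYTES = 6  # STL header, as described above
LENGTH_BYTES = 4  # big-endian unsigned 32-bit vector length
ITEM_BYTES = 8    # one '>f8' value

# Read the length that the file declares for this vector.
(length,) = struct.unpack(">I", raw[HEADER_BYTES:HEADER_BYTES + LENGTH_BYTES])

# Decode only `length` doubles instead of everything after byte 10.
start = HEADER_BYTES + LENGTH_BYTES
stop = start + ITEM_BYTES * length
values = struct.unpack(f">{length}d", raw[start:stop])

# What "skip 10 bytes and read the rest" would yield instead.
naive_count = (len(raw) - start) // ITEM_BYTES

print(length, values, naive_count)
```

With these bytes the declared length is 4, while skipping 10 bytes and reading the remainder as doubles gives 15 values, which matches the 15 entries reported in the question.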
Sorry for the weird issue title, we can change it later, I could not find a better one 😉
I need some help parsing a fairly simple data structure. It derives from a class which has:
These two vectors are used to store arbitrary data, so that e.g. usr_names = ["bx", "by", "ichan", "cc"] and the corresponding values are at the specific indices. So if you need to look up the value for "by", you look up its index and then access usr[idx].
So far so good, it works for "one dimensional" (flat) branches. Here, the event class Evt derives from that AAObject, the branch Evt simply contains a flat vector of Evt instances (3 of them), and each event contains 17 "usr entries":
The problem appears with classes which have instances in nested branches. This is the case, for example, for the Trk class, which also derives from AAObject and is part of the Evt branch. Each Evt entry has a variable number of Trks, as seen here (just the relevant parts):
The Trk class itself is also quite straightforward and consists of some attributes:
The file usr-nested contains a few events and every event contains multiple Trk instances. Only the first two Trk entries should have some entries in the usr* attributes, the first one by, bx, ichan and cc, the second one only energy_lost_in_can. However, the arrays I get back from uproot are all one-dimensional per event. It seems that only the first entry is extracted, and the length of the arrays seems a bit "random". This is what I get:
What I expect is that the following line returns a nested list, where the first nested list has a length of 4, the second 1 and all others are empty. However, I get 15 single entries, and the first 4 entries correspond to the first track (and the values are OK):
The array elements apart from the first 4 values seem to be random memory bits.
I tried to figure out why the data is parsed incorrectly, but I have failed so far. I also did not find the word energy_lost_in_can in the usr_fields (converted to string etc.), so I guess it is lost somewhere in the low-level parsing in uproot.
This is what the ROOT-based library spits out for the first event (all 21 tracks, the first one with 4 usr-entries, the second with 1 and every other with no entries):
Do you have any idea, or is this a known issue with nested vectors?
I attached both files in case you want to have a look.
usr.zip
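For readers unfamiliar with the usr_names / usr convention described in the question, a minimal sketch of the lookup follows. The helper `usr_value` and the numeric values are hypothetical and only illustrate the idea; the names list is the one from the example above.

```python
def usr_value(usr_names, usr, name):
    """Return the value stored under `name` (hypothetical helper)."""
    idx = usr_names.index(name)  # raises ValueError if the name is absent
    return usr[idx]


usr_names = ["bx", "by", "ichan", "cc"]  # names from the example above
usr = [0.1, 0.2, 3.0, 2.0]               # made-up values, one per name

by = usr_value(usr_names, usr, "by")
print(by)
```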