Closed tamasgal closed 2 years ago
If this is related to the hack for performance in doubly-nested structures, that hack will need to give way to AwkwardForth at some point. Adding an AwkwardForth implementation to all types is @aryan26roy's May‒July 2022 project, so... soon!
I actually hadn't realized that you were still relying on that hack. It doesn't even work for all doubly-nested structures, just a particular combination of vectors and pointers.
Is it the case that 007b10014003a74d0e89e0166e21a6218de0d445 is preventing the code flow from entering the gateway into that hack, which is this block:
?
I've started downloading the file to check.
Thanks for the quick answer! Indeed, we are heavily relying on these doubly-nested structures, unfortunately. It's roughly 20% of your MC and reconstruction data ;)
Yes I can confirm that awkward_can_optimize
returns True
for uproot 4.1.2
but False
for uproot 4.1.3
.
Edit: messed up False
and True
;)
The backdoor in Awkward Array relies on the type matching exactly, and 007b10014003a74d0e89e0166e21a6218de0d445 changes the "uproot"
parameter by one key-value pair. I think that including the "inner_shape"
at that point was a mistake, so if @nsmith- actually needs it, I'll show him how to get it through the NumpyForm instead (and then merging this would require coordination with Coffea, so I hope it's simpler than that).
But this is a good reminder of how much of a difference this makes. It will be great to get this all converted to AwkwardForth, and then I won't have to keep calling it a "hack." (Maybe I could call it a "prototype that we didn't go with, in favor of AwkwardForth.")
Many thanks for the quick fix :)
Our users observed a significant performance regression (10x and more) and it turns out to be introduced somewhere between 4.1.2 and 4.1.3 (release on 29th September 2021). The
awkward
version is the latest1.8.0
.I went through the release notes in https://github.com/scikit-hep/uproot4/releases/tag/4.1.3 and I first reverted https://github.com/scikit-hep/uproot4/pull/441 which also needed the
objectarray1d
function fromuproot._util
(see https://github.com/scikit-hep/uproot4/pull/442/files) but it did not help.I also went through the other PRs but they seemed unrelated, so I started a
git bisect
withGood:
382bb62f47a05f6111d3787e3235f9e56d9e113e
(4.1.2) Bad:03cb08ccb1f47ce0b082117f41a112af1ab403ee
(4.1.3)and identified the https://github.com/scikit-hep/uproot4/commit/007b10014003a74d0e89e0166e21a6218de0d445 from @nsmith- which introduced the performance regression:
I investigated our codes and found the exact call which causes the problem: it's indeed the reading of doubly nested arrays.
With
uproot 4.1.2
, the following code is almost instantaneous whereas withuproot 4.1.3
it takes about 1.5 minutes (for a 1.5GB file):Here is the file http://131.188.161.12:30002/doubly_nested_regression.root
@jpivarski you probably remember that you introduced some special hack to squeeze a lot out of doubly-nested structures, see https://github.com/scikit-hep/awkward-1.0/issues/968 and https://github.com/scikit-hep/awkward-1.0/blob/main/src/awkward/_connect/_uproot.py#L35 so my first thought was that these became ineffective with the changes introduced in https://github.com/scikit-hep/uproot4/commit/007b10014003a74d0e89e0166e21a6218de0d445 but I am just speculating
; )