Closed SamKelson closed 3 months ago
@SamKelson This is an ATLAS open data file, right? Is this one that you downloaded from the CERN open data portal and so have the metadata for available easily? Or was this just a file that you copied from what's been replicated to the UChicago AF (we can still determine the file from the metadata and link to the original open data file on the CERN data portal)?
This file was linked to me by @ekourlit so any questions regarding the file itself will have to be directed to him.
The issue can be reproduced consistently by selecting one TBranch with uproot.AsGrouped
interpretation and attempting to read that with TTree.arrays
or TTree.iterate
:
tree = uproot.open("uproot-issue-1267.root")["CollectionTree"]
tree.arrays(["AnalysisPhotonsAuxDyn.ambiguityLink"])
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/jpivarski/irishep/uproot5/src/uproot/behaviors/TBranch.py", line 823, in arrays
_ranges_or_baskets_to_arrays(
File "/Users/jpivarski/irishep/uproot5/src/uproot/behaviors/TBranch.py", line 3165, in _ranges_or_baskets_to_arrays
uproot.source.futures.delayed_raise(*obj)
File "/Users/jpivarski/irishep/uproot5/src/uproot/source/futures.py", line 38, in delayed_raise
raise exception_value.with_traceback(traceback)
File "/Users/jpivarski/irishep/uproot5/src/uproot/behaviors/TBranch.py", line 3114, in basket_to_array
basket_arrays[basket.basket_num] = interpretation.basket_array(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jpivarski/irishep/uproot5/src/uproot/interpretation/grouped.py", line 119, in basket_array
raise ValueError(
ValueError: grouping branches like 'AnalysisPhotonsAuxDyn.ambiguityLink' should not be read directly; instead read the subbranches:
'AnalysisPhotonsAuxDyn.ambiguityLink.m_persKey', 'AnalysisPhotonsAuxDyn.ambiguityLink.m_persIndex'
in file uproot-issue-1267.root
in object /CollectionTree;1:AnalysisPhotonsAuxDyn.ambiguityLink
However, it had been working in TBranch.array
because a correct handling of uproot.AsGrouped
had been implemented there, but not applied to TTree.arrays
and TTree.iterate
.
PR #1269 should fix it.
(Your file is unusual in that TBranches with subbranches, which have uproot.AsGrouped
interpretation, usually don't have any data. Usually, a TBranch either has subbranches or has data, but not both. It looks like that data in your case consists of counts of the numbers of items per event, which is redundant because the subbranches also have this information.
The reason that the error message depended on the order in which you attempted to read data was because (1) we are not supposed to read data from these uproot.AsGrouped
branches, and it was an Uproot bug because we were, (2) there's an event loop that waits for data to be returned by a physical source, such as disk or the network, and these uproot.AsGrouped
branches aren't counted in the number of arrays to read because we're not supposed to read them, and (3) the physical source can supply the TBaskets in any order. In some cases, the uproot.AsGrouped
TBaskets were read earlier and hit the error message upon interpretation, and in other cases, they were read later and we got out of the event loop before reaching the error.
Even though this looks like a user-facing error message, telling you what to do, it can only arise out of inconsistency, an inconsistency that was just fixed with the new PR. It was probably written that way before we decided to silently fix uproot.AsGrouped
reading rather than complain to users about it.)
Let me know if the PR fixes your issue!
Let me know if the PR fixes your issue!
LGTM. Thanks @jpivarski!
$ docker run --rm -ti python:3.12 /bin/bash
root@f58ac9ea5722:/# python -m venv venv && . venv/bin/activate
(venv) root@f58ac9ea5722:/# python -m pip --quiet install uv
(venv) root@f58ac9ea5722:/# uv pip install 'git+https://github.com/scikit-hep/uproot5.git@jpivarski/dont-read-AsGrouped-TBranches'
Updated https://github.com/scikit-hep/uproot5.git (fce7e94)
Resolved 7 packages in 2.49s
Built uproot @ git+https://github.com/scikit-hep/uproot5.git@fce7e9455e44c0f125a366ca768bcc306f0d5fda
Prepared 6 packages in 5.50s
Installed 7 packages in 22ms
+ awkward==2.6.7
+ awkward-cpp==37
+ cramjam==2.8.3
+ fsspec==2024.6.1
+ numpy==2.0.1
+ packaging==24.1
+ uproot==5.3.11.dev5+gfce7e94 (from git+https://github.com/scikit-hep/uproot5.git@fce7e9455e44c0f125a366ca768bcc306f0d5fda)
(venv) root@f58ac9ea5722:/# curl -sLO https://cernbox.cern.ch/remote.php/dav/public-files/wJGWzAyirlWE6QV/DAOD_PHYSLITE.37233417._000052.pool.root.1
(venv) root@f58ac9ea5722:/# ls -lh DAOD_PHYSLITE.37233417._000052.pool.root.1
-rw-r--r-- 1 root root 1.1G Aug 12 22:05 DAOD_PHYSLITE.37233417._000052.pool.root.1
(venv) root@f58ac9ea5722:/# python -i
Python 3.12.5 (main, Aug 7 2024, 19:13:43) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> tree = uproot.open("DAOD_PHYSLITE.37233417._000052.pool.root.1")["CollectionTree"]
>>> tree.arrays(["AnalysisPhotonsAuxDyn.ambiguityLink"])
<Array [{...}, {...}, {...}, ..., {...}, {...}] type='90000 * {"AnalysisPho...'>
>>> tree
<TTree 'CollectionTree' (1343 branches) at 0x76f2647933b0>
>>>
Describe the bug: When attempting to access branches of a TTree in a ROOT file using uproot, an error occurs if grouped branches are included in a set with other branches. Specifically, the
arrays
method throws an error about grouped branches when attempting to access them alongside other branches. However, the error thrown is inconsistent as there are certain combinations of branches with the grouped branch that do not throw an error.To Reproduce:
CollectionTree
TTree:arrays
method of TTree to access the following keys provided viaset
`s:print(file.arrays(tracking_keys)) print(file.arrays(electron_keys)) print(file.arrays(combined_keys))
print(file.arrays(['AnalysisElectronsAuxDyn.TruthLink']))
print(file.arrays(combined_keys))