scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.
https://uproot.readthedocs.io
BSD 3-Clause "New" or "Revised" License
238 stars 77 forks source link

Multiple cycle numbers break single TTree detection #494

Closed masonproffitt closed 3 years ago

masonproffitt commented 3 years ago

Functions like uproot.lazy normally work without specifying the tree name if there is only one TTree in the file, but this doesn't work if there are multiple cycle numbers:

>>> import uproot
>>> uproot.__version__
'4.1.7'
>>> uproot.lazy('root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda3/envs/iris-hep/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 551, in lazy
    obj = _regularize_object_path(
  File "/home/user/miniconda3/envs/iris-hep/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 2982, in _regularize_object_path
    raise ValueError(
ValueError: TTree object paths must be specified in the 'files' as {"filenames*.root": "path"} if any files have more than one TTree

    TTrees: 'Events;75', 'Events;74'

in file root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root

This is confusing since you otherwise never have to worry about cycle numbers. For example: uproot.open('root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root:Events') doesn't require a cycle number.

jpivarski commented 3 years ago

Should it just assume the highest one? Or do so with a default option?

masonproffitt commented 3 years ago

I would think it should do whatever uproot.open('file.root').key('Events') does if there are multiple cycle numbers present for Events, which seems to be taking the highest one:

>>> uproot.open('root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root').key('Events')
<ReadOnlyKey Events;75: TTree (seek pos 2244422054) at 0x7f6d397692e0>
jpivarski commented 3 years ago

The actual selection of TTree follows the same code path in lazy and non-lazy cases; what's different is the preparation of key names to ask for. So the substantial part of PR #500 (!) is

@@ -2965,7 +2965,7 @@ def _regularize_object_path(
             **options  # NOTE: a comma after **options breaks Python 2
         ).root_directory
         if object_path is None:
-            trees = [k for k, v in file.classnames().items() if v == "TTree"]
+            trees = file.keys(filter_classname="TTree", cycle=False)
             if len(trees) == 0:
                 if allow_missing:
                     return None

I also had to fix ReadOnlyFile.keys (and values, items, classnames) to not return duplicates when the disambiguating cycle number is suppressed. Actually, that part could be considered a bug, but no one ever brought it up before. (Mapping.keys should return unique strings...)