scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.
https://uproot.readthedocs.io
BSD 3-Clause "New" or "Revised" License
238 stars 77 forks source link

`uproot.dask` raises error on empty branches #697

Closed masonproffitt closed 2 years ago

masonproffitt commented 2 years ago

See example below. The exact file used is https://github.com/iris-hep/func_adl_uproot/blob/61b35593b4512dfaf3a75054aa13ffe2fbc602b6/tests/empty_branches_tree_file.root. It's a perfectly valid tree; there are just no entries, so I expect to get empty arrays rather than an error.

>>> import uproot
>>> uproot.__version__
'5.0.0rc2'
>>> uproot.dask('tests/empty_branches_tree_file.root')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda3/envs/func_adl_uproot_rc/lib/python3.10/site-packages/uproot/_dask.py", line 143, in dask
    return _get_dak_array(
  File "/home/user/miniconda3/envs/func_adl_uproot_rc/lib/python3.10/site-packages/uproot/_dask.py", line 483, in _get_dak_array
    ].basket_entry_start_stop(0)
  File "/home/user/miniconda3/envs/func_adl_uproot_rc/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 2112, in basket_entry_start_stop
    raise IndexError(
IndexError: branch 'int_branch' has 0 baskets; cannot get starting entry for basket 0
in file tests/empty_branches_tree_file.root
jpivarski commented 2 years ago

That's right, and also this is handled correctly by concatenate, arrays and array:

>>> uproot.concatenate('empty_branches_tree_file.root')
<Array [] type='0 * {int_branch: int32, float_branch: float32}'>
>>> uproot.open('empty_branches_tree_file.root:tree').arrays()
<Array [] type='0 * {int_branch: int32, float_branch: float32}'>
>>> uproot.open('empty_branches_tree_file.root:tree/int_branch').array()
<Array [] type='0 * int32'>
>>> next(uproot.open('empty_branches_tree_file.root:tree').iterate())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

So it's something in the uproot.dask logic, and @kkothari2001 should take a look. I'll put this file in scikit-hep-testdata so that we can add it to the tests.

Thanks!

jpivarski commented 2 years ago

It's ready: scikit-hep-testdata 0.4.20 has uproot-issue-697.root, which should be used in a test to ensure that uproot.dask can make an empty Dask array from it. (Maybe that array will need to have one partition, but that partition can be an empty array.)

kkothari2001 commented 2 years ago

Hey @masonproffitt , thanks for finding the bug! I'm solving it in #700.

@jpivarski please confirm the behaviour:

(PS: Sorry, I didn't mean to be nitpicky with the PR title, I accidentally thought this issue was the PR I created because of the similar titles and intended to correct my own mistake).

jpivarski commented 2 years ago

What you describe sounds like a good behavior to adopt. For any valid ROOT files, the user will get an array out, though it might be empty.

No problem changing the issue title name. I will sometimes change an issue title name as I find out more about it (if, for instance, the original title described a symptom and we later discover the cause). The issue titles are our list of to-do items, and we should feel free to make them more understandable to ourselves so that it's easier to find actionable tasks.