scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.
https://uproot.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Random crash when reading mostly empty double nested branch #1244

Closed Superharz closed 4 months ago

Superharz commented 4 months ago

Python version: 3.11.9

Numpy version: 1.26.4

Uproot version: 5.3.9

Awkward version: 2.6.5

I am seeing an intermittent crash when opening a new PHYSLITE file and creating arrays from it. The error does not occur on every run: most of the time the file is read properly, but occasionally the following exception is raised.

with uproot.open(input_file) as f:
----> 7     data_atlas = f[tree].arrays(branches_raw)

File ~\miniforge3\envs\uni\Lib\site-packages\uproot\behaviors\TBranch.py:823, in HasBranches.arrays(self, expressions, cut, filter_name, filter_typename, filter_branch, aliases, language, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library, ak_add_doc, how)
    820                 ranges_or_baskets.append((branch, basket_num, range_or_basket))
    822 interp_options = {"ak_add_doc": ak_add_doc}
--> 823 _ranges_or_baskets_to_arrays(
    824     self,
    825     ranges_or_baskets,
    826     branchid_interpretation,
    827     entry_start,
    828     entry_stop,
    829     decompression_executor,
    830     interpretation_executor,
    831     library,
    832     arrays,
    833     False,
    834     interp_options,
    835 )
    837 # no longer needed; save memory
    838 del ranges_or_baskets

File ~\miniforge3\envs\uni\Lib\site-packages\uproot\behaviors\TBranch.py:3105, in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets, interp_options)
   3102     pass
   3104 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3105     uproot.source.futures.delayed_raise(*obj)
   3107 else:
   3108     raise AssertionError(obj)

File ~\miniforge3\envs\uni\Lib\site-packages\uproot\source\futures.py:38, in delayed_raise(exception_class, exception_value, traceback)
     34 def delayed_raise(exception_class, exception_value, traceback):
     35     """
     36     Raise an exception from a background thread on the main thread.
     37     """
---> 38     raise exception_value.with_traceback(traceback)

File ~\miniforge3\envs\uni\Lib\site-packages\uproot\behaviors\TBranch.py:3074, in _ranges_or_baskets_to_arrays.<locals>.basket_to_array(basket)
   3071 basket = None
   3073 if len(basket_arrays) == branchid_num_baskets[branch.cache_key]:
-> 3074     arrays[branch.cache_key] = interpretation.final_array(
   3075         basket_arrays,
   3076         entry_start,
   3077         entry_stop,
   3078         branch.entry_offsets,
   3079         library,
   3080         branch,
   3081         interp_options,
   3082     )
   3083     # no longer needed, save memory
   3084     basket_arrays.clear()

File ~\miniforge3\envs\uni\Lib\site-packages\uproot\interpretation\objects.py:489, in AsObjects.final_array(self, basket_arrays, entry_start, entry_stop, entry_offsets, library, branch, options)
    485 elif isinstance(library, uproot.interpretation.library.Awkward):
    487     if isinstance(to_append, numpy.ndarray):
    488         trimmed.append(
--> 489             uproot.interpretation.library._object_to_awkward_array(
    490                 uproot.extras.awkward(), self._form, to_append
    491             )
    492         )
    493     else:
    494         trimmed.append(to_append)

File ~\miniforge3\envs\uni\Lib\site-packages\uproot\interpretation\library.py:473, in _object_to_awkward_array(awkward, form, array)
    468 def _object_to_awkward_array(awkward, form, array):
    469     unlabeled = awkward.from_iter(
    470         (_object_to_awkward_json(form, x) for x in array),
    471         highlevel=False,
    472     )
--> 473     return awkward.Array(_awkward_json_to_array(awkward, form, unlabeled))

File ~\miniforge3\envs\uni\Lib\site-packages\uproot\interpretation\library.py:442, in _awkward_json_to_array(awkward, form, array)
    440     content = _awkward_json_to_array(awkward, form["content"], array)
    441 else:
--> 442     content = _awkward_json_to_array(
    443         awkward, form["content"], array.content
    444     )
    445 cls = uproot._util._content_cls_from_name(awkward, form["class"])
    446 return cls(offsets, content, parameters=form["parameters"])

File ~\miniforge3\envs\uni\Lib\site-packages\uproot\interpretation\library.py:411, in _awkward_json_to_array(awkward, form, array)
    404         content = _awkward_json_to_array(
    405             awkward, form["content"], array.content
    406         )
    407         return type(array)(
    408             array.offsets, content, parameters=form["parameters"]
    409         )
--> 411 elif form["content"]["parameters"].get("__array__") == "sorted_map":
    412     offsets = _awkward_offsets(awkward, form, array)
    413     key_form = form["content"]["contents"][0]

KeyError: 'parameters'
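
For readers following the traceback: the failing expression on line 411 indexes `form["content"]["parameters"]` directly, so it raises whenever the nested form dict has no `"parameters"` key. The sketch below reproduces only that lookup with plain Python dicts. The two form dicts and both helper functions are hypothetical and heavily simplified. The tolerant variant shows one defensive pattern and is not necessarily the change that was made in uproot.

```python
# Hypothetical, simplified stand-ins for a form dict; real forms carry more keys.
form_with_parameters = {
    "class": "ListOffsetArray",
    "parameters": {},
    "content": {"class": "NumpyArray", "primitive": "int32", "parameters": {}},
}
form_without_parameters = {
    "class": "ListOffsetArray",
    "parameters": {},
    "content": {"class": "NumpyArray", "primitive": "int32"},  # no "parameters" key
}


def is_sorted_map_strict(form):
    # Same lookup shape as the expression shown on line 411 of the traceback.
    return form["content"]["parameters"].get("__array__") == "sorted_map"


def is_sorted_map_tolerant(form):
    # Assumption: a missing "parameters" key is treated like an empty dict.
    return form["content"].get("parameters", {}).get("__array__") == "sorted_map"


print(is_sorted_map_strict(form_with_parameters))  # False
try:
    is_sorted_map_strict(form_without_parameters)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'parameters'
print(is_sorted_map_tolerant(form_without_parameters))  # False
```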

I have narrowed the error down to a single branch. It is doubly nested and mostly empty (only 1410 of 270000 events are non-empty, and the first non-empty entry is at index 682), with type 270000 * var * var * int32. Reading only this branch always crashes. Excluding it never crashes. Reading it together with other branches crashes only some of the time.
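
The emptiness figures quoted above (number of non-empty events and index of the first non-empty one) can be computed with a few lines of plain Python. This is a minimal sketch on synthetic data: `summarize_nested` is a hypothetical helper, and it assumes the branch has already been converted to nested Python lists, for example with `to_list()` on an Awkward Array from a read that succeeded.

```python
def summarize_nested(events):
    """Return (n_events, n_non_empty, first_non_empty_index) for a list of lists."""
    # Indices of events whose outer list holds at least one inner list.
    non_empty = [i for i, event in enumerate(events) if len(event) > 0]
    first = non_empty[0] if non_empty else None
    return len(events), len(non_empty), first


# Synthetic stand-in for a "var * var * int32" branch that is mostly empty.
events = [[], [], [[1, 2], []], [], [[3]]]
print(summarize_nested(events))  # (5, 2, 2)
```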

When it does not crash, the branch in question looks like this:

[[],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 ...,
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 []]
--------------------------------
type: 270000 * var * var * int32

The branch stores the decay products of taus, with one inner list per tau. Because the process has almost no true taus, the branch is empty for most events. If I read only a different tau branch with one less level of nesting (e.g. the mass), I never encounter this crash. That branch looks like this:

[[],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 ...,
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 []]
----------------------------
type: 270000 * var * float32

How can I prevent this crash? The branch is mostly empty, but not completely, so it makes no sense to exclude it from reading.

jpivarski commented 4 months ago

Does #1245 fix it?

Superharz commented 4 months ago

I installed uproot from the branch of your commit and the error no longer appears, so I think your commit fixes it.

Thank you for your quick fix!

Would it be possible to make a new release soon that contains this fix?

jpivarski commented 4 months ago

It's been released: GitHub and PyPI. Cheers!