Closed ivukotic closed 10 months ago
The first thing I noticed is that this TBranch includes an ElementLink data type.
>>> tree["PrimaryVerticesAuxDyn.trackParticleLinks"].typename
'std::vector<std::vector<ElementLink<DataVector<xAOD::TrackParticle_v1>>>>'
I wonder if it's related to #951.
The error message is wrong: if we can't deserialize it, it should be a DeserializationError. This error comes from assuming that context["forth"].gen is still active when it's not. I'll check that.
Meanwhile, we can circumvent Uproot's attempt to use AwkwardForth by loading it with library="np".
>>> tree["PrimaryVerticesAuxDyn.trackParticleLinks"].array(library="np")
array([<STLVector [[]] at 0x751ef3eb3910>,
<STLVector [[]] at 0x751ef3eb38b0>,
<STLVector [[]] at 0x751ef3eb3130>, ...,
<STLVector [[]] at 0x751ef3569ed0>,
<STLVector [[]] at 0x751ef3569f90>,
<STLVector [[]] at 0x751ef356a050>], dtype=object)
It worked! Okay, so this is a TBranch that we can deserialize, but possibly not with AwkwardForth (and that line with context["forth"].gen is insufficiently guarded).
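As a stopgap, the workaround above can be wrapped in a small helper. This is only a sketch: `read_branch` is a hypothetical name, not part of Uproot, and it assumes the failure surfaces as the AttributeError reported in this thread. It accepts any object with an `array(library=...)` method, such as a TBranch.

```python
def read_branch(branch, fallback_library="np"):
    """Read a branch, retrying with a non-Awkward library if the default read fails.

    Hypothetical helper, not part of Uproot. `branch` is anything that
    exposes an `array(library=...)` method, e.g. an uproot TBranch.
    """
    try:
        # Default read: Uproot returns an Awkward Array and may try AwkwardForth.
        return branch.array()
    except AttributeError:
        # Assumption: the bug shows up as
        # "'NoneType' object has no attribute 'reset_active_node'".
        # Catching AttributeError this broadly can hide unrelated bugs,
        # so treat this as a temporary workaround only.
        return branch.array(library=fallback_library)
```

Usage would look like `read_branch(tree["PrimaryVerticesAuxDyn.trackParticleLinks"])`. On the fallback path the result is a NumPy array with dtype=object rather than an Awkward Array.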
I observed the same error with Physlite files.
In my case, it happens with parentLinks and childLinks in the truth records.
For some particles (e.g. taus) both branches can be read.
For other particles (e.g. muons) the childLinks always fail.
The most interesting are the parentLinks. I observed that if I try to read this branch multiple times, most of the time it results in the AttributeError: 'NoneType' object has no attribute 'reset_active_node' error, but sometimes it does not throw an error and then the parentLinks are filled correctly.
I don't need the links at the moment. So my workaround is simply to not read these branches.
It fails in cases in which whole TBaskets consist of empty lists. The AwkwardForth-discovery process is in a state in which the Forth code hasn't been generated yet because it hasn't seen a full example datum, but it hasn't given up yet because it might still find a full datum. This was tested in our small (mostly single-TBasket) test files, but the cases you've seen, @ivukotic and @Superharz, are in this state when transitioning from one TBasket to the next. The indicator of this state is when context["forth"].vm doesn't exist at startup or is equal to None after a TBasket, and the latter state wasn't correctly checked.
But, fortunately, the data are readable. It's not related to #951.
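The state described above can be illustrated with a toy model. Everything here (`ToyVM`, `ToyState`, `read_basket`) is invented for illustration and is not Uproot's actual code; it only mimics the two "no machine yet" cases: the `vm` attribute is missing at startup, or it is None after a TBasket made entirely of empty lists.

```python
class ToyVM:
    """Stand-in for a generated Forth machine (hypothetical)."""

    def reset(self):
        pass


class ToyState:
    """Stand-in for the discovery state; has no `vm` attribute at startup."""


def read_basket(state, basket, guarded):
    if guarded:
        # Treats "attribute missing" and "attribute is None" the same way.
        has_vm = getattr(state, "vm", None) is not None
    else:
        # Only checks that the attribute exists, so `vm is None` slips through.
        has_vm = hasattr(state, "vm")

    if has_vm:
        state.vm.reset()  # AttributeError here if vm is None
    else:
        # Still discovering: a machine is only built once a non-empty
        # entry has been seen; otherwise vm stays None.
        state.vm = ToyVM() if any(len(entry) > 0 for entry in basket) else None
    return list(basket)


baskets = [[[], []], [[], []], [[1, 2], []]]  # first two baskets are all empty

state = ToyState()
for basket in baskets:
    read_basket(state, basket, guarded=True)  # works across all three baskets
```

With `guarded=False`, the first all-empty basket leaves `vm` set to None and the second basket raises an AttributeError on a None object, which is the same shape as the error reported here.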
Thank you for the explanation. But why does it sometimes succeed in reading a file, and most of the time not, while reading the exact same file? I should mention that I test this in a Jupyter Notebook by simply re-running the cell to read the branch until it does not throw an error. So maybe it could be some IPython stuff.
That... does not make sense. Unless maybe you're using an interpretation_executor to read the TBaskets with threads, in which case, there could be a race condition? If read sequentially without threads (the default), this ought to be deterministic.
This is all I am doing right now:
with uproot.open(path + filename) as f:
    f[tree]["TruthBottomAuxDyn.parentLinks"].array()
It sometimes works, but most of the time it does not. This behavior also stays the same if I restart the Python kernel after each try.
Well, I can't think of anything non-deterministic in this process, but it is a complex process.
The variable in question is an attribute of a thread-local variable so that it behaves properly if multithreading is involved (though most of the time, it's not). Maybe this isn't as deterministic as I think it is?
Actually, the TBaskets can arrive in any order—the server is not obliged to send them in file-order if it doesn't want to, and then Uproot would deal with them in the order they're received. That's a source of non-determinism. I don't think it can apply to local files, even though we're getting them through fsspec now.
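To make the arrival-order point concrete, here is a small simulation. It is not Uproot's code: the thread pool and the random delays merely stand in for a remote server that answers requests in whatever order it likes, and `fetch` and `fetch_all` are hypothetical names.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(index):
    """Pretend to download one basket; the delay is random."""
    time.sleep(random.uniform(0.0, 0.01))
    return index, f"basket-{index}"


def fetch_all(num_baskets):
    arrival = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fetch, i) for i in range(num_baskets)]
        # as_completed yields results in completion order, not file order.
        for future in as_completed(futures):
            arrival.append(future.result())
    arrival_order = [index for index, _ in arrival]
    # Sorting by index restores file order for the final result.
    in_file_order = [payload for _, payload in sorted(arrival)]
    return arrival_order, in_file_order


arrival_order, in_file_order = fetch_all(8)
print(arrival_order)  # may differ from run to run
print(in_file_order)  # always basket-0 ... basket-7
```

The final result is deterministic after sorting, but any per-basket state that depends on which basket was processed first (such as a discovery step) would see a different sequence on each run.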
But anyway, the point may be moot, since the fix has been merged into main. If you pip install -e . from Uproot in the main branch, you shouldn't see the issue at all. Is that the case?
I tested it. First: Sometimes it works:
>>> with uproot.open(path + filename) as f:
...     f[tree]["TruthBottomAuxDyn.childLinks"].array()
...
<Array [[], [], [], [], ..., [], [], [], []] type='40000 * var * var * stru...'>
But sometimes this non-deterministic error happens with the exact same input file. However, the error message is now different from before:
>>> with uproot.open(path + filename) as f:
...     f[tree]["TruthBottomAuxDyn.childLinks"].array()
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\uproot\behaviors\TBranch.py", line 1815, in array
_ranges_or_baskets_to_arrays(
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\uproot\behaviors\TBranch.py", line 3142, in _ranges_or_baskets_to_arrays
uproot.source.futures.delayed_raise(*obj)
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\uproot\source\futures.py", line 38, in delayed_raise
raise exception_value.with_traceback(traceback)
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\uproot\behaviors\TBranch.py", line 3111, in basket_to_array
arrays[branch.cache_key] = interpretation.final_array(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\uproot\interpretation\objects.py", line 475, in final_array
output = numpy.concatenate(trimmed)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\highlevel.py", line 1527, in __array_function__
return ak._connect.numpy.array_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\_connect\numpy.py", line 102, in array_function
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\_connect\numpy.py", line 142, in ensure_valid_args
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\_dispatch.py", line 62, in dispatch
next(gen_or_result)
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\operations\ak_concatenate.py", line 66, in concatenate
return _impl(arrays, axis, mergebool, highlevel, behavior, attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\operations\ak_concatenate.py", line 114, in _impl
content_or_others = ensure_same_backend(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\operations\ak_concatenate.py", line 116, in <genexpr>
ctx.unwrap(
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\_layout.py", line 146, in unwrap
return to_layout_impl(
^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\operations\ak_to_layout.py", line 177, in _impl
promoted_layout = ak.operations.from_numpy(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\_dispatch.py", line 39, in dispatch
gen_or_result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\operations\ak_from_numpy.py", line 55, in from_numpy
from_arraylib(array, regulararray, recordarray),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\awkward\_layout.py", line 347, in from_arraylib
raise TypeError("Awkward Array does not support arrays with object dtypes.")
TypeError: Awkward Array does not support arrays with object dtypes.
This error occurred while calling
ak.concatenate(
[<Array [[], [], [], [], ..., [], [], [], []] type='45 * var * var * ...
)
Open this as a new issue. I think the non-deterministic part might be something unrelated to the first issue. If you can provide an example file, that would help a lot.
I think it's going wrong here:
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\uproot\behaviors\TBranch.py", line 3111, in basket_to_array
arrays[branch.cache_key] = interpretation.final_array(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\flori\miniforge3\envs\test\Lib\site-packages\uproot\interpretation\objects.py", line 475, in final_array
output = numpy.concatenate(trimmed)
^^^^^^^^^^^^^^^^^^^^^^^^^^
In the subsequent output, it's trying to make an Awkward Array out of a NumPy array with dtype=object, which isn't allowed. On the line above, I wonder if the list of arrays, trimmed, accidentally has a mix of Awkward Arrays and NumPy dtype=object arrays. (The latter need an additional step to be turned into Awkward Arrays.)
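One way to test that hypothesis while debugging is to look for object-dtype entries in the list before it is concatenated. The helper below is a hypothetical diagnostic, not something Uproot provides. It relies only on the fact that a NumPy object array's dtype prints as "object", and it treats anything without a dtype attribute as not object-dtype. The demo uses stand-in objects so the sketch runs without NumPy or Awkward installed.

```python
from types import SimpleNamespace


def object_dtype_positions(arrays):
    """Return the positions of entries whose dtype is NumPy's object dtype.

    Hypothetical debugging helper. Entries that have no `dtype`
    attribute are treated as not object-dtype.
    """
    return [
        position
        for position, array in enumerate(arrays)
        if str(getattr(array, "dtype", "")) == "object"
    ]


# Stand-ins for illustration only: one entry mimics a NumPy dtype=object
# array, the others mimic arrays without an object dtype.
no_dtype = SimpleNamespace()
object_like = SimpleNamespace(dtype="object")
print(object_dtype_positions([no_dtype, object_like, no_dtype]))  # prints [1]
```

If this returns a non-empty list for the arrays being concatenated, the list is mixed in the way suspected above.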
I opened the issue in #1101.
There is a strange issue when trying to read a branch from an ATLAS physlite file. I use uproot 5.2.1 and awkward 2.5.2.
Here is the smallest reproducible example:
This file is already cached, so you should be able to access it without authentication.
This is the result: