Closed xpzhang closed 1 year ago
Hi @xpzhang,
I am not sure whether reading custom classes is supported by uproot, which is what we use to read ROOT files. Would it be possible for you to convert the trees to flat float/integer branches? Otherwise, if you could share an example ROOT file, I can also try to investigate.
Best, Huilin
Hi @hqucms
Thanks for the reply! I have attached a sample data file. In my case, uproot has no problem reading a simple custom class: it works fine with branches named like hits.x when they appear in the inputs section, as shown in my configuration file. I suspect the problem is related to the function _get_variable_names (which uses ast.parse) in utils/data/tools.py failing to parse something like hits.x as a whole.
fe_13.root.zip
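To illustrate the suspicion above, here is a minimal sketch using only the standard ast module (the expression string is a made-up example, not taken from the attached configuration). Python parses hits.x as an attribute access on the name hits, so collecting ast.Name nodes only ever yields hits:

```python
import ast

expr = "hits.x > 0"  # made-up selection expression
tree = ast.parse(expr)

# Identifiers found by walking the tree and keeping ast.Name nodes
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})

# Attribute accesses in the same tree, as (base name, attribute) pairs
attrs = [(node.value.id, node.attr) for node in ast.walk(tree)
         if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)]

print(names)  # ['hits']
print(attrs)  # [('hits', 'x')]
```

So a name-based lookup would request a branch called hits rather than hits.x.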
Hi @xpzhang ,
Can you try this hack on the _get_variable_names function, and use something like hits__DOT__x in the yaml?
If this works, I will integrate it.
def _get_variable_names(expr,
                        exclude=['awkward', 'ak', 'np', 'numpy', 'math'],
                        escape_dict={'__DOT__': '.'}):
    import ast
    root = ast.parse(expr)
    names = sorted({node.id for node in ast.walk(root) if isinstance(
        node, ast.Name) and not node.id.startswith('_')} - set(exclude))
    output = []
    for n in names:
        for src, tgt in escape_dict.items():
            if src in n:
                n = n.replace(src, tgt)
        output.append(n)
    return output
Hi @hqucms
The new function seems to work perfectly when tested alone. However, I got another error, raised from _eval_expr, as follows.
I cannot figure out what went wrong here. Attached is the test configuration file; I hope it helps.
Traceback (most recent call last):
  File "/home/zhang/micromamba/envs/weaver/bin/weaver", line 8, in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/dataset.py", line 214, in __next__
    self.table, self.indices = self.prefetch.result()
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/dataset.py", line 110, in _load_next
    table, indices = _preprocess(table, data_config, options)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/dataset.py", line 89, in _preprocess
    table = _build_new_variables(table, data_config.var_funcs)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/data/preprocess.py", line 25, in _build_new_variables
    table[k] = _eval_expr(expr, table)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/data/tools.py", line 150, in _eval_expr
    return eval(expr, tmp)
  File "
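The traceback is cut off before the final error, so the actual cause is not visible here. As a general illustration only: whatever evaluates the expression has to bind each escaped identifier to the data stored under the real branch name, since eval raises NameError for any identifier missing from its namespace. A minimal sketch, where eval_escaped and table are hypothetical stand-ins and not weaver's _eval_expr or its data table:

```python
import ast
import math


def eval_escaped(expr, table, escape_dict=None):
    """Hypothetical helper: evaluate expr with escaped names bound to table entries."""
    escape_dict = escape_dict or {'__DOT__': '.'}
    namespace = {'math': math}
    for node in ast.walk(ast.parse(expr)):
        if not isinstance(node, ast.Name):
            continue
        real_name = node.id
        for src, tgt in escape_dict.items():
            real_name = real_name.replace(src, tgt)
        if real_name in table:
            # Bind the escaped identifier to the entry stored under the real branch name
            namespace[node.id] = table[real_name]
    return eval(expr, namespace)


table = {'hits.x': 3.0, 'hits.y': -4.0}  # made-up scalars standing in for branch arrays
radius = eval_escaped('math.hypot(hits__DOT__x, hits__DOT__y)', table)
print(radius)  # 5.0
```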
Thanks a lot for the info, @xpzhang !
I managed to get it working with this https://github.com/hqucms/weaver-core/commit/9c4b72f864806ca21a2effb2967aacacb4da4b68, and made a new release v0.4.3.
Some changes are also needed to the yaml, namely adding
branch_magic:
  __DOT__: .
and then replace hits. with hits__DOT__ everywhere. Here is the updated one:
test.yaml.txt
Let me know if this works for you!
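For anyone applying the same change to a larger configuration, the "replace everywhere" step can be scripted. A minimal sketch, assuming the class prefix is literally hits and that a plain text substitution is acceptable for your yaml; escape_branch_prefix is a hypothetical helper, not part of weaver:

```python
import re


def escape_branch_prefix(text, prefix='hits', token='__DOT__'):
    """Rewrite '<prefix>.' as '<prefix><token>' when an identifier follows the dot."""
    pattern = re.compile(r'\b' + re.escape(prefix) + r'\.(?=[A-Za-z_])')
    return pattern.sub(prefix + token, text)


before = "selection: (hits.x > 0) & (hits.energy < 10.5)"  # made-up config line
after = escape_branch_prefix(before)
print(after)  # selection: (hits__DOT__x > 0) & (hits__DOT__energy < 10.5)
```

The lookahead leaves numbers such as 10.5 untouched, and the word boundary avoids rewriting longer names that merely end in hits. It would still rewrite something like a file name starting with hits., so check the result before using it.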
Hi @hqucms It works like a charm. Thanks for your efforts!
Hello Huilin,
In my ROOT files, some features are stored in a custom class named hits, so ROOT automatically names the variables belonging to this class as hits.xxx (see screenshot). This period seems to cause problems when I want to filter events or define new variables from them. Is there any option to escape the . from parsing and pass the name "as it is"?
Here are part of my data configuration file and the error message.
[2023-06-07 06:37:20,319] ERROR: When reading file ../data/fe_13.root:
[2023-06-07 06:37:20,320] ERROR: Traceback (most recent call last):
  File "/Users/xpzhang/IHEPBox/Work/code/ml/ParNet_tuto/weaver/utils/data/fileio.py", line 76, in _read_files
    a = _read_root(filepath, branches, load_range=load_range, treename=kwargs.get('treename', None))
  File "/Users/xpzhang/IHEPBox/Work/code/ml/ParNet_tuto/weaver/utils/data/fileio.py", line 45, in _read_root
    outputs = tree.arrays(branches, namedecode='utf-8', entrystart=start, entrystop=stop)
  File "/Users/xpzhang/mambaforge/envs/weaver/lib/python3.7/site-packages/uproot3/tree.py", line 537, in arrays
    branches = list(self._normalize_branches(branches, awkward0))
  File "/Users/xpzhang/mambaforge/envs/weaver/lib/python3.7/site-packages/uproot3/tree.py", line 895, in _normalize_branches
    raise ValueError("cannot interpret branch {0} as a Python type\n in file: {1}".format(repr(branch.name), self._context.sourcepath))
ValueError: cannot interpret branch b'hits' as a Python type
 in file: ../data/fe_13.root