hqucms / weaver-core

Streamlined neural network training.
MIT License

dot symbol in ROOT variable names #5

Closed xpzhang closed 1 year ago

xpzhang commented 1 year ago

Hello Huilin,

In my ROOT files some features are stored in a custom class named hits, so ROOT automatically names the variables belonging to this class hits.xxx (see screenshot). This period seems to cause problems when I want to filter events or define new variables from them. Is there any option to escape the . during parsing and pass the name "as is"?

Here is part of my data configuration file and the error message.

[Screenshot: WX20230607-072313@2x]

[2023-06-07 06:37:20,319] ERROR: When reading file ../data/fe_13.root:
[2023-06-07 06:37:20,320] ERROR: Traceback (most recent call last):
  File "/Users/xpzhang/IHEPBox/Work/code/ml/ParNet_tuto/weaver/utils/data/fileio.py", line 76, in _read_files
    a = _read_root(filepath, branches, load_range=load_range, treename=kwargs.get('treename', None))
  File "/Users/xpzhang/IHEPBox/Work/code/ml/ParNet_tuto/weaver/utils/data/fileio.py", line 45, in _read_root
    outputs = tree.arrays(branches, namedecode='utf-8', entrystart=start, entrystop=stop)
  File "/Users/xpzhang/mambaforge/envs/weaver/lib/python3.7/site-packages/uproot3/tree.py", line 537, in arrays
    branches = list(self._normalize_branches(branches, awkward0))
  File "/Users/xpzhang/mambaforge/envs/weaver/lib/python3.7/site-packages/uproot3/tree.py", line 895, in _normalize_branches
    raise ValueError("cannot interpret branch {0} as a Python type\n in file: {1}".format(repr(branch.name), self._context.sourcepath))
ValueError: cannot interpret branch b'hits' as a Python type
 in file: ../data/fe_13.root

[Screenshot: WX20230607-064706@2x]

hqucms commented 1 year ago

Hi @xpzhang,

I am not sure if reading custom classes is supported by uproot, which is what we use for reading ROOT files. Would it be possible for you to convert the trees to flat float/integer branches? Otherwise, if you could share an example root file I can also try to investigate.

Best, Huilin

xpzhang commented 1 year ago

Hi @hqucms, thanks for the reply! I have attached a sample data file. In my case uproot has no problem reading a simple custom class. Branches with names like hits.x work fine when they appear in the inputs section, as shown in my configuration file. I suspect the problem is related to the function _get_variable_names (ast.parse) in utils/data/tools.py, which fails to parse something like hits.x as a single name.

fe_13.root.zip
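A minimal sketch of the suspected cause, using only the standard library: Python's parser reads `hits.x` as an attribute access on a name `hits`, so collecting `ast.Name` nodes yields `hits` rather than `hits.x`. The expression `hits.x > 0` is just an illustrative example, not taken from the attached configuration.

```python
import ast

expr = "hits.x > 0"  # illustrative expression, not from the actual config
tree = ast.parse(expr)

# Bare identifiers found in the expression.
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})

# Attribute accesses of the form <name>.<attr>.
attrs = [(node.value.id, node.attr) for node in ast.walk(tree)
         if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)]

print(names)  # ['hits']
print(attrs)  # [('hits', 'x')]
```

This would be consistent with the error above, where the branch requested from uproot is `hits` instead of `hits.x`.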

hqucms commented 1 year ago

Hi @xpzhang ,

Can you try this hack on the _get_variable_names function, and use something like hits__DOT__x in the yaml? If this works, I will integrate it.

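def _get_variable_names(expr,
                        exclude=['awkward', 'ak', 'np', 'numpy', 'math'],
                        escape_dict={'__DOT__': '.'}):
    import ast
    # Collect every bare identifier in the expression, skipping names that
    # start with an underscore and the module aliases listed in `exclude`.
    root = ast.parse(expr)
    names = sorted({node.id for node in ast.walk(root) if isinstance(
        node, ast.Name) and not node.id.startswith('_')} - set(exclude))
    # Map escaped tokens back to the real branch names,
    # e.g. hits__DOT__x -> hits.x.
    output = []
    for n in names:
        for src, tgt in escape_dict.items():
            if src in n:
                n = n.replace(src, tgt)
        output.append(n)
    return output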
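A short standalone check of why the escaped spelling helps, assuming the `__DOT__` token suggested above: `hits__DOT__x` is a valid Python identifier, so the parser keeps it as one name, and it can be mapped back to the real branch name afterwards. The expression is illustrative only.

```python
import ast

escaped = "hits__DOT__x"
print(escaped.isidentifier())   # True
print("hits.x".isidentifier())  # False

# The escaped name survives parsing as a single ast.Name node.
names = {n.id for n in ast.walk(ast.parse("hits__DOT__x > 0"))
         if isinstance(n, ast.Name)}

# Undo the escape to recover the branch name as stored in the ROOT file.
branches = sorted(n.replace("__DOT__", ".") for n in names)
print(branches)  # ['hits.x']
```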

xpzhang commented 1 year ago

Hi @hqucms, the new function seems to work perfectly when tested alone. However, I got another error caused by _eval_expr, as follows. I cannot figure out what went wrong here. The testing configuration file is attached; I hope it helps.

test.yaml.txt

train_test.sh.txt

Traceback (most recent call last):
  File "/home/zhang/micromamba/envs/weaver/bin/weaver", line 8, in <module>
    sys.exit(main())
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/train.py", line 904, in main
    _main(args)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/train.py", line 763, in _main
    train(model, loss_func, opt, scheduler, train_loader, dev, epoch,
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/nn/tools.py", line 45, in train_classification
    for X, y, _ in tq:
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
NameError: Caught NameError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/dataset.py", line 196, in __next__
    i = self.indices[self.cursor]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/dataset.py", line 214, in __next__
    self.table, self.indices = self.prefetch.result()
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/dataset.py", line 110, in _load_next
    table, indices = _preprocess(table, data_config, options)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/dataset.py", line 89, in _preprocess
    table = _build_new_variables(table, data_config.var_funcs)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/data/preprocess.py", line 25, in _build_new_variables
    table[k] = _eval_expr(expr, table)
  File "/home/zhang/micromamba/envs/weaver/lib/python3.10/site-packages/weaver/utils/data/tools.py", line 150, in _eval_expr
    return eval(expr, tmp)
  File "<string>", line 1, in <module>
NameError: name 'hits__DOT__u' is not defined

hqucms commented 1 year ago

Thanks a lot for the info, @xpzhang !

I managed to get it working with this commit: https://github.com/hqucms/weaver-core/commit/9c4b72f864806ca21a2effb2967aacacb4da4b68, and made a new release, v0.4.3.

Some changes are also needed to the yaml, namely adding

branch_magic:
   __DOT__: .

and then replacing hits. with hits__DOT__ everywhere. Here is the updated one: test.yaml.txt
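For readers wondering why the escape has to be applied in both directions, here is a rough sketch. It is not the code from the commit above: `eval_with_escapes` and the sample values are hypothetical, and plain lists stand in for the loaded arrays. The table is keyed by the real branch name (`hits.u`), while the expression uses the escaped name (`hits__DOT__u`), so the evaluation namespace needs escaped keys.

```python
def eval_with_escapes(expr, table, branch_magic):
    """Hypothetical helper: evaluate `expr` against `table`, exposing each
    branch under its escaped name (e.g. 'hits.u' -> 'hits__DOT__u')."""
    namespace = {}
    for branch, values in table.items():
        key = branch
        for escaped, real in branch_magic.items():
            key = key.replace(real, escaped)
        namespace[key] = values
    return eval(expr, namespace)


table = {"hits.u": [1.0, 2.0, 3.0]}   # made-up values for illustration
branch_magic = {"__DOT__": "."}       # same mapping as in the yaml above
expr = "sum(hits__DOT__u) / len(hits__DOT__u)"

# Without the mapping, the escaped name is unknown to eval.
try:
    eval(expr, dict(table))
    failed = False
except NameError:
    failed = True
print(failed)  # True

result = eval_with_escapes(expr, table, branch_magic)
print(result)  # 2.0
```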

Let me know if this works for you!

xpzhang commented 1 year ago

Hi @hqucms It works like a charm. Thanks for your efforts!