matplo / pyjetty

some work on jets
3 stars 14 forks source link

Fix shift in tree entries when initialized with empty array #88

Closed sbysiak closed 2 years ago

sbysiak commented 2 years ago

There was a problem when not all branches are created at the same time – in such a situation a shift between corresponding values is introduced, see minimal working example below.

I see two issues:

  1. not sure what should be done about empty iterables of other types (like dict) – so far nothing
  2. this is low-code fix, but a special case in _fill_branch() is effecively created: if value is empty list then branch is created but not filled. Maybe one should split it into _create_branch() and _fill_branch()?
import numpy as np
import pandas as pd
import uproot
from pprint import pprint
from pyjetty.mputils import treewriter

def fill(tw, dummy_value, n_elem):
    tw.fill_branch('single', dummy_value)
    tw.fill_branch('array', [dummy_value+i for i in range(n_elem)])
    tw.fill_tree()

tw = treewriter.RTreeWriter(name ='t', file_name='test.root')

### SUCCESS:
# fill(tw, 100, 5)
# fill(tw, 10, 0)
# fill(tw, -1, 3)

### FAIL:
fill(tw, 10, 0)     # 'array' init with empty list – branch is not created
fill(tw, 100, 5)  
fill(tw, -1, 3)

tw.write_and_close()

print('file saved, check results:')
froot = uproot.open("test.root")
t = froot.get("tt")
arr = t.arrays(library="np")
try:
    df = pd.DataFrame(arr)
    pprint(df)
except ValueError as e:
    print('Error catched: ', e)
    pprint(arr)
    for k,v in arr.items():
        print(k, 'N=', len(v))

"FAIL" produces such tree, where array [100,101,...] is assigned to value 10 in "single" instead od 100

root [2] tt->Scan("single:array")
***********************************************
*    Row   * Instance *    single *     array *
***********************************************
*        0 *        0 *        10 *       100 *
*        0 *        1 *           *       101 *
*        0 *        2 *           *       102 *
*        0 *        3 *           *       103 *
*        0 *        4 *           *       104 *
*        1 *        0 *       100 *        -1 *
*        1 *        1 *           *         0 *
*        1 *        2 *           *         1 *
*        2 *        0 *        -1 *        -1 *
*        2 *        1 *           *         0 *
*        2 *        2 *           *         1 *
***********************************************
matplo commented 2 years ago

Great catch... fortunately the bug not affecting many (or any?)... For the empty dicts (or in fact all cases where the branch may be left uncreated) to introduce an exception throw... code should crash... leave it is up to the user to make sure the iterables are not empty... I will merge this for now. Thanks Sebastian.