Closed dferlewicz closed 3 years ago
Thanks, I'll take a look after lunch. It's possible that something in the variable manager changed between releases, so I have to check that. For gbasf2 tasks, b2luigi extracts all aliases from the variable manager instance via the function
from variables import variables as vm

def get_alias_dict_from_variable_manager():
    """
    Extracts a dictionary with the alias names as keys and their values from the
    internal state of the variable manager and returns it.
    """
    alias_dictionary = {alias_name: vm.getVariable(alias_name).name for alias_name in list(vm.getAliasNames())}
    return alias_dictionary
It should return a dictionary of all variable aliases and variables that had been added with add_alias. I'll try this function manually and see whether this is a general syntax change, an issue with specific variables, or maybe even a bug in basf2.
I looked into this and, as far as I can see, it's a bug in the basf2 release light-2106-rhea, since I get a memory error when running the following minimal script, which has nothing to do with b2luigi:
#!/usr/bin/env python3
from variables import variables as vm
# add some random aliases
vm.addAlias("testalias1", "M")
vm.addAlias("testalias2", "daughter(1, M)")
vm.addAlias("testalias3", "daughter(1, px)")
aliases = list(vm.getAliasNames())
print("All aliases except first:", aliases[1:]) # this works
print("First alias", aliases[0]) # this gives memory error
I therefore opened the basf2 issue BII-8572. I'll close the issue here, since I think there's not much I can do about it without rewriting large parts of the gbasf2 part of b2luigi. If it turns out that it's an error in how we use the variable manager, I'll reopen this issue.
Context to understand the issue: b2luigi doesn't send your steering file to the grid. Instead, it saves the basf2 path to disk (as a pickle file) and sends this pickled basf2 path. The variable aliases are not stored in the basf2 path, so we have to extract them from basf2 and send them as a separate object. However, the latest basf2 light release seems to have an error in the function that gets the aliases.
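To illustrate the idea of sending the aliases as a separate object, here is a minimal sketch. It is not b2luigi's actual implementation: the helper names dump_aliases and restore_aliases are hypothetical, and a plain dictionary setter stands in for vm.addAlias so that the example runs without basf2.

```python
import pickle


def dump_aliases(aliases):
    """Serialize an alias dictionary so it can travel alongside the pickled path."""
    return pickle.dumps(dict(aliases))


def restore_aliases(payload, add_alias):
    """Unpickle the alias dictionary and re-register each alias via add_alias."""
    aliases = pickle.loads(payload)
    for alias_name, variable in aliases.items():
        add_alias(alias_name, variable)
    return aliases


# Aliases as they might be collected on the submitting side.
local_aliases = {"testalias1": "M", "testalias2": "daughter(1, M)"}
payload = dump_aliases(local_aliases)

# On the receiving side, vm.addAlias would be passed as add_alias.
# Here a dictionary setter is used as a stand-in (assumption).
registered = {}
restore_aliases(payload, registered.__setitem__)
```

With basf2 available, the last call would presumably look like restore_aliases(payload, vm.addAlias).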
If you really need this light release, the only tip I can give is not to use basf2 aliases. You can still emulate basf2 aliases via Python dictionaries, e.g.:
aliases = {
    "d1_M": "daughter(1, M)",
    "d1_p": "daughter(1, p)",
    …
}
ma.variablesToNtuple(particle_list, variables=[aliases["d1_M"], aliases["d1_p"]], …)
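The dictionary lookup can also be wrapped in a small helper, so that a variable list may mix alias names and full variable names. This is only a sketch: resolve_aliases is a hypothetical name, not part of basf2 or b2luigi.

```python
ALIASES = {
    "d1_M": "daughter(1, M)",
    "d1_p": "daughter(1, p)",
}


def resolve_aliases(names, aliases=ALIASES):
    """Replace every known alias by its full variable name; keep other names as-is."""
    return [aliases.get(name, name) for name in names]


# "M" is not an alias, so it passes through unchanged.
variables = resolve_aliases(["d1_M", "d1_p", "M"])
```

The resulting list could then be passed as the variables argument of ma.variablesToNtuple.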
Personally, in my offline reconstruction I just use the full variable names and rename the columns after I load the ntuples into a pandas dataframe. For that I defined the function make_root_incompatible to make the variable names readable again (there is also the built-in ROOT.Belle2.invertMakeRootCompatible, but there are sometimes problems with using that from Python). I apply it with
df.rename(columns=make_root_incompatible, inplace=True)
In a similar way you could also apply aliases offline on dataframe level by using a dictionary of variables to aliases to rename the columns:
df.rename(columns=variable_to_alias_dict, inplace=True)
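If the aliases are already kept in an alias-to-variable dictionary, the reverse mapping needed for the rename can be derived from it. The sketch below assumes that the dataframe columns already carry the readable variable names; invert_aliases is a hypothetical helper name.

```python
def invert_aliases(aliases):
    """Turn an alias -> variable mapping into a variable -> alias mapping.

    Raises ValueError if two aliases point to the same variable, because the
    inverse mapping would then be ambiguous.
    """
    variable_to_alias = {}
    for alias, variable in aliases.items():
        if variable in variable_to_alias:
            raise ValueError(f"variable {variable!r} has more than one alias")
        variable_to_alias[variable] = alias
    return variable_to_alias


aliases = {"d1_M": "daughter(1, M)", "d1_p": "daughter(1, p)"}
variable_to_alias_dict = invert_aliases(aliases)
```

Columns without an entry in the dictionary are left untouched by df.rename, so a partial mapping is fine.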
Re-opening this, since I got some replies on issue BII-8572: this is indeed an error caused by the new basf2 externals, in particular the new ROOT version. The replies also describe how to work around the bug, so I'll keep this issue open until the PR that fixes it is merged (the PR should be linked below).
Thanks for the quick turnaround! I switched to using the development version and can confirm it works (would it be worth making this a patch for people to upgrade via pip since this will be an issue for many?)
No problem, I bumped the release version to v0.6.7 and published it on PyPI, so you can now upgrade simply with
python3 -m pip install [--user] --upgrade b2luigi
and check that you have the correct version with b2luigi.__version__.
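For scripts that should refuse to run with an older release, the version check can be automated. The following is a rough sketch that assumes purely numeric, dot-separated version strings (it would fail on development suffixes); is_at_least is a hypothetical helper, not part of b2luigi.

```python
def parse_version(text):
    """Convert a version string such as '0.6.7' or 'v0.6.7' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_at_least(installed, minimum="0.6.7"):
    """Return True if the installed version is not older than the minimum."""
    return parse_version(installed) >= parse_version(minimum)


ok_new = is_at_least("0.6.7")
ok_old = is_at_least("0.6.6")
```

In practice the first argument would be b2luigi.__version__.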
Note that you can also install the development version via pip by using
python3 -m pip install [--user] --upgrade "git+https://github.com/nils-braun/b2luigi"
which I know some people are already doing, because in the past the latest release was often out of date, at least for the gbasf2 batch. But recently I started publishing new releases more frequently so you can stick with those :)
A new version of gbasf2 for Belle2, v5r1p1, has been released, allowing for compatibility with the new releases of basf2 (light-2106-rhea) that are run on a new externals package. However, this caused aliases to no longer be set correctly on any gbasf2 steering tasks. I have attached a minimal (not-)working example to recreate this issue on the new gbasf2 release, where the regular lsf task will run, but the gbasf2 task will not run unless lines 19 and 20 are commented out.
Excerpt of error:
File "/home/belle/dfer/.local/lib/python3.8/site-packages/b2luigi/batch/processes/gbasf2_utils/pickle_utils.py", line 13, in
    alias_dictionary = {alias_name: vm.getVariable(alias_name).name for alias_name in list(vm.getAliasNames())}
cppyy.ll.SegmentationViolation: const Belle2::Variable::Manager::Var* Belle2::Variable::Manager::getVariable(string name) =>
    SegmentationViolation: segfault in C++; program state was reset
basic_pipeline.txt