icecube / pisa

Monte Carlo-based data analysis
http://icecube.github.io/pisa/
Apache License 2.0
19 stars 47 forks source link

Bug when applying cuts in `events_pi.py` for certain variable names #699

Closed kayla-leonard closed 2 weeks ago

kayla-leonard commented 2 years ago

This chunk of code does not work if one variable_name is a subset of another variable_name:

https://github.com/icecube/pisa/blob/master/pisa/core/events_pi.py#L553-L556

for variable_name in variables:
    crit_str = crit_str.replace(
        variable_name, 'self["%s"]["%s"]' % (key, variable_name)
    )

For example: when trying to load both reco_z and reco_zenith, I end up with expressions like self["nue_cc"]["reco_z"]enith.

kayla-leonard commented 5 months ago

Code to verify behavior:

key = "nue_cc"

crit_str = "(reco_zenith >= 0) & (reco_z >= 500)"
for variable_name in ['reco_z','reco_zenith']:
    crit_str = crit_str.replace(
        variable_name, 'self["%s"]["%s"]' % (key, variable_name)
    )

print(crit_str)

currently gives:

(self["nue_cc"]["reco_z"]enith >= 0) & (self["nue_cc"]["reco_z"] >= 500)

but the desired behavior is:

(self["nue_cc"]["reco_zenith"] >= 0) & (self["nue_cc"]["reco_z"] >= 500)
JKrishnamoorthi commented 3 months ago
import re

key = "nue_cc"
crit_str = "(reco_zenith >= 0) & (reco_z >= 500)"
for variable_name in ['reco_z','reco_zenith']:
    # Using word boundary \b to replace whole words only
    crit_str = re.sub(r'\b{}\b'.format(variable_name), 'self["%s"]["%s"]' % (key, variable_name), crit_str)

print(crit_str)

A re module with a word boundary expression will solve the issue. If it looks ok, I can make a pull request.

thehrh commented 3 weeks ago

Seems like a reasonable low-effort fix to make.