legend-exp / legend-dataflow

LEGEND data flow management

Apptainer-specific code in dataflow does not translate to NERSC Shifter #49

Closed: slwatkins closed this issue 1 day ago

slwatkins commented 2 weeks ago

I've been testing production on NERSC, which uses shifter rather than apptainer for containers. There are some specific places where apptainer-specific code causes occasional errors in NERSC-based production.

For example, the utils.py script prepends a directory to the PATH through an environment variable called APPTAINERENV_PREPEND_PATH, which shifter ignores (see below). https://github.com/legend-exp/legend-dataflow/blob/c3e66b310eea9ecca4028d08c5104b47e3f18f7d/scripts/util/utils.py#L202-L206

One case where this breaks data production on NERSC, but not with apptainer, is the concatenation of the pet files. https://github.com/legend-exp/legend-dataflow/blob/c3e66b310eea9ecca4028d08c5104b47e3f18f7d/rules/evt.smk#L85-L88 Here, the lh5concat command is assumed to be on the PATH, but it won't be when using shifter on NERSC.

So I wonder whether there is a more general way to update the PATH that isn't specific to apptainer and would work in both cases. The "hotfix" I implemented to get around this was to change the rule's shell command to the following:

 shell: 
     "{swenv} "
     "{basedir}/../../software/python/install/bin/lh5concat --verbose --overwrite " 
     "--output {output} " 
     "-- {input} &> {log}" 

But I'm sure there is a cleaner way to handle this.
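One container-agnostic option is to perform the PATH update inside the container with `sh -c` instead of relying on a runtime-specific variable such as APPTAINERENV_PREPEND_PATH. The sketch below is not necessarily what #53 implemented. The helper name `wrap_with_path` is hypothetical. It assumes that `swenv` expands to the container launcher (for example `shifter --image=...` or `apptainer exec ...`) and that the image provides `sh`. The `$PATH` reference is single-quoted in the outer command, so it expands inside the container rather than on the host.

```python
import shlex


def wrap_with_path(swenv: str, bin_dir: str, cmd: list[str]) -> str:
    """Hypothetical helper: run `cmd` inside the container described by `swenv`
    with `bin_dir` prepended to PATH.

    The PATH update happens in a `sh -c` running inside the container, so it
    works the same way for apptainer and shifter (assuming `sh` is in the image).
    """
    # Quote bin_dir and each argument so paths with spaces survive; "$PATH" is
    # left for the inner shell (inside the container) to expand.
    inner = f'export PATH={shlex.quote(bin_dir)}:"$PATH"; exec {shlex.join(cmd)}'
    # Single-quote the whole inner script so the host shell does not expand $PATH.
    parts = [swenv.strip(), "sh", "-c", shlex.quote(inner)]
    # Drop empty pieces, e.g. when running without a container (swenv == "").
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    # Placeholder launcher and install path, for illustration only.
    print(
        wrap_with_path(
            "shifter --image=IMAGE",
            "/path/to/software/python/install/bin",
            ["lh5concat", "--verbose", "--overwrite", "--output", "out.lh5", "--", "a.lh5", "b.lh5"],
        )
    )
```

Log redirection such as `&> {log}` would stay outside the returned string, so it is still handled by the host shell that Snakemake runs.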

gipert commented 4 days ago

Addressed in #53