pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License
20 stars 7 forks source link

multiple bugs in writing sample yamls fails with custom path #329

Closed nsheff closed 1 year ago

nsheff commented 3 years ago

If specifying a custom path like

pre_submit:
  python_functions:
  - looper.write_sample_yaml
var_templates:
  sample_yaml_path: "{looper.output_dir}/{sample.sample_name}.yaml"

it actually uses the first sample name to name to the file for subsequent samples... so it overwrites every sample yaml to pipeline_results/sample1.yaml

There's a number of bugs in the handling of custom yaml paths.

nsheff commented 3 years ago

Waiting for #328 to fix since I can no longer see debug output

nsheff commented 3 years ago

Ok I figured out what's going on here. There's leakage from one sample to the next; so if a given sample alters any variables in, for example, the pipeline namespace, then these changes are carried over to the next sample. So if, for example, we populate the variables in the plugin, then only the already-populated strings are carried over.

I think each sample needs to have its own copy of the namespaces to prevent this kind of leakage.

nsheff commented 3 years ago

Here is the culprit:

https://github.com/pepkit/looper/blob/cf63a987c289252340bc10e03ec5c9ce0b52d8f2/looper/conductor.py#L553

we render the pipeline interface var templates on the primary piface object, rather than on a copy of it

nsheff commented 3 years ago

@stolarczyk should we change the way this works to not self update but instead use a different way so we can keep the original templates?

https://github.com/pepkit/looper/blob/cf63a987c289252340bc10e03ec5c9ce0b52d8f2/looper/pipeline_interface.py#L62-L74

stolarczyk commented 3 years ago

I think each sample needs to have its own copy of the namespaces to prevent this kind of leakage.

Yes, agreed. I don't think the leakage was an intentional design decision.