nsheff closed this issue 12 months ago.
I'm a bit conflicted here and would like to hear from anyone with lots of PEP/looper experience. Opinions, @stolarczyk @vreuter @jpsmith5 @afrendeiro?
I think I actually lean toward option 1. It may be conceptually easier to have a separate config file outside of the PEP.
The question
To run a project, we need 1) a pipeline, and 2) some samples to run the pipeline on. They're independent, since I could run a different pipeline on those samples, or run that pipeline on different samples. So, it seems nice to specify them independently.
But right now, looper sticks its fingers all over the PEP, which specifies the sample metadata. For example: `sample_modifiers`. So, you're configuring the pipeline run using the PEP. Because we are modifying the PEP to define and modify the pipeline, this couples the PEP to that particular pipeline. But wait: a great thing about PEP is that these things happen inside the YAML file, not in the sample table. So, that's nice, yes. That's great, but wouldn't it be even better if the entire PEP were portable? Maybe... but on the other hand, in some sense the whole point of the PEP was to move the non-portable stuff into the config file.
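To make the coupling concrete, here is a minimal Python sketch; it is not looper's or peppy's actual code. It models a PEP as a plain dict and an `append` sample modifier as "add this constant attribute to every sample". The attribute name `pipeline_interfaces`, the file paths, and the helper `apply_append` are assumptions for illustration only. The point is that the sample table never changes, yet running a second pipeline requires a second, edited copy of the PEP.

```python
from copy import deepcopy

# Sample table rows: pure sample metadata, identical for every pipeline.
SAMPLES = [{"sample_name": "frog_1"}, {"sample_name": "frog_2"}]

# Hypothetical PEP whose sample modifier carries a pipeline-specific setting.
PEP_FOR_PIPELINE_A = {
    "pep_version": "2.0.0",
    "sample_table": "samples.csv",
    "sample_modifiers": {
        # Assumed attribute name and path, for illustration only.
        "append": {"pipeline_interfaces": "pipeline_a/interface.yaml"},
    },
}


def apply_append(samples, pep):
    """Return new sample dicts with every `append` attribute added (hypothetical helper).

    In this sketch an appended value overwrites any existing attribute of the same name.
    """
    appended = pep.get("sample_modifiers", {}).get("append", {})
    return [{**sample, **appended} for sample in samples]


# Running pipeline B on the same samples means copying and editing the PEP itself.
PEP_FOR_PIPELINE_B = deepcopy(PEP_FOR_PIPELINE_A)
PEP_FOR_PIPELINE_B["sample_modifiers"]["append"]["pipeline_interfaces"] = (
    "pipeline_b/interface.yaml"
)

if __name__ == "__main__":
    for pep in (PEP_FOR_PIPELINE_A, PEP_FOR_PIPELINE_B):
        print(apply_append(SAMPLES, pep))
```

The two PEP dicts differ only in a pipeline setting, which is exactly the part that stops the PEP from being portable.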
Possible solutions
Option 1: have the looper config specify the PEP and the pipeline settings (interface/parameters) independently. The looper config then points to two places instead of one, and the pipeline settings are removed from the PEP.
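The first approach could be sketched as follows, again as a hypothetical Python model rather than looper's real configuration schema. The `LooperConfig` class, its `pep_config` and `pipeline_interface` fields, the in-memory `PEPS` registry standing in for files on disk, and `plan_jobs` are all invented for illustration. The PEPs hold only sample metadata, so the same pipeline can run on different samples and different pipelines can run on the same samples, changing only the looper config.

```python
from dataclasses import dataclass

# Hypothetical stand-in for PEP files on disk: sample metadata only.
PEPS = {
    "frogs/project_config.yaml": {"pep_version": "2.0.0", "samples": ["frog_1", "frog_2"]},
    "toads/project_config.yaml": {"pep_version": "2.0.0", "samples": ["toad_1"]},
}


@dataclass(frozen=True)
class LooperConfig:
    """Hypothetical looper config pointing to a PEP and a pipeline independently."""

    pep_config: str
    pipeline_interface: str


def plan_jobs(cfg, peps):
    """Pair every sample in the referenced PEP with the configured pipeline."""
    project = peps[cfg.pep_config]
    return [(name, cfg.pipeline_interface) for name in project["samples"]]


# Same samples, two pipelines; and the same pipeline on different samples.
run_a = LooperConfig("frogs/project_config.yaml", "pipeline_a/interface.yaml")
run_b = LooperConfig("frogs/project_config.yaml", "pipeline_b/interface.yaml")
run_c = LooperConfig("toads/project_config.yaml", "pipeline_a/interface.yaml")

if __name__ == "__main__":
    for cfg in (run_a, run_b, run_c):
        print(plan_jobs(cfg, PEPS))
```

The trade-off in this layout is that the PEP can be shared verbatim between analyses, at the cost of users managing one more file.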
Option 2: do this with the PEP `import` project modifier instead. To make the config file itself portable, you could have two config files, one importing the other. The "outer" config, which you pass to looper, would `import` the "inner" one. All pipeline/analysis-specific settings would live in the outer config, while the inner config (the portable one) would contain only information about the samples.
Advantages and disadvantages