vivbak closed this 1 week ago
We need to review all the existing Cohort stages to see if they are eligible (or more suited) to be a MultiCohort stage.
I meant to also say: I guess we can do this at our leisure and on a case-by-case basis. It probably doesn't need to be included in this base infrastructure update PR.
@jmarshall, the only issue would be if someone supplied multiple cohorts. If most users are using a single cohort at a time, then you're right, we can do it case by case.
Closes https://github.com/populationgenomics/production-pipelines/issues/710
Currently, for each pipeline run a ‘cohort’ is defined, comprising a list of one or more ‘datasets’, each comprising a list of one or more ‘sgs’.
In this PR, we propose that for each pipeline a ‘multicohort’ will be defined, comprising a list of one or more ‘cohorts’, each comprising a list of one or more ‘datasets’, each comprising a list of one or more ‘sgs’. 🤯
This means that a user can specify a list of custom cohort IDs (rather than just one), and stages will be generated accordingly.
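To make the proposed nesting concrete, here is a minimal sketch of the target hierarchy as plain dataclasses. The class and field names are illustrative stand-ins, not the actual `targets.py` classes in production-pipelines:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the proposed target hierarchy:
# MultiCohort -> Cohort -> Dataset -> SequencingGroup ('sg').

@dataclass
class SequencingGroup:
    id: str

@dataclass
class Dataset:
    name: str
    sequencing_groups: list[SequencingGroup] = field(default_factory=list)

@dataclass
class Cohort:
    id: str
    datasets: list[Dataset] = field(default_factory=list)

@dataclass
class MultiCohort:
    cohorts: list[Cohort] = field(default_factory=list)

    def get_sequencing_groups(self) -> list[SequencingGroup]:
        # Flatten multicohort -> cohorts -> datasets -> sgs,
        # mirroring how a MultiCohortStage would enumerate its inputs.
        return [
            sg
            for cohort in self.cohorts
            for dataset in cohort.datasets
            for sg in dataset.sequencing_groups
        ]
```

A single-cohort run is then just the degenerate case of a MultiCohort with one element in `cohorts`.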
The key changes here:
It is important to note that we still support a non-cohort run of production pipelines, which means there needs to be some logic to support the old way of doing things (pre-multi-cohorts) for the time being.
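The backwards-compatibility logic could look something like the following sketch. The config keys (`input_cohorts`, `input_datasets`) and the return shape are illustrative assumptions, not the actual implementation:

```python
# Hypothetical fallback logic: if custom cohort IDs are supplied,
# take the new multicohort path; otherwise fall back to the
# pre-multicohort, single-cohort behaviour.

def resolve_input_target(config: dict) -> tuple[str, list[str]]:
    workflow = config.get('workflow', {})
    cohort_ids = workflow.get('input_cohorts', [])
    if cohort_ids:
        # New path: one Cohort per supplied custom cohort ID,
        # wrapped in a MultiCohort.
        return ('multicohort', list(cohort_ids))
    # Old path: a single cohort derived from the input datasets.
    return ('cohort', workflow.get('input_datasets', []))
```

Once all stages are migrated, the single-cohort branch could be dropped and a lone cohort treated as a one-element multicohort.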
TODO
metamist.py
targets.py
inputs.py
`intervals_path=inputs.as_path(get_cohort(), PrepareIntervals, 'preprocessed'),` in a SequencingGroupStage where the dependent stage is a CohortStage.
*workflow.py
test_*.py
test_cohort.py
sample_qc.py
combiner.py
Note for VB: add co-author credit in merge.
*Actually, I think this is fine, because it will return the outputs for each cohort, but we need to make sure that the new return structures here are suitable.