Closed MattWellie closed 1 day ago
Hey Matt, this all looks pretty great, thanks for the effort you've put in to enable Multicohort & Cohort inputs for these stages.
Just to make sure I understand, if we were to submit some kind of workflow with multiple cohorts like this:
[workflow]
input_cohorts = ["COH1", "COH2", "COH3",]
last_stages = ['AnnotateCohortSv']
and invoke the cpg_workflows/stages/gatk_sv/gatk_sv_multisample.py workflow with analysis-runner:
analysis-runner --config myconfig.toml ... python main.py gatk_sv_multisample
Is this what would happen?
MakeCohortCombinedPed
MakeMultiCohortCombinedPed
GatherBatchEvidence
-> ClusterBatch
-> GenerateBatchMetrics
-> FilterBatch
MergeBatchSites
-> CombineExclusionLists
GenotypeBatch
MakeCohortVcf
-> FormatVcfForGatk
-> JoinRawCalls
-> SVConcordance
-> GeneratePloidyTable
-> FilterGenotypes
-> UpdateStructuralVariantIDs
-> AnnotateVcf
-> AnnotateVcfWithStrvctvre
-> SpiceUpSVIDs
-> AnnotateCohortSv
Is this what would happen?
Yeah that's the biscuit. There's a bit of an hourglass shape to it, as the MergeBatchSites Stage sits in the middle of what's otherwise an entirely CohortStage workflow (all the Stages previously in multisample_1).
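To make the hourglass concrete, here is a rough sketch of how the jobs might fan in and out for the three-cohort example above. It assumes the four stages before MergeBatchSites and GenotypeBatch after it each run once per cohort; the `plan_jobs` helper and the stage groupings are purely illustrative and are not part of cpg_workflows.

```python
# Illustrative only: models the "hourglass" job layout, not the real scheduler.
# Assumption: these stages run once per cohort, MergeBatchSites runs once overall.
PER_COHORT_BEFORE = [
    "GatherBatchEvidence",
    "ClusterBatch",
    "GenerateBatchMetrics",
    "FilterBatch",
]
MERGE_STAGE = "MergeBatchSites"
PER_COHORT_AFTER = ["GenotypeBatch"]


def plan_jobs(cohorts):
    """Return (stage, target) pairs in the order they would be queued."""
    jobs = []
    # Wide top of the hourglass: one job per cohort per stage.
    for stage in PER_COHORT_BEFORE:
        for cohort in cohorts:
            jobs.append((stage, cohort))
    # Narrow waist: a single job across every cohort.
    jobs.append((MERGE_STAGE, "+".join(cohorts)))
    # Wide again: back to one job per cohort.
    for stage in PER_COHORT_AFTER:
        for cohort in cohorts:
            jobs.append((stage, cohort))
    return jobs


if __name__ == "__main__":
    for stage, target in plan_jobs(["COH1", "COH2", "COH3"]):
        print(f"{stage:22s} {target}")
```

With three cohorts this gives twelve per-cohort jobs, one merged job, then three more per-cohort jobs.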
Closes #811
Absolutely monstrous line count - this is almost entirely caused by merging the multisample_1 & 2 files (1530 lines deleted) into a single multisample file (1328 lines)
Other changes are the deletion of a few config files which would no longer be required:
Creates a new write_ped_file method in MultiCohort(Stage?) (mirroring the same content in CohortStage)
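For readers unfamiliar with what a combined pedigree writer does, the following is a minimal standalone sketch, not the method added in this PR. It assumes the standard six-column, tab-separated PED layout and a hypothetical input of plain dictionaries keyed by cohort name; `write_combined_ped` and its argument shape are invented for illustration.

```python
import csv
from pathlib import Path

# Standard PED column order; a missing value is written as "0".
PED_COLUMNS = (
    "family_id",
    "individual_id",
    "paternal_id",
    "maternal_id",
    "sex",
    "phenotype",
)


def write_combined_ped(cohorts, out_path):
    """Write one PED file covering every cohort (hypothetical helper).

    `cohorts` maps a cohort name to a list of dicts using PED_COLUMNS as keys.
    An individual appearing in more than one cohort is written only once.
    """
    seen = set()
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for cohort_name in sorted(cohorts):
            for record in cohorts[cohort_name]:
                individual = record["individual_id"]
                if individual in seen:
                    continue  # already written via an earlier cohort
                seen.add(individual)
                writer.writerow([record.get(col, "0") for col in PED_COLUMNS])
    return Path(out_path)
```

Deduplicating by individual ID is a design choice of this sketch; whether the real method needs it depends on whether cohorts can share samples.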