populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

GATK-SV transition to MultiCohorts #812

Closed MattWellie closed 1 day ago

MattWellie commented 6 days ago

Closes #811

Absolutely monstrous line count - this is almost entirely caused by merging the multisample_1 & 2 files (1530 lines deleted) into a single multisample file (1328 lines)

Other changes include the deletion of a few config files that are no longer required.

Creates a new write_ped_file method in MultiCohort(Stage?) (mirroring the same content in CohortStage)
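
For anyone skimming, here is a rough sketch of what a multi-cohort write_ped_file could look like. This is not the code in this PR: the Sample, Cohort and MultiCohort classes below are simplified, hypothetical stand-ins, and de-duplicating samples that appear in more than one cohort is an assumption of the sketch rather than confirmed behaviour.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Sample:
    """Hypothetical stand-in for a sample carrying pedigree fields."""

    sample_id: str
    family_id: str
    paternal_id: str = "0"
    maternal_id: str = "0"
    sex: int = 0  # PED convention: 1 = male, 2 = female, 0 = unknown
    phenotype: int = 0  # PED convention: 1 = unaffected, 2 = affected, 0 = missing

    def ped_row(self) -> str:
        # Six tab-separated columns in standard PED order.
        return "\t".join(
            [
                self.family_id,
                self.sample_id,
                self.paternal_id,
                self.maternal_id,
                str(self.sex),
                str(self.phenotype),
            ]
        )


@dataclass
class Cohort:
    """Hypothetical cohort: a named list of samples."""

    name: str
    samples: list[Sample] = field(default_factory=list)

    def write_ped_file(self, out_path: Path) -> Path:
        out_path.write_text("".join(s.ped_row() + "\n" for s in self.samples))
        return out_path


@dataclass
class MultiCohort:
    """Hypothetical container holding several cohorts."""

    cohorts: list[Cohort] = field(default_factory=list)

    def all_samples(self) -> list[Sample]:
        # Assumption: a sample shared by two cohorts is written only once,
        # keeping the first occurrence and preserving cohort order.
        seen: dict[str, Sample] = {}
        for cohort in self.cohorts:
            for sample in cohort.samples:
                seen.setdefault(sample.sample_id, sample)
        return list(seen.values())

    def write_ped_file(self, out_path: Path) -> Path:
        # Same row format as Cohort.write_ped_file, applied across all cohorts.
        out_path.write_text("".join(s.ped_row() + "\n" for s in self.all_samples()))
        return out_path
```

The point of the sketch is only that the multi-cohort version reuses the per-sample row format and changes which samples are iterated over.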

EddieLF commented 4 days ago

Hey Matt, this all looks pretty great, thanks for the effort you've put in to enable Multicohort & Cohort inputs for these stages.

Just to make sure I understand, if we were to submit some kind of workflow with multiple cohorts like this:

[workflow]
input_cohorts = ["COH1", "COH2", "COH3",]
last_stages = ['AnnotateCohortSv']

and invoke the cpg_workflows/stages/gatk_sv/gatk_sv_multisample.py workflow with analysis runner

analysis-runner --config myconfig.toml ... python main.py gatk_sv_multisample

Is this what would happen?

MattWellie commented 4 days ago

Is this what would happen?

Yeah that's the biscuit. It has a bit of an hourglass shape, as there's the MergeBatchSites Stage sitting in the middle of what's otherwise an entirely CohortStage workflow (all the Stages previously in multisample_1).
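
To make the hourglass concrete, here is a toy sketch of the control flow: per-cohort work fans out, a single multi-cohort step sits in the middle, and per-cohort work resumes afterwards with the shared result. The function names and the "sites" they pass around are made up for illustration and do not correspond to the real Stage implementations in cpg_workflows.

```python
def call_sites_per_cohort(samples):
    """Toy per-cohort step: pretend each sample contributes one site."""
    return sorted({f"site_{sample}" for sample in samples})


def merge_sites(per_cohort_sites):
    """Toy multi-cohort step (the neck of the hourglass): union of all sites."""
    merged = set()
    for sites in per_cohort_sites.values():
        merged.update(sites)
    return sorted(merged)


def process_cohort(cohort_name, samples, merged_sites):
    """Toy per-cohort step that runs after the merge, using the shared output."""
    return {
        "cohort": cohort_name,
        "n_samples": len(samples),
        "n_sites": len(merged_sites),
    }


def run_hourglass(cohorts):
    # Top of the hourglass: one independent result per cohort.
    per_cohort = {
        name: call_sites_per_cohort(samples) for name, samples in cohorts.items()
    }
    # Neck: a single step that needs every cohort's output at once.
    merged = merge_sites(per_cohort)
    # Bottom: back to one result per cohort, each consuming the merged output.
    return [
        process_cohort(name, samples, merged) for name, samples in cohorts.items()
    ]


if __name__ == "__main__":
    for row in run_hourglass({"COH1": ["a", "b"], "COH2": ["b", "c"]}):
        print(row)
```

The only structural point is that the middle step cannot start until every cohort has finished the steps before it.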