populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

More flexibility in batch sizes #815

Closed MattWellie closed 3 months ago

MattWellie commented 3 months ago

This script is used to take multiple separate batch outputs, merge them all, and produce new batches from the larger cohort.

This works around a limitation in GATK-SV, where batches of more than 500 samples cause errors when collecting and calculating the median coverage files (a huge amount of data). It also lets us use the pre-calculated metrics from a larger number of samples when deciding new batches.

This PR is just a quick bump that lets us define the output batch sizes more flexibly.
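
For readers unfamiliar with the re-batching step, here is a minimal sketch of the idea described above: merge the per-sample metrics from several existing batches, then split the combined cohort into new batches whose size is bounded by configurable limits. This is not the script from this PR. The function names, the `median_coverage` field, the default sizes, and the choice to order samples by coverage before splitting are all assumptions made for illustration.

```python
from math import ceil


def merge_batches(batches):
    """Flatten several per-batch sample lists into one cohort (hypothetical helper)."""
    return [sample for batch in batches for sample in batch]


def rebatch(samples, min_batch_size=100, max_batch_size=300):
    """Split a merged cohort into near-equal batches within the size bounds.

    Each sample is assumed to be a dict carrying a pre-calculated
    'median_coverage' value; that field name is illustrative only.
    """
    if not 0 < min_batch_size <= max_batch_size:
        raise ValueError('need 0 < min_batch_size <= max_batch_size')
    n_samples = len(samples)
    if n_samples < min_batch_size:
        raise ValueError('too few samples to form a single batch')

    # Fewest batches that keeps every batch at or below the maximum size.
    n_batches = ceil(n_samples / max_batch_size)
    base, extra = divmod(n_samples, n_batches)
    if base < min_batch_size:
        raise ValueError('no split satisfies both size bounds')

    # Assumption: order by coverage so similar samples land in the same batch.
    ordered = sorted(samples, key=lambda s: s['median_coverage'])

    new_batches = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches take one additional sample each.
        size = base + (1 if i < extra else 0)
        new_batches.append(ordered[start:start + size])
        start += size
    return new_batches
```

Exposing `min_batch_size` and `max_batch_size` as parameters, rather than hard-coding them, is the kind of flexibility this change is about: the caller can keep every output batch under the size at which the downstream coverage collection starts to fail.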