populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

Remove '--merge-input-intervals' args from seqr load job commands #725

Closed EddieLF closed 4 months ago

EddieLF commented 4 months ago

Removes the --merge-input-intervals flag from the GenomicsDBImport and GenotypeGVCFs commands used by the joint genotyping jobs in the seqr loader pipeline.

Rationale:

The gVCFs for the genomes fed into this pipeline were created before we decided to exclude telomeres and centromeres from the joint genotyping stage, so the individual gVCFs contain genotypes in the excluded regions.

When we use the --merge-input-intervals flag, the interval_list that excludes telomeres and centromeres is merged with the intervals from the gVCFs, which include telomeres and centromeres. The end product is an interval_list that covers telomeres and centromeres, meaning these regions are still being traversed.
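To illustrate the effect described above, here is a toy sketch of an interval union on a single contig. It is not GATK's implementation, and the coordinates are made up: the calling intervals skip a hypothetical excluded region (101-200), while the gVCF interval spans the whole contig.

```python
def merge_intervals(intervals):
    """Union a list of inclusive (start, end) intervals on one contig."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical coordinates: calling intervals leave out positions 101-200.
calling_intervals = [(1, 100), (201, 300)]
# Hypothetical gVCF coverage: the whole contig, excluded region included.
gvcf_intervals = [(1, 300)]

combined = merge_intervals(calling_intervals + gvcf_intervals)
print(combined)
```

Once the two sets are merged, the gap in the calling intervals disappears, which is the behaviour this PR is trying to avoid.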

We don't want this. Removing the flag from the GenomicsDBImport and GenotypeGVCFs jobs should hopefully stop these regions from being traversed.
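As a rough sketch of the shape of the change (not the pipeline's actual job code), the commands could be assembled as below, with the flag off by default. The function names, the `merge_input_intervals` parameter, and all file paths are hypothetical; only the GATK tool names and options are real.

```python
import shlex

MERGE_FLAG = '--merge-input-intervals'


def genomicsdb_import_cmd(workspace, sample_map, interval_list,
                          merge_input_intervals=False):
    """Build a GenomicsDBImport command line (hypothetical helper)."""
    args = [
        'gatk', 'GenomicsDBImport',
        '--genomicsdb-workspace-path', workspace,
        '--sample-name-map', sample_map,
        '-L', interval_list,
    ]
    if merge_input_intervals:
        # Previously always passed; now opt-in only.
        args.append(MERGE_FLAG)
    return shlex.join(args)


def genotype_gvcfs_cmd(reference, workspace, output_vcf, interval_list,
                       merge_input_intervals=False):
    """Build a GenotypeGVCFs command line (hypothetical helper)."""
    args = [
        'gatk', 'GenotypeGVCFs',
        '-R', reference,
        '-V', f'gendb://{workspace}',
        '-O', output_vcf,
        '-L', interval_list,
    ]
    if merge_input_intervals:
        args.append(MERGE_FLAG)
    return shlex.join(args)


# Placeholder paths, for illustration only.
import_cmd = genomicsdb_import_cmd(
    'genomicsdb_workspace', 'sample_map.tsv', 'calling_regions.interval_list')
genotype_cmd = genotype_gvcfs_cmd(
    'ref.fasta', 'genomicsdb_workspace', 'joint.vcf.gz',
    'calling_regions.interval_list')
print(import_cmd)
print(genotype_cmd)
```

With the default arguments, both commands still restrict traversal with `-L` but no longer pass `--merge-input-intervals`.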