populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

Exome gCNV callsets should be clustered by family #778

Open MattWellie opened 1 month ago

MattWellie commented 1 month ago

CustomCohorts: COH179, COH187 SureSelect Clinical Research Exome V1, QXT Chemistry batches 1 & 2

When loading the resulting CNV callset index into Seqr, @EddieLF hit a failure: one of the indexes contains a partial family, which is not permitted. It was a complete fluke that we hadn't run into this failure mode before, but we should re-batch this pair of cohorts to respect family groups. Re-batching also means that the variants within each family unit are segmented together, so that mode-of-inheritance (MOI) tests will correctly parse the inheritance of each identified variant.

So far, the exome datasets have been assembled either from all exomes on a given capture or, where we have a ton on a specific capture, by stratifying by project. For most other runs this condition has therefore been satisfied by accident.
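
To make the family-respecting batching concrete, here is a minimal sketch of one way it could be done. It is not the pipeline's actual implementation: the function name `batch_by_family`, the `sample_to_family` mapping, and the `max_batch_size` parameter are all hypothetical, and the sample and family IDs are made up. It uses a simple first-fit-decreasing heuristic, so a family is always placed into a batch as a whole.

```python
from collections import defaultdict


def batch_by_family(sample_to_family, max_batch_size):
    """Partition samples into batches without ever splitting a family.

    sample_to_family: dict of sample ID -> family ID (hypothetical input shape).
    max_batch_size: upper bound on samples per batch (hypothetical parameter).
    """
    # Group sample IDs under their family ID.
    families = defaultdict(list)
    for sample, family in sample_to_family.items():
        families[family].append(sample)

    batches = []
    # Place the largest families first (first-fit decreasing); ties are
    # broken by family ID so the result is deterministic.
    for family in sorted(families, key=lambda f: (-len(families[f]), f)):
        members = sorted(families[family])
        if len(members) > max_batch_size:
            raise ValueError(
                f"Family {family} has {len(members)} samples, "
                f"which exceeds max_batch_size={max_batch_size}"
            )
        for batch in batches:
            # Add the whole family to the first batch with enough room.
            if len(batch) + len(members) <= max_batch_size:
                batch.extend(members)
                break
        else:
            # No existing batch can hold the family intact: start a new one.
            batches.append(list(members))
    return batches


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    example = {
        "S1": "FAM1", "S2": "FAM1", "S3": "FAM1",
        "S4": "FAM2", "S5": "FAM2",
        "S6": "FAM3",
    }
    for i, batch in enumerate(batch_by_family(example, max_batch_size=4)):
        print(i, batch)
```

A real implementation would presumably take family membership from the pedigree and might also need to enforce a minimum batch size for gCNV; both are left out of this sketch.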