populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

Compare callsets #755

Closed MattWellie closed 1 month ago

MattWellie commented 1 month ago

Adds an extra stage, UpdateStructuralVariantIDs.

This runs a concordance test between the output of the previous callset's 'SpiceUpSVIDs' stage (where IDs were manually created based on descriptive variant attributes) and the current callset's FilterGenotypes output. This gives us a chance to annotate the current VCF contents with the previous callset's IDs.

The SpiceUpSVIDs stage has been edited: instead of always generating a new ID, the script now accepts an ID in the TRUTH_VID VCF annotation (Truth Variant ID, i.e. the ID of this same variant in the 'truth'/previous callset). If TRUTH_VID is empty for a variant, a new ID is created using the existing logic.
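
A minimal sketch of the ID-selection behaviour described above, using plain dictionaries in place of VCF records. The names `choose_variant_id` and `make_new_id` are hypothetical, and `make_new_id` is only a stand-in for the existing ID-generation logic, which is not reproduced here. Treating `.` as an empty TRUTH_VID is also an assumption.

```python
from typing import Callable, Mapping


def make_new_id(info: Mapping[str, object]) -> str:
    """Hypothetical stand-in for the existing ID-generation logic."""
    svtype = info.get('SVTYPE', 'SV')
    chrom = info.get('CHROM', 'NA')
    pos = info.get('POS', 'NA')
    return f'{svtype}_{chrom}_{pos}'


def choose_variant_id(
    info: Mapping[str, object],
    new_id_fn: Callable[[Mapping[str, object]], str] = make_new_id,
) -> str:
    """Reuse the previous callset's ID when TRUTH_VID is set, else make a new one."""
    truth_vid = info.get('TRUTH_VID')
    # Assumption: a missing key, an empty string, and the VCF missing-value
    # marker '.' all mean "no ID carried over from the previous callset".
    if truth_vid in (None, '', '.'):
        return new_id_fn(info)
    return str(truth_vid)
```

With this shape, a variant matched in the concordance step keeps its earlier ID, and an unmatched variant falls through to the same generator as before.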

NB: At this point, these Spicy IDs have been manually set in a previous run for Seqr cohorts. VCGS data has not been run through that stage yet, but a run is currently in progress. To cater for this use case, there's now a behaviour switch depending on whether a Spicy-ID VCF is available from a prior run.
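
One way to picture the switch, as a rough sketch only: the function `sv_id_stages` and the idea of returning a list of stage names are hypothetical and do not reflect how the pipeline actually wires its stages. It assumes the concordance stage is simply skipped when no prior Spicy-ID VCF exists.

```python
from typing import List, Optional


def sv_id_stages(previous_spicy_vcf: Optional[str]) -> List[str]:
    """Return the ID-related stages to run, in order (illustrative only)."""
    stages = ['FilterGenotypes']
    if previous_spicy_vcf:
        # A prior Spicy-ID VCF exists, so concordance can copy its IDs
        # across before SpiceUpSVIDs runs.
        stages.append('UpdateStructuralVariantIDs')
    # SpiceUpSVIDs always runs; without TRUTH_VID it generates every ID afresh.
    stages.append('SpiceUpSVIDs')
    return stages
```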

There are also a couple of line-count reduction/linting changes.