tgen / vcfMerger2

Dynamic vcfMerger for 2 to N somatic variants vcf files

Strelka2 prep BlockSub creation failing in complex tandem blocksub #20

Closed PedalheadPHX closed 4 years ago

PedalheadPHX commented 4 years ago

This looks like an edge case that may be related to phaser itself or to the underlying code in this repository. Scenario: five consecutive calls from two independent but consecutive block substitutions followed by a single SNV.

Sample in question: MMRF_2787_1_PB_WBC_C3_KHS5U-MMRF_2787_1_BM_CD138pos_T1_KHS5U.bwa.strelka2.pass.vcf.gz

PASS VCF CALLS:
chr2 88860919 . T A
chr2 88860920 . G A
chr2 88860921 . A G
chr2 88860922 . G C
chr2 88860923 . C T

Post PREP OUTPUT:
chr2 88860919 . TGC AAT (## The "C" is not a ref allele, this is the main issue)
chr2 88860921 . AG GC

Based on the image below, the correct output should be:
chr2 88860919 . TG AA
chr2 88860921 . AG GC
chr2 88860923 . C T

It actually looks like the call causing the issue comes from trying to phase the blocksub and the SNV, since they look phased, which should have produced a call like:
chr2 88860919 . TGagC AAagT

It's like we stripped out the two ref bases shown in lower case.

[Image: complex_blocksub_issue]

ChristopheLegendre commented 4 years ago

Bug fixed. Phased variants were being captured together even when their positions were not consecutive. Block capture has been revised to handle non-consecutive phased variants.
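
For anyone reading this later, here is a minimal sketch of the idea behind the fix, not the repository's actual code. It assumes each SNV carries a phase-set label and merges calls into a block substitution only when they share that label and sit at adjacent positions. The function name `merge_blocks`, the tuple layout, and the phase-set values `ps1`/`ps2` are hypothetical and chosen only to reproduce the scenario in this issue (the SNV at 88860923 is assumed to share a phase set with the first blocksub).

```python
def merge_blocks(calls):
    """Merge position-sorted SNVs into block substitutions.

    calls: list of (chrom, pos, ref, alt, phase_set) tuples (hypothetical layout).
    Two calls are merged only if they are on the same chromosome, in the same
    phase set, AND the new call starts right where the current block ends.
    """
    blocks = []
    for chrom, pos, ref, alt, phase_set in calls:
        last = blocks[-1] if blocks else None
        if (
            last is not None
            and last["chrom"] == chrom
            and last["phase_set"] == phase_set
            and last["pos"] + len(last["ref"]) == pos  # strictly adjacent
        ):
            last["ref"] += ref
            last["alt"] += alt
        else:
            blocks.append(
                {"chrom": chrom, "pos": pos, "ref": ref, "alt": alt,
                 "phase_set": phase_set}
            )
    return [(b["chrom"], b["pos"], b["ref"], b["alt"]) for b in blocks]


# The five PASS calls from this issue, with assumed phase-set labels.
CALLS = [
    ("chr2", 88860919, "T", "A", "ps1"),
    ("chr2", 88860920, "G", "A", "ps1"),
    ("chr2", 88860921, "A", "G", "ps2"),
    ("chr2", 88860922, "G", "C", "ps2"),
    ("chr2", 88860923, "C", "T", "ps1"),  # phased with the first block, but not adjacent
]

for record in merge_blocks(CALLS):
    print(*record)
```

Because the adjacency check is applied together with the phase-set check, the SNV at 88860923 stays a separate record instead of being glued onto the first block as TGC > AAT.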