tgen / vcfMerger2

Dynamic vcfMerger for 2 to N somatic variants vcf files

Introduce an ELSE statement to deal with edge cases #27

Closed ChristopheLegendre closed 1 year ago

ChristopheLegendre commented 1 year ago

https://github.com/tgen/vcfMerger2/blob/8ed9a05b7a23c8a0c2db80fcf83683c35f6602a7/prep_vcfs_somatic/strelka2/strelka2.phasing_consecutives_variants_as_blocs.sh#L197-L206

Example of edge case:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr14   50978766    .   G   T   .   PASS    SOMATIC;QSS=60;TQSS=1;NT=ref;QSS_NT=60;TQSS_NT=1;SGT=GG->GT;DP=115;MQ=59.97;MQ0=0;ReadPosRankSum=1.22;SNVSB=0;SomaticEVS=7.38   GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU:AR:AD   0/0:21:0:0:0:0,0:0,0:21,21:0,0:0:21,0   0/1:94:6:0:0:0,0:0,0:81,87:7,7:0.0795:81,7
chr14   50978767    .   T   TG  .   PASS    SOMATIC;QSI=47;TQSI=1;NT=ref;QSI_NT=47;TQSI_NT=1;SGT=ref->het;MQ=59.96;MQ0=0;RU=G;RC=0;IC=1;IHP=18;SomaticEVS=6.74  GT:DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50:BCN50:AR:AD    0/0:19:19:20,20:0,0:0,0:25.1:0.58:0:0:0:20,0    0/1:90:90:71,73:14,14:9,7:100.19:4.93:0:0.04:0.1647:71,14

This is an SNV followed by an indel. phASER excludes the indel, so only the SNV remains, and it is no longer part of a block. phASER therefore fails.
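To make the failure mode concrete, here is a minimal sketch of how such orphan SNVs could be detected. It is illustrative only and is not code from the repository: the helper names `parse_site`, `is_snv`, and `orphans_after_indel_exclusion` are hypothetical, and it assumes a site is an SNV when REF and every ALT allele are exactly one character long (symbolic alleles are not handled). Only the first five VCF columns of the two records above are used.

```python
def parse_site(vcf_line):
    """Return (chrom, pos, ref, alt) from one whitespace-separated VCF data line."""
    fields = vcf_line.split()
    return fields[0], int(fields[1]), fields[3], fields[4]


def is_snv(ref, alt):
    """Assumption: an SNV has a one-base REF and only one-base ALT alleles."""
    return len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))


def orphans_after_indel_exclusion(vcf_lines):
    """Return SNV sites that have no adjacent SNV once indels are ignored."""
    sites = [parse_site(line) for line in vcf_lines]
    snvs = [s for s in sites if is_snv(s[2], s[3])]
    positions = {(chrom, pos) for chrom, pos, _, _ in snvs}
    return [
        s for s in snvs
        if (s[0], s[1] - 1) not in positions and (s[0], s[1] + 1) not in positions
    ]


# First five columns of the two records from the example above.
pair = [
    "chr14\t50978766\t.\tG\tT",
    "chr14\t50978767\t.\tT\tTG",
]
print(orphans_after_indel_exclusion(pair))
# [('chr14', 50978766, 'G', 'T')]
```

The SNV at chr14:50978766 was selected only because of its indel neighbor, so once the indel is excluded it stands alone.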

Let's think about any other potential edge cases where phASER could fail because it does not handle the single lines left over once indels are removed or ignored.

PedalheadPHX commented 1 year ago

Do we have a picture of the actual event? Is the indel in phase or not? If we are not able to phase indels, shouldn't the preprocessing step drop all indel lines before looking for potential block variants (two consecutive VCF lines one bp apart) to send to phASER? In short, if indels are a known phASER limitation, why are they not prefiltered?

ChristopheLegendre commented 1 year ago

We do not phase indels. phASER handles indels by excluding them itself, which is why they were not prefiltered: letting phASER do the work saved us from adding more code and creating more intermediate files.

With this edge case, it appears we cannot let phASER do the work, because phASER does not correctly handle the orphan line that remains when two consecutive lines consist of an SNV followed by an indel.

The solution is to prefilter the indels out. We have implemented this; testing of the modifications is in progress.
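As a rough illustration of the idea (not the actual change made to the shell script), prefiltering could look like the Python sketch below. The function names are hypothetical, records are assumed to be `(chrom, pos, ref, alt)` tuples already sorted by chromosome and position, and an SNV is assumed to be a site whose REF and ALT alleles are all one base long. Indels are dropped first, and only runs of two or more SNVs at consecutive positions are kept as candidate blocks.

```python
def is_snv(ref, alt):
    """Assumption: an SNV has a one-base REF and only one-base ALT alleles."""
    return len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))


def consecutive_snv_blocks(records):
    """Drop indels, then group SNVs at consecutive positions into blocks.

    `records` must be sorted by chromosome and position. Runs of length 1
    are discarded, so no orphan line is ever passed on for phasing.
    """
    snvs = [r for r in records if is_snv(r[2], r[3])]
    blocks, current = [], []
    for rec in snvs:
        adjacent = (
            current
            and rec[0] == current[-1][0]
            and rec[1] == current[-1][1] + 1
        )
        if adjacent:
            current.append(rec)
        else:
            if len(current) >= 2:
                blocks.append(current)
            current = [rec]
    if len(current) >= 2:
        blocks.append(current)
    return blocks


records = [
    ("chr1", 100, "A", "C"),          # hypothetical SNV pair, for illustration
    ("chr1", 101, "G", "T"),
    ("chr14", 50978766, "G", "T"),    # SNV from the edge case
    ("chr14", 50978767, "T", "TG"),   # indel from the edge case
]
print(consecutive_snv_blocks(records))
# [[('chr1', 100, 'A', 'C'), ('chr1', 101, 'G', 'T')]]
```

Because the indel is removed before adjacency is checked, the chr14 SNV never forms a block and is not sent to phASER on its own.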

ChristopheLegendre commented 1 year ago

Indel prefiltering has been implemented.