Closed ChristopheLegendre closed 1 year ago
Do we have a picture of the actual event? Is the indel in phase or not? If we are not able to phase indels, shouldn't the preprocessing step just drop all lines with indels before looking for potential block variants (two consecutive VCF lines that are one bp apart) to send to phASER? In short: if indels are a known phASER limitation, why are they not prefiltered?
We do not phase indels. phASER handles indels by excluding them itself, which is why they were not prefiltered: letting phASER do the work saved us from adding more lines of code and creating more intermediate files.
With this edge case, it appears we cannot let phASER do the work, because phASER does not correctly handle the orphan lines that remain when two consecutive lines consist of an SNV followed by an indel.
The solution is to prefilter the indels out. We have implemented this; testing of the modifications is in progress.
Indel prefiltering has been implemented:
https://github.com/tgen/vcfMerger2/blob/8ed9a05b7a23c8a0c2db80fcf83683c35f6602a7/prep_vcfs_somatic/strelka2/strelka2.phasing_consecutives_variants_as_blocs.sh#L197-L206
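For readers who want the idea without opening the script, here is a minimal sketch of indel prefiltering in Python. It is not the code at the link above (that one is a shell script); the function names `is_snv` and `drop_indels` are hypothetical, and it assumes plain tab-separated VCF lines where REF is column 4 and ALT is column 5. A record is kept only if REF and every ALT allele are a single base.

```python
# Hypothetical helper names; a sketch of the prefiltering idea only.
BASES = set("ACGTN")


def is_snv(ref, alt):
    """True if REF and every comma-separated ALT allele is a single base."""
    alleles = [ref] + alt.split(",")
    return all(len(a) == 1 and a.upper() in BASES for a in alleles)


def drop_indels(vcf_lines):
    """Return header lines plus SNV records; indel records are dropped."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            # Header and meta-information lines pass through untouched.
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Records with fewer than 5 columns are treated as malformed and dropped.
        if len(fields) >= 5 and is_snv(fields[3], fields[4]):
            kept.append(line)
    return kept
```

Running this before the search for consecutive variants means the block detection never sees an indel, so phASER is never asked to exclude one on its own.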
Example of edge case:
This is an SNV followed by an indel; phASER excludes the indel and only the SNV remains, which is no longer part of a block. Therefore phASER fails.
Let's think about any other potential edge cases that phASER could be grumpy about, i.e. cases where it fails because it does not handle the single lines left over after the indels have been removed or skipped.
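One way to guard against such orphan lines is to build the blocks only after the indels are gone and to discard any run of length one. The sketch below illustrates that idea; `consecutive_blocks` is a hypothetical name, and the input is assumed to be a position-sorted list of `(chrom, pos)` tuples for SNVs only. It is not taken from vcfMerger2 or phASER.

```python
def consecutive_blocks(snvs):
    """Group sorted (chrom, pos) SNVs into runs of adjacent positions.

    Only runs with at least two members are returned, so a lone SNV
    (for example one whose neighbouring indel was filtered out) never
    ends up in the list of blocks to phase.
    """
    blocks, current = [], []
    for chrom, pos in snvs:
        same_chrom = bool(current) and current[-1][0] == chrom
        if same_chrom and pos - current[-1][1] == 1:
            # Exactly one bp after the previous SNV: extend the run.
            current.append((chrom, pos))
        else:
            # Run is broken: keep it only if it is a real block.
            if len(current) >= 2:
                blocks.append(current)
            current = [(chrom, pos)]
    if len(current) >= 2:
        blocks.append(current)
    return blocks
```

With the edge case from this issue (an SNV at position 100 and an indel at 101, the indel already removed), the input is a single SNV and the function returns no blocks, so nothing is passed on for phasing.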