twolinin / longphase

Small variants (1-2bp insertions and deletions) are not phased #8

Closed vyx-lucy-kaplun closed 1 year ago

vyx-lucy-kaplun commented 2 years ago

Longphase does not phase small variants (such as 1-2 base insertions and deletions) while it successfully phases SNPs.

chr22 10690636 . G T 20.70 PASS P GT:GQ:DP:AF:PS 0|1:20:60:0.5000:10690636
chr22 10690637 . C T 16.59 PASS F GT:GQ:DP:AF:PS 0|1:16:60:0.5167:10690636
chr22 10690675 . A G 8.84 PASS F GT:GQ:DP:AF:PS 0|1:8:60:0.4667:10690636
chr22 10690687 . A C 19.10 PASS P GT:GQ:DP:AF:PS 0|1:19:60:0.4667:10690636
chr22 10690767 . G T 15.98 PASS F GT:GQ:DP:AF:PS 0|1:15:61:0.4918:10690636
chr22 10690778 . T G 19.85 PASS F GT:GQ:DP:AF:PS 0|1:19:61:0.4426:10690636
chr22 10690871 . G A 12.05 PASS F GT:GQ:DP:AF:PS 0|1:12:60:0.3333:10690636
chr22 10690879 . C T 12.07 PASS F GT:GQ:DP:AF:PS 0|1:12:60:0.3000:10690636
chr22 10690945 . A AT 6.33 PASS F GT:GQ:DP:AF:PS 0/1:6:61:0.2787:.
chr22 10690998 . A T 13.83 PASS F GT:GQ:DP:AF:PS 0|1:13:62:0.4032:10690636
chr22 10691083 . C T 16.04 PASS F GT:GQ:DP:AF:PS 0|1:16:62:0.5323:10690636
chr22 10691118 . A G 20.58 PASS P GT:GQ:DP:AF:PS 0|1:20:62:0.4677:10690636
chr22 10691148 . G T 15.89 PASS F GT:GQ:DP:AF:PS 0|1:15:62:0.4839:10690636
chr22 10691160 . A G 7.66 PASS F GT:GQ:DP:AF:PS 0|1:7:62:0.4355:10690636
chr22 10691167 . CTT C 11.78 PASS F GT:GQ:DP:AF:PS 0/1:11:62:0.3065:.
chr22 10691190 . T C 16.34 PASS F GT:GQ:DP:AF:PS 0|1:16:62:0.5161:10690636
chr22 10691191 . A G 20.96 PASS P GT:GQ:DP:AF:PS 0|1:20:62:0.5323:10690636

I am using a small-variant VCF file created by Clair3, a structural-variant VCF file created by cuteSV, and the following phasing command:

longphase_linux-x64 phase -s merge_output.vcf.gz --sv-file cutesv.vcf.gz -b sorted.bam -r hg38.fa -t 8 -o longphase_combined.phased --ont
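To make the pattern in the VCF excerpt above easier to check on a full file, here is a minimal Python sketch that lists records whose genotype is still unphased (`/` rather than `|`) and labels each as a SNP or an indel. The function name `unphased_records` is hypothetical and not part of longphase; the sketch assumes a single-sample VCF with `GT` in the FORMAT column, and the sample lines are copied from the excerpt.

```python
# Illustrative helper only; not part of longphase or any VCF library.
def unphased_records(vcf_lines):
    """Yield (pos, ref, alt, kind) for records whose GT is unphased."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split()  # works for tab- or space-separated records
        pos, ref, alt = fields[1], fields[3], fields[4]
        # Pair FORMAT keys (GT:GQ:...) with the sample's values.
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        if "|" in sample["GT"]:
            continue  # already phased
        is_snp = len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))
        yield int(pos), ref, alt, "SNP" if is_snp else "indel"


records = [
    "chr22 10690879 . C T 12.07 PASS F GT:GQ:DP:AF:PS 0|1:12:60:0.3000:10690636",
    "chr22 10690945 . A AT 6.33 PASS F GT:GQ:DP:AF:PS 0/1:6:61:0.2787:.",
    "chr22 10691167 . CTT C 11.78 PASS F GT:GQ:DP:AF:PS 0/1:11:62:0.3065:.",
    "chr22 10691190 . T C 16.34 PASS F GT:GQ:DP:AF:PS 0|1:16:62:0.5161:10690636",
]
result = list(unphased_records(records))
for row in result:
    print(row)
```

On these four lines, only the insertion at 10690945 and the deletion at 10691167 are reported, which matches the behaviour described in the report.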

ythuang0522 commented 2 years ago

We skip phasing small indels because indels called by DeepVariant or Clair are less accurate than SNPs, especially on ONT data, where indel errors are frequent. Indel phasing can be implemented, but at the cost of more switch errors. That said, we will add it as an optional feature in the future, similar to WhatsHap.
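For readers unfamiliar with the term, the sketch below counts switch errors using one common definition: a switch occurs between two consecutive heterozygous sites when the predicted phase agrees with the truth at one site and disagrees at the other. This is an illustration only; `count_switch_errors` is a hypothetical name, and it is an assumption that the longphase authors evaluate switch errors in this way.

```python
# Illustrative definition of switch errors; not taken from longphase.
def count_switch_errors(truth, predicted):
    """Count switches between two phasings of the same ordered het sites.

    Each list holds 0 or 1 per site: the haplotype carrying the ALT allele.
    """
    if len(truth) != len(predicted):
        raise ValueError("phasings must cover the same sites")
    agree = [t == p for t, p in zip(truth, predicted)]
    # A switch is any change in agreement between neighbouring sites.
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)


truth = [0, 0, 1, 0, 1, 1]
one_switch = [0, 0, 1, 1, 0, 0]    # haplotypes swapped from the 4th site on
one_bad_site = [0, 0, 0, 0, 1, 1]  # only the 3rd site is mis-phased

print(count_switch_errors(truth, one_switch))
print(count_switch_errors(truth, one_bad_site))
```

Under this definition, a single mis-phased site in the middle of a block counts as two switches, which is why adding error-prone variants to the phasing can raise the switch error count quickly.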

mproberts99 commented 1 year ago

Is this feature still going to be added?

ythuang0522 commented 1 year ago

Hi, the co-phasing of small indels was postponed because of concerns that ONT indel errors would reduce phasing accuracy. Since then, we have focused mostly on improving SNP-only phasing accuracy. That said, we noticed that the upcoming R10.4 seems to get rid of the homopolymer issue. We are now setting aside time to implement indel co-phasing, which should take a few weeks. So yes, it will be added in the next version.