lh3 / dipcall

Reference-based variant calling pipeline for a pair of phased haplotype assemblies
MIT License

Possible error outputting complex variants #8

Open jzook opened 2 years ago

jzook commented 2 years ago

As we've been working towards assembly-based benchmarks for GIAB, we have repeatedly run into what looks like an issue with dipcall. Specifically, when the alignment in the BAM file correctly contains an insertion immediately before a deletion (representing a complex variant), the dip.vcf file sometimes contains only the insertion and not the deletion. One example is at chrX:69,487,386-69,487,425 on GRCh38, shown in the image below from the HPRC assembly of HG002 with dipcall v0.3 (assembly, BAMs, and VCF at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/HPRC-HG002.cur.20211005/). There are similar issues at chrX:27,361,826-27,361,865, chrX:40,208,433-40,208,472, and chrX:69,404,760-69,404,799. Let me know if you have any questions, and thank you for developing this really useful tool!

Screen Shot 2022-03-02 at 12 18 50 PM
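
For anyone wanting to look for other affected sites, one possible approach is to scan the alignment CIGAR strings for an insertion operation that is immediately followed by a deletion, then check whether both events appear in dip.vcf at that position. The following is only a minimal sketch of that scan and is not part of dipcall: the function name `find_ins_then_del` and the example CIGAR strings are hypothetical, and it works on plain CIGAR text rather than reading a BAM file directly.

```python
import re

# SAM CIGAR operations: a length followed by one operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def find_ins_then_del(cigar, ref_start):
    """Return (ref_pos, ins_len, del_len) for each insertion that is
    immediately followed by a deletion in the CIGAR string.

    ref_start is the 0-based reference position where the alignment
    begins; ref_pos is the 0-based reference position at which the
    insertion occurs (hypothetical helper, for illustration only).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    hits = []
    ref_pos = ref_start
    for i, (length, op) in enumerate(ops):
        next_is_del = i + 1 < len(ops) and ops[i + 1][1] == "D"
        if op == "I" and next_is_del:
            hits.append((ref_pos, length, ops[i + 1][0]))
        if op in REF_CONSUMING:
            ref_pos += length
    return hits


if __name__ == "__main__":
    # Made-up CIGAR: 10 matches, a 3 bp insertion, then a 5 bp deletion.
    print(find_ins_then_del("10M3I5D20M", 100))
```

Each reported position could then be compared against the records in dip.vcf to see whether the deletion that follows the insertion was emitted.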