VCF 'Gaps' line has wrong REF base / coordinate

hsinnan75 / MapCaller

MapCaller – An efficient and versatile approach for short-read alignment and variant detection in high-throughput sequenced genomes

MIT License

29 stars 5 forks source link

Open tseemann opened 3 years ago

tseemann commented 3 years ago

$ bcftools norm -f ref.fa mapcaller.vcf

Reference allele mismatch at NZ_NGSV01000001:843608 .. REF_SEQ:'N' vs VCF:'A'```

This A is wrong? Maybe it should be . ?

NZ_NGSV01000001 843608  .       A       <*>       0       Gaps    END=844751      GT:GQ:DP:AD       *:*:*:*

>NZ_NGSV01000001:843608-843608
N

And it starts the Gap too late?

>NZ_NGSV01000001:843598-843628
AGCTCAGNNNNNNNNNNNNNNNNNNNNNNNN

tseemann commented 3 years ago

Maybe this is a bigger problem of how to handle NNNNNNN in reference genome?

hsinnan75 commented 3 years ago

The indexing method would assign a random nucleotide for each 'N'. It requires the original reference genome to verify the true nucleotides.

tseemann commented 3 years ago

So bwt_index is putting ranom base for N.

I think you need to correct this problem in your output? Until you fix it, bcftools will not work on your VCF files.

Also the GAP is starting too late? Souldn't it begin where the first N is? Or are reads actually aligning to the random bases??