mitoNGS / MToolBox

A bioinformatics pipeline to analyze mtDNA from NGS data
http://sourceforge.net/projects/mtoolbox/?source=navbar
GNU General Public License v3.0
86 stars 37 forks source link

My VCF contains virtually no 1/1 genotypes #80

Open mocksu opened 5 years ago

mocksu commented 5 years ago

I have 1060 WGS samples and I used MToolBox V1.1 to call MT variants. When looking at the result VCF file, I noticed that my GTs are either 0 or 0/1 with HF, but only 3 variants with 1/1 GTs among about 7000 variants called. Is this normal or did I do something wrong?

As for the 0/1 calls, I am not sure if I understand it correctly. For instance, for a "0/1:1234:0.999:0.997:1" call, does it mean it's 99.9% sure it is heteroplasmy (ref/alt mixed) or does it mean it's 99.9% sure the genotype is alt? It's just so hard to believe there is only 3/7000 variants that differ from the ref genome for over 1000 samples.

clody23 commented 4 years ago

Hi,

This is expected, especially with very high read depth.

Did you run the analysis on human samples? And what was the average per sample read depth?

In our hands, if you have really high read depth you might get many low heteroplasmies, that could just be technical noise, you might want to remove. This filter would drop the number of 0/1 variants quite a lot...

Also, it might be that you never find homoplasmic variants encoded as 1, but still as 0/1 but with HF > 0.9. This should still be considered homoplasmic. Did you check how many you have like this?

HF = 0.99 means that you have 99% of the reads supporting that alternative variant (i.e. the variant which is alternative to the reference you chose, for example rCRS) and 1% supporting the reference allele. We don't provide probability of the genotype, but just fractions. The 1% might be a residual of wild type or technical error...this is hard to say. Anyway, what you have in the VCF file passed already some QC performed by the MToolBox v. caller.

If you are unsure about this, you can always check the pileup file for those samples at that position (provided by MToolBox) or inspect the OUT2.bam file with IGV...

Also, if you want to remove low heteroplasmy, this script can help you with that.

https://github.com/mitoNGS/MToolBox/blob/master/aux/filter_HF.py

Hope this helps

best wishes, Claudia