mitoNGS / MToolBox

A bioinformatics pipeline to analyze mtDNA from NGS data
http://sourceforge.net/projects/mtoolbox/?source=navbar
GNU General Public License v3.0

Heteroplasmic fraction #68

Open Madelinehazel opened 5 years ago

Madelinehazel commented 5 years ago

Hello,

I am wondering why certain variants in the *annotation.csv file are not annotated with a heteroplasmic fraction. The same variants ARE annotated with a heteroplasmic fraction in the vcf. For example:

chrMT 12705 . C T . PASS AC=2;AN=3 GT:DP:HF:CILOW:CIUP 0/1:4848:0.999:0.997:0.999 1:2835:1.0:0.998:1.0

In the sample.vcf file, this variant has an HF of 0.999 in the first sample and 1.0 in the second (0.998 is the lower confidence bound, CILOW, of the second sample).
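For anyone who wants to check these values programmatically, here is a minimal Python sketch that reads the per-sample fields of the VCF line quoted above by FORMAT key rather than by position, so HF cannot be confused with CILOW or CIUP. The function name `parse_sample_fields` is just an illustration and not part of MToolBox; the sketch also assumes one value per field (a multi-allelic site may carry comma-separated values, which this does not handle).

```python
def parse_sample_fields(vcf_line):
    """Return one {FORMAT key: value} dict per sample column of a VCF data line."""
    # Real VCF files are tab-separated; split() accepts tabs and spaces alike.
    columns = vcf_line.split()
    format_keys = columns[8].split(":")  # 9th column is FORMAT, e.g. GT:DP:HF:CILOW:CIUP
    return [dict(zip(format_keys, sample.split(":"))) for sample in columns[9:]]


line = ("chrMT 12705 . C T . PASS AC=2;AN=3 GT:DP:HF:CILOW:CIUP "
        "0/1:4848:0.999:0.997:0.999 1:2835:1.0:0.998:1.0")

samples = parse_sample_fields(line)
hf_values = [float(sample["HF"]) for sample in samples]
print(hf_values)  # [0.999, 1.0]
```

For this line the second sample has HF 1.0, with 0.998 as its CILOW.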

In the *annotation.csv file for the first sample, there is no heteroplasmic fraction.

CP029-P 12705T yes MT-ND5 0.894 3 syn 0.0036 (etc)

Almost a third of the variants in the annotation file do not have an HF. Because this is an important field for filtering for variants of interest, it would be great to resolve this.

Thank you! Madeline

clody23 commented 5 years ago

Dear Madeline,

thanks for reporting this issue and apologies for the delay in answering your question.

We think this might be because 12705T is the consensus allele in RSRS (the second reference sequence used in MToolBox for variant annotation and haplogroup prediction). rCRS and RSRS differ by 52 alleles in total; you can check them here: http://www.phylotree.org/resources/RSRS_vs_rCRS.htm

The annotation.csv file also contains variants that are not in the VCF file. They are homoplasmic for the rCRS allele (if rCRS is used as the reference sequence for mapping and variant calling), so they are not called, but they are still reported in the annotation because they are variants with respect to RSRS. These variants have no HF in the annotation file because they were not found in the variant calling and hence are not in the VCF; they have to be considered homoplasmic. The tool may behave unexpectedly when it finds one of those 52 variants in the VCF, which would leave the HF empty in the annotation.csv file even though the variant was actually found in the variant calling.
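Until this is fixed, the rule described above could be applied by hand along these lines. This is only a sketch of a possible workaround, not MToolBox code: the function `fill_missing_hf`, the dictionary layout, and the choice to encode "homoplasmic" as an HF of 1.0 are all assumptions, and apart from 12705T the variant labels are made-up placeholders.

```python
def fill_missing_hf(annotated, vcf_hf, homoplasmic_hf=1.0):
    """Fill empty HF values in annotation records.

    annotated: {variant label: HF or None}, as read from annotation.csv
    vcf_hf:    {variant label: HF} taken from the VCF for the same sample
    """
    filled = {}
    for variant, hf in annotated.items():
        if hf is not None:
            filled[variant] = hf               # HF already annotated, keep it
        elif variant in vcf_hf:
            filled[variant] = vcf_hf[variant]  # variant was called: use the VCF value
        else:
            filled[variant] = homoplasmic_hf   # not called: treat as homoplasmic (assumption)
    return filled


# 12705T comes from this issue; the other two labels are hypothetical placeholders.
annotated = {"12705T": None, "rsrs_only_variant": None, "other_variant": 0.42}
vcf_hf = {"12705T": 0.999, "other_variant": 0.42}

print(fill_missing_hf(annotated, vcf_hf))
```

Joining on the variant label only works if both files name variants the same way, so the labels would need to be normalized first.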

We will investigate this issue and try to fix it as soon as possible.

Many thanks, Claudia