mitoNGS / MToolBox

A bioinformatics pipeline to analyze mtDNA from NGS data
http://sourceforge.net/projects/mtoolbox/?source=navbar
GNU General Public License v3.0

RCRS vs RSRS output #90

Open sremuk opened 4 years ago

sremuk commented 4 years ago

When using RCRS vs RSRS for the same file, there is a difference in the output haplogroups. There are many more with RSRS than with RCRS. Is there a big difference in the genome files? Also, there is no VCF output with the RCRS settings.

clody23 commented 4 years ago

Hi,

a difference between rCRS- and RSRS-based variant calling is indeed expected. RSRS is better suited to haplogroup prediction because it is a virtual reference sequence placed at the root of the human mitochondrial phylogeny. rCRS, by contrast, belongs to haplogroup H2a2a1, a modern European haplogroup that shares many alleles with other European clades, so European samples tend to show fewer variants against it. Using RSRS as the reference sequence can therefore yield more informative variant alleles and a better haplogroup prediction, especially for European samples. In any case, more than one haplogroup prediction indicates that your variant calls contain too few informative variants, which leads to multiple (best) haplogroup predictions.
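As a rough illustration of why the reference matters, here is a minimal Python sketch. The sequences are made-up 10-base toys, not real mtDNA, and `call_variants` is a hypothetical helper that is not part of MToolBox; it only shows that the same sample yields more called differences against a root-like reference than against a reference from its own lineage.

```python
def call_variants(sample, reference):
    """Return (1-based position, reference allele, sample allele) for each mismatch."""
    if len(sample) != len(reference):
        raise ValueError("toy example assumes pre-aligned, equal-length sequences")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical toy sequences, NOT real mitochondrial data.
root_like_ref = "ACGTACGTAC"  # stands in for RSRS (root of the tree)
modern_ref = "ACGTTCGTAA"     # stands in for rCRS (carries its own lineage's alleles)
sample = "ACGTTCGAAA"         # same lineage as modern_ref, plus one private change

vs_root = call_variants(sample, root_like_ref)
vs_modern = call_variants(sample, modern_ref)

print(len(vs_root), "differences against the root-like reference")
print(len(vs_modern), "differences against the modern reference")
```

The alleles that define the sample's lineage only show up as variants when the reference does not already carry them, which is why a root-based reference tends to expose more haplogroup-informative positions.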

The case you mentioned, with no VCF when using rCRS, might indicate either that the sample is identical to the H2a2a1 genotype or that something went wrong in the MToolBox run. Can you please share the log file of the run so that we can investigate further?
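If a VCF file does exist but looks empty, a quick check like the sketch below can tell a header-only file from one with variant records. It relies only on the VCF convention that header lines start with `#`; the function name and the `sample.vcf` path are placeholders, not MToolBox output names.

```python
import gzip


def count_vcf_records(path):
    """Count variant records (non-empty, non-header lines) in a plain or gzipped VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


if __name__ == "__main__":
    import sys

    # Placeholder default path; pass the real VCF path as the first argument.
    vcf_path = sys.argv[1] if len(sys.argv) > 1 else "sample.vcf"
    try:
        print(count_vcf_records(vcf_path), "variant records in", vcf_path)
    except FileNotFoundError:
        print("No VCF found at", vcf_path)
```

A count of zero would be consistent with a sample that matches the reference, while a missing file points more towards a problem in the run itself.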

Thanks, Claudia

clody23 commented 4 years ago

Also, I would suggest a couple of useful readings on the differences between rCRS and RSRS, most importantly the 2012 paper by Behar et al.:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322232/