Open sremuk opened 4 years ago
Hi,
a difference between rCRS- and RSRS-based variant calling is indeed expected. RSRS is better suited for haplogroup prediction because it is a virtual reference sequence placed at the root of the human mitochondrial phylogeny. rCRS instead belongs to haplogroup H2a2a1, a modern European haplogroup that shares more common variants with European clades. Using RSRS as the reference sequence might therefore result in a better haplogroup prediction, especially for European samples, because more informative variant alleles are found. In any case, getting more than one haplogroup prediction indicates that your variant calling does not contain enough informative variants, which leads to multiple (best) haplogroup predictions.
The case you mentioned, with no VCF produced when using rCRS, might indicate either that the sample is identical to the H2a2a1 genotype or that something went wrong in the MToolBox run. Could you please share the log file of the run so that we can investigate further?
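To illustrate both points, here is a minimal sketch of why the same sample can yield several variants against one reference and none against another. It is only a toy comparison, not how MToolBox calls variants: the two 12-base reference strings, the sample, and the `call_variants` helper are all hypothetical and assume pre-aligned sequences of equal length.

```python
from typing import List, Tuple


def call_variants(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference allele, sample allele) per mismatch."""
    if len(reference) != len(sample):
        raise ValueError("toy example assumes pre-aligned, equal-length sequences")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Hypothetical toy sequences, not real mtDNA.
ROOT_LIKE_REF = "ACGTACGTACGT"    # stands in for a root-based reference (RSRS-like)
MODERN_LIKE_REF = "ACGAACGTCCGT"  # stands in for a modern-lineage reference (rCRS-like)

# A sample that happens to be identical to the modern-lineage reference.
sample = "ACGAACGTCCGT"

variants_vs_root = call_variants(ROOT_LIKE_REF, sample)
variants_vs_modern = call_variants(MODERN_LIKE_REF, sample)

print("vs root-like reference:  ", variants_vs_root)
print("vs modern-like reference:", variants_vs_modern)
```

In this toy case the sample shows two differences against the root-like reference and none against the modern-like one, so a variant file built against the latter would simply have nothing to report.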
Thanks, Claudia
Also, I would suggest a couple of useful readings on the differences between rCRS and RSRS, most importantly the Behar et al. paper from 2012:
When using rCRS vs RSRS for the same file, there is a difference in the output haplogroups. There are many more with RSRS than with rCRS. Is there a big difference in the genome files? Also, there is no VCF output with the rCRS settings.