Open Amanda-Biocortex opened 6 months ago
hi @Amanda-Biocortex, the calculation is published here:
Deriving confidence intervals for mutation rates across a wide range of evolutionary distances using FracMinHash https://genome.cshlp.org/content/33/7/1061
(preprint here: https://www.biorxiv.org/content/10.1101/2022.01.11.475870v4)
My recollection is that the calculation is based on the decay in the fraction of k-mers that match as sequences diverge.
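A rough sketch of that idea, not sourmash's actual code: assume a simple model where every base mutates independently, so a k-mer still matches only if all k of its bases are unchanged. The expected fraction of matching k-mers (the containment) is then roughly ANI^k, and inverting that gives a point estimate of ANI. The function names below are made up for illustration, and the paper additionally derives confidence intervals that this sketch leaves out.

```python
def containment_to_ani(containment: float, ksize: int) -> float:
    """Point estimate of ANI from k-mer containment.

    Assumes the simple independent-mutation model, where
    containment ~= ANI ** ksize. Hypothetical helper, not the sourmash API.
    """
    if not 0.0 <= containment <= 1.0:
        raise ValueError("containment must be between 0 and 1")
    if ksize < 1:
        raise ValueError("ksize must be a positive integer")
    return containment ** (1.0 / ksize)


def expected_containment(ani: float, ksize: int) -> float:
    """Expected fraction of k-mers left unmutated at a given ANI (same model)."""
    if not 0.0 <= ani <= 1.0:
        raise ValueError("ani must be between 0 and 1")
    return ani ** ksize


if __name__ == "__main__":
    k = 31
    # At 95% ANI only about a fifth of the 31-mers are expected to match exactly.
    print(f"expected containment at 95% ANI, k={k}: {expected_containment(0.95, k):.3f}")
    # Going the other way: a containment of 0.2 corresponds to roughly 95% ANI.
    print(f"estimated ANI at containment 0.2, k={k}: {containment_to_ani(0.2, k):.3f}")
```

So each individual k-mer match is exact, but the ANI estimate comes from how many of the k-mers match, not from the identity of any single one.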
> I believe 95% ANI threshold is standard- would this be the same for Sourmash ANI?
95% is usually used for species cutoffs between two genomes. sourmash's containment ANI is (should be) directly relatable to alignment-based ANI. So, yes? :)
If you're comparing a genome to a metagenome containing multiple strains, things get more complicated and interesting: I think it would be like aligning the reads to both genomes and then taking the best ANI match at each location.
Hi,
Could you help me to understand how QueryContainmentAni and MatchContainmentAni are calculated?
Given the use of exact k-mer matches, I would assume that the ANI between a query k-mer and a reference k-mer would be 100%? Or is the ANI calculated between the contiguous set of k-mers and the reference?
I believe a 95% ANI threshold is standard. Would this be the same for sourmash ANI?
Many thanks, Amanda