sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

update docs with more about ANI #2052

Open ctb opened 2 years ago

ctb commented 2 years ago

(This can be a bit of a running issue until we get around to fixing the docs ;)

The introduction of ANI output in v4.4.0 is pretty awesome, with lots of implications for sourmash and tools that build on/use sourmash (e.g. spacegraphcats).

Searching the docs for ANI finds https://sourmash.readthedocs.io/en/latest/classifying-signatures.html?highlight=ani#estimating-ani-from-fracminhash-comparisons, but that section doesn't mention compare.

For ANI, we should talk about the minimum ANI that can be detected and how large sketches need to be to estimate ANI properly. For example, over in https://github.com/sourmash-bio/sourmash/issues/1859, it looks like the minimum detectable ANI is in the 80-85% range with k=31. It'd be great to put some of these graphs and explanations into the documentation!
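
As a rough illustration of why low ANI gets hard to detect at k=31, here is a minimal sketch. It assumes the simple mutation model in which a k-mer is shared only if all k of its bases are unmutated, so containment is roughly ANI^k and the point estimate of ANI is containment^(1/k). The function names, the 5 Mbp genome size, and scaled=1000 are hypothetical choices for illustration; this is not sourmash's own code, and the real implementation may differ in its details.

```python
def ani_to_containment(ani: float, ksize: int) -> float:
    """Expected k-mer containment under the simple mutation model.

    Assumption: each base mutates independently, so a k-mer survives
    with probability ani ** ksize.
    """
    if not 0.0 <= ani <= 1.0:
        raise ValueError("ani must be between 0 and 1")
    return ani ** ksize


def containment_to_ani(containment: float, ksize: int) -> float:
    """Point estimate of ANI from containment (inverse of the above)."""
    if not 0.0 <= containment <= 1.0:
        raise ValueError("containment must be between 0 and 1")
    return containment ** (1.0 / ksize)


def expected_shared_hashes(ani: float, ksize: int,
                           genome_kmers: int, scaled: int) -> float:
    """Rough expected number of sketch hashes shared at a given ANI.

    A FracMinHash sketch keeps about genome_kmers / scaled hashes;
    the shared fraction is approximated by the expected containment.
    """
    sketch_size = genome_kmers / scaled
    return sketch_size * ani_to_containment(ani, ksize)


if __name__ == "__main__":
    # Hypothetical example: a ~5 Mbp genome sketched with scaled=1000.
    for ani in (0.95, 0.90, 0.85, 0.80):
        shared = expected_shared_hashes(ani, 31, 5_000_000, 1000)
        print(f"ANI={ani:.2f}  containment={ani_to_containment(ani, 31):.5f}  "
              f"expected shared hashes={shared:.1f}")
```

Under these assumptions, the expected number of shared hashes falls to single digits around 80% ANI, which would be consistent with a detection floor in the 80-85% range. A smaller scaled value or a smaller k would push that floor lower, at the cost of larger sketches or less specificity.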

ctb commented 2 years ago

See also the increasingly extensive discussion in https://github.com/sourmash-bio/sourmash/issues/2128.