sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

display ANI in search results? #2001

Open ctb opened 2 years ago

ctb commented 2 years ago

Once https://github.com/sourmash-bio/sourmash/pull/1967 is merged, ANI will be available in CSV files! 🎉

It is also available in sourmash compare matrix output, if --ani is used. 🎉

I don't think it is displayed anywhere else.

Do we want to add ANI to the search output, @bluegenes? I'm in favor - the search results are pretty sparse so I think we even have room for them.

Not sure about gather, though. I think the k-mer overlap approaches make more sense there, maybe? But it would be nice to have ANI as an option, maybe? 🤔

Maybe do it for search first, since I'm pretty sure that's a good idea, and then a separate PR (no hurry) for gather?

bluegenes commented 2 years ago

ANI to search output would be great, but a few thoughts:

ctb commented 2 years ago

> we can't estimate ANI for num signatures, so num vs scaled outputs would be different (maybe they already are?)

ahh, I'd kinda forgotten that.

bluegenes commented 2 years ago

> ahh, I'd kinda forgotten that.

we could actually estimate ANI from num sketches using the Mash Distance, but I'd rather not, because:

  1. don't want to confuse folks / have them use the two ANIs interchangeably
  2. want to encourage transition to FracMinHash
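
For reference, a minimal sketch of the Mash Distance conversion mentioned above, assuming the standard formula D = -(1/k) * ln(2j / (1 + j)) with ANI approximated as 1 - D. The function names `mash_distance` and `ani_from_jaccard` are hypothetical and are not part of the sourmash API; the Jaccard value would come from comparing two num sketches.

```python
import math


def mash_distance(jaccard: float, ksize: int) -> float:
    """Mash distance from a Jaccard estimate: D = -(1/k) * ln(2j / (1 + j))."""
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must be between 0 and 1")
    if ksize <= 0:
        raise ValueError("ksize must be positive")
    if jaccard == 0.0:
        # No shared hashes: treat the sketches as maximally distant.
        return 1.0
    distance = -math.log(2.0 * jaccard / (1.0 + jaccard)) / ksize
    # Very small Jaccard values can push the formula past 1, so clamp.
    return min(1.0, distance)


def ani_from_jaccard(jaccard: float, ksize: int) -> float:
    """Approximate ANI as 1 - Mash distance (hypothetical helper)."""
    return 1.0 - mash_distance(jaccard, ksize)


if __name__ == "__main__":
    # Example: a Jaccard similarity of 0.5 at k=21.
    print(f"{ani_from_jaccard(0.5, 21):.4f}")
```

This estimate is derived from Jaccard similarity, whereas the FracMinHash ANI is derived from containment between scaled sketches, which is one reason the two values should not be treated as interchangeable.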

ctb commented 2 years ago

agreed.