Open ctb opened 2 years ago
Just a thought --
Beyond general warnings/documentation, we could also warn about this (& prevent translate x translate ANI) if we stored a property in the MinHash class that described whether or not the sketch was generated via translation. ref #268
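A rough sketch of what such a check could look like. Everything here is hypothetical: `ToySketch`, its `translated` flag, and `check_ani_comparison` are illustrative names only, not sourmash's actual `MinHash` API.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySketch:
    """Hypothetical stand-in for a MinHash sketch (not sourmash's class)."""
    hashes: frozenset
    translated: bool = False  # hypothetical flag: hashes came from translating DNA


def check_ani_comparison(a: ToySketch, b: ToySketch, metric: str) -> None:
    """Refuse or warn about ANI comparisons that involve translated sketches."""
    # translate x translate: no safe metric, so refuse outright
    if a.translated and b.translated:
        raise ValueError("ANI is not supported for translated x translated sketches")
    # translate x protein: only max containment is considered safe here
    if (a.translated or b.translated) and metric != "max_containment":
        warnings.warn(
            f"{metric} ANI is unreliable when one sketch is translated; "
            "consider max_containment instead",
            UserWarning,
            stacklevel=2,
        )


protein = ToySketch(frozenset({1, 2, 3}))
translated_dna = ToySketch(frozenset({2, 3, 4, 5}), translated=True)

# Silent: max containment against a translated sketch is allowed.
check_ani_comparison(protein, translated_dna, "max_containment")
```

With a flag like this, `check_ani_comparison(protein, translated_dna, "jaccard")` would emit a warning, and comparing `translated_dna` against itself would raise.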
tl;dr Yay ANI! https://github.com/sourmash-bio/sourmash/pull/1967 Boo ANI on translated sequences unless max_containment is used :(.
Longer backstory:
For translated DNA x protein, we will have many spurious proteins (unless we use orpheum :)).
Thus the Jaccard will be very different from the containment, which will be very different from the max containment.
Thus the Jaccard ANI will be very different from the containment ANI, which will be very different from the max containment ANI.
(It is even worse for translated DNA x translated DNA, where there is no solution, I think.)
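To make the gap concrete, here is a toy illustration using plain Python sets in place of real sketches. The numbers are made up: a protein with 100 hashes, and a translated DNA sketch that shares 90 of them and also carries 450 spurious hashes. The ANI conversion below is a simple point estimate (`C ** (1/k)`, with Jaccard first mapped to `2J / (1 + J)`) and `ksize=10` is an arbitrary choice; both are assumptions for illustration and may not match what sourmash does internally.

```python
def jaccard(a: set, b: set) -> float:
    """Shared hashes over all hashes in either set."""
    return len(a & b) / len(a | b)


def containment(a: set, b: set) -> float:
    """Fraction of a's hashes that are also in b."""
    return len(a & b) / len(a)


def max_containment(a: set, b: set) -> float:
    """Shared hashes over the size of the smaller set."""
    return len(a & b) / min(len(a), len(b))


def containment_to_ani(c: float, ksize: int) -> float:
    """Assumed point estimate: per-position identity from k-mer containment."""
    return c ** (1 / ksize)


def jaccard_to_ani(j: float, ksize: int) -> float:
    """Assumed point estimate: map Jaccard to 2J / (1 + J), then convert."""
    return containment_to_ani(2 * j / (1 + j), ksize)


# Made-up data: 100 protein hashes; the translated sketch shares 90 of them
# and adds 450 spurious hashes that match nothing in the protein.
protein = set(range(100))
translated = set(range(90)) | set(range(1000, 1450))

ksize = 10  # arbitrary k for the illustration
results = {
    "jaccard": jaccard(translated, protein),
    "containment (translated in protein)": containment(translated, protein),
    "max_containment": max_containment(translated, protein),
}
for name, value in results.items():
    convert = jaccard_to_ani if name == "jaccard" else containment_to_ani
    print(f"{name}: {value:.3f} -> ANI estimate {convert(value, ksize):.3f}")
```

In this toy case the spurious hashes drag down both the Jaccard (90/550) and the containment of the translated sketch in the protein (90/540), while the max containment (90/100) only looks at the smaller, cleaner set.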
@bluegenes sez: