Open bluegenes opened 2 years ago
Hot take: Warning + `None`-ing it out as in #2004 seems good for now.
Handled by #2032.
This is happening much more often than I expected, and for some applications (prefetch, gather) it can yield very verbose output (#2058).
Can deal with verbosity by changing the warning strategy, but I'm not sure zeroing out is the right call, if it's happening this often...
thresholds modified in #2074
With #1967, we will now estimate ANI for any scaled sketch comparisons, regardless of sketch size. These estimates may be inaccurate for viruses/small genomes.
context from https://github.com/sourmash-bio/sourmash/pull/1967#discussion_r859966317: @ctb:
@bluegenes: At the moment, we just report the ANI, except for extremely tiny test data where we can't actually estimate ANI.
For jaccard --> ANI, we estimate the error on the jaccard estimate itself, and raise a warning when the error may be too high (but still currently report ANI). I have an item in the `SearchResult` class that keeps track of whether the jaccard estimate error is too high -- I think we should at least consider doing that, but would also be open to zeroing out the ANI estimate.

From https://github.com/sourmash-bio/sourmash/issues/1798#issuecomment-1015762466:
I was hoping we might be able to use HLL to avoid issues with small sketches, but I suppose instead we could use this to estimate an error based on the sketch size, and raise a warning / zero out the ANI when the error is too high?
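
For anyone skimming the thread, here is a rough sketch of the two options being discussed (warn and keep the ANI vs. warn and `None` it out). It is not the sourmash implementation: `AniResult`, `jaccard_to_ani_sketch`, the `err_threshold` default, and the `zero_out` flag are all hypothetical names and values made up for illustration. The Jaccard error is assumed to be computed elsewhere and passed in. The point estimate is the common `(2J / (1 + J)) ** (1 / k)` form, used here only as a stand-in.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class AniResult:
    """Hypothetical result holder; stands in for the flag kept on SearchResult."""
    ani: Optional[float]
    jaccard_error_too_high: bool


def jaccard_to_ani_sketch(jaccard, ksize, jaccard_error,
                          err_threshold=1e-4, zero_out=True):
    """Convert a Jaccard estimate to an ANI estimate, flagging unreliable inputs.

    jaccard_error is assumed to be precomputed by the caller.
    err_threshold is an illustrative value, not a recommended default.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must be in [0, 1]")

    too_high = jaccard_error > err_threshold
    if too_high:
        warnings.warn("Jaccard estimate error exceeds threshold; "
                      "ANI estimate may be unreliable.")
        if zero_out:
            # "Warning + None" strategy: do not report a number at all.
            return AniResult(ani=None, jaccard_error_too_high=True)

    # Stand-in point estimate: ANI ~ (2J / (1 + J)) ** (1 / k).
    ani = (2.0 * jaccard / (1.0 + jaccard)) ** (1.0 / ksize)
    return AniResult(ani=ani, jaccard_error_too_high=too_high)
```

With `zero_out=False` the caller still gets a number plus the flag, which matches the "report ANI but track the error" behavior described above. With `zero_out=True` the caller gets `None` and has to handle it explicitly.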