ctb closed this issue 2 years ago.
(of course, this all needs to be balanced against the fact that scaled signatures can grow indefinitely, since their size tracks the number of distinct hashes seen :)
note Richard Durbin's modimizer, which uses similar concepts! https://github.com/richarddurbin/modimizer - the README is informative for this issue.
a more succinct way of putting the containment guarantees above is: "Containment never decreases as you get more data" (which is especially nice for streaming).
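To make "containment never decreases" concrete, here is a minimal sketch that streams chunks of sequence into a scaled sketch and, for contrast, into a fixed-size bottom-k MinHash. Assumptions: blake2b truncated to 64 bits stands in for sourmash's MurmurHash3, the keep-threshold is `2**64 // scaled`, the sequences are random toy data, and all helper names (`scaled_sketch`, `bottom_k_sketch`, `containment`) are hypothetical, not sourmash's API.

```python
import hashlib
import random

MAX_HASH = 2**64  # hashes are 64-bit unsigned integers

def hash64(kmer: str) -> int:
    # Stand-in 64-bit hash; sourmash uses MurmurHash3, but any uniform hash behaves the same here.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def kmer_hashes(seq: str, k: int) -> set[int]:
    return {hash64(seq[i:i + k]) for i in range(len(seq) - k + 1)}

def scaled_sketch(hashes: set[int], scaled: int) -> set[int]:
    # Assumed threshold: keep every hash below MAX_HASH // scaled (~1/scaled of distinct k-mers).
    return {h for h in hashes if h < MAX_HASH // scaled}

def bottom_k_sketch(hashes: set[int], num: int) -> set[int]:
    # Classic MinHash: keep only the `num` smallest hashes, however much data there is.
    return set(sorted(hashes)[:num])

def containment(query: set[int], target: set[int]) -> float:
    # Fraction of the query's sketch hashes that are present in the target's sketch.
    return len(query & target) / len(query) if query else 0.0

random.seed(1)

def random_seq(n: int) -> str:
    return "".join(random.choice("ACGT") for _ in range(n))

k, scaled, num = 21, 10, 100
query = random_seq(5000)
chunks = [random_seq(5000) for _ in range(4)]
chunks.insert(1, query)  # the query shows up early in the stream

q_scaled = scaled_sketch(kmer_hashes(query, k), scaled)
q_bottom = bottom_k_sketch(kmer_hashes(query, k), num)

stream_scaled: set[int] = set()  # updated chunk by chunk; only ever gains hashes
stream_bottom: set[int] = set()  # updated chunk by chunk; evicts hashes to stay at `num`
scaled_history, bottom_history = [], []
for chunk in chunks:
    chunk_hashes = kmer_hashes(chunk, k)
    stream_scaled |= scaled_sketch(chunk_hashes, scaled)
    stream_bottom = bottom_k_sketch(stream_bottom | chunk_hashes, num)
    scaled_history.append(containment(q_scaled, stream_scaled))
    bottom_history.append(containment(q_bottom, stream_bottom))

print("scaled:  ", scaled_history)  # never decreases; stays at 1.0 once the query is seen
print("bottom-k:", bottom_history)  # can fall as later chunks push the query's hashes out
```

The scaled sketch of a superset always contains the scaled sketch of any subset, so query hashes are never evicted. A bottom-k sketch has a fixed size, so new data has to push old hashes out. The flip side is the growth caveat above: `stream_scaled` keeps getting bigger as distinct k-mers accumulate.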
note also that you can add and subtract scaled signatures, filter them on abundance, and so on, without fear.
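A minimal sketch of why these operations are safe, assuming a scaled signature with abundances is just a map from hash to count (a `Counter` here) and that the keep-threshold is `2**64 // scaled`. The helper names `add`, `subtract`, `filter_abundance` and `is_scaled_sketch` are hypothetical, not sourmash's API. Every operation is decided hash by hash, and whether a hash belongs in a scaled sketch never depends on the other hashes, so each result is still a valid sketch at the same `scaled`. With a bottom-k MinHash, by contrast, subtracting leaves fewer than k hashes, and the next-smallest replacements were never stored.

```python
from collections import Counter

MAX_HASH = 2**64

def max_hash_for(scaled: int) -> int:
    # Assumed threshold: keep hashes below 2**64 // scaled (sourmash's exact rounding may differ).
    return MAX_HASH // scaled

def is_scaled_sketch(sketch: Counter, scaled: int) -> bool:
    return all(h < max_hash_for(scaled) for h in sketch)

def add(a: Counter, b: Counter) -> Counter:
    # Union of hashes, summing abundances.
    return a + b

def subtract(a: Counter, b: Counter) -> Counter:
    # Drop every hash of `a` that also appears in `b`.
    return Counter({h: c for h, c in a.items() if h not in b})

def filter_abundance(a: Counter, min_abund: int) -> Counter:
    # Keep only hashes seen at least `min_abund` times.
    return Counter({h: c for h, c in a.items() if c >= min_abund})

scaled = 1000
a = Counter({5: 3, 17: 1, 42: 2})  # toy hash -> abundance maps
b = Counter({17: 4, 99: 1})

combined = add(a, b)
remainder = subtract(a, b)
solid = filter_abundance(a, 2)

print(dict(combined))   # {5: 3, 17: 5, 42: 2, 99: 1}
print(dict(remainder))  # {5: 3, 42: 2}
print(dict(solid))      # {5: 3, 42: 2}
# The operations compose in either order and stay valid at the same scaled value.
print(filter_abundance(subtract(a, b), 2) == subtract(filter_abundance(a, 2), b))  # True
```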
preprinted and available! see link in #823.
From private conversations with @luizirber @bluegenes @halexand recently -- scaled signatures are different from MinHash because:
downsample
and discussion in API docs in #596), and both could maybe be built simultaneously (see #538).

These properties need to be clearly laid out, discussed, evaluated empirically, and (ideally) described theoretically. cc @dkoslicki
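The list above mentions downsampling. Assuming it refers to the usual scaled property (a sketch at one `scaled` value can be reduced to any larger `scaled` value by dropping hashes above the new threshold, which gives exactly the sketch you would get by sketching at that value directly), here is a minimal sketch. blake2b stands in for sourmash's MurmurHash3, the threshold `2**64 // scaled` is an assumption, and `scaled_sketch`/`downsample` are hypothetical helpers, not sourmash's API.

```python
import hashlib
import random

MAX_HASH = 2**64

def hash64(kmer: str) -> int:
    # Stand-in 64-bit hash (sourmash uses MurmurHash3).
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def scaled_sketch(seq: str, k: int, scaled: int) -> set[int]:
    # Assumed threshold: keep hashes below 2**64 // scaled.
    max_hash = MAX_HASH // scaled
    return {h for h in (hash64(seq[i:i + k]) for i in range(len(seq) - k + 1)) if h < max_hash}

def downsample(sketch: set[int], old_scaled: int, new_scaled: int) -> set[int]:
    # Only coarser resolution is possible: hashes dropped at old_scaled can't be recovered.
    if new_scaled < old_scaled:
        raise ValueError("can only downsample to a larger scaled value")
    return {h for h in sketch if h < MAX_HASH // new_scaled}

random.seed(2)
seq = "".join(random.choice("ACGT") for _ in range(20000))
fine = scaled_sketch(seq, 21, 10)
coarse = scaled_sketch(seq, 21, 100)

# Downsampling the fine sketch yields exactly the sketch made directly at scaled=100.
print(downsample(fine, 10, 100) == coarse)  # True
print(len(fine), len(coarse))
```

This is also why two scaled sketches made at different resolutions can still be compared: downsample both to the larger `scaled` value first. A bottom-k MinHash can only shrink `num`, and sketches with different hash-space cutoffs don't line up the same way.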