Open ctb opened 5 years ago
note that this functionality is not trivial, and enabling this kind of general query is in fact one of the works-in-progress that https://github.com/ctb/2017-sourmash-revindex and #604 are devoted to :)
(not trivial doesn't mean it's hard; we have several implementations of such a thing, including e.g. https://github.com/ctb/2017-sourmash-revindex/blob/master/hashes-to-numpy-2.py)
It would now (as of #946) be very easy to add a filtering option to `LCA_Database` to support this functionality.
similarly for #1808 and `SqliteIndex`, this is quite easy there.
this is now generically available as a plugin: https://github.com/ctb/sourmash_plugin_commonhash
also ref https://github.com/sourmash-bio/sourmash/issues/2383
see https://github.com/dib-lab/sourmash/pull/587#discussion_r245368621 by @taylorreiter - "It would be useful if we could have a threshold for the intersection -- like if the hash occurs in 80% of signatures, give it to me in the intersection."
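For illustration, the thresholded intersection described in that comment could look roughly like the sketch below. It works on plain Python sets of hash values and does not use the sourmash API; the function name `common_hashes` and the `threshold` parameter are hypothetical, and this is not the implementation in the plugin or in any of the PRs referenced above.

```python
from collections import Counter


def common_hashes(hash_sets, threshold=0.8):
    """Return hashes present in at least `threshold` (a fraction) of the input sets.

    `hash_sets` is an iterable of collections of hash values, one per signature.
    Hypothetical helper for illustration only; not part of sourmash.
    """
    hash_sets = [set(hashes) for hashes in hash_sets]  # count each hash once per signature
    total = len(hash_sets)
    if total == 0:
        return set()

    counts = Counter()
    for hashes in hash_sets:
        counts.update(hashes)

    # keep hashes whose fraction of containing signatures meets the threshold
    return {h for h, n in counts.items() if n / total >= threshold}


# toy example: five "signatures" as sets of integer hashes
sigs = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 6}, {1, 2, 7}]
print(sorted(common_hashes(sigs, threshold=0.8)))  # hashes in >= 80% of the sets
print(sorted(common_hashes(sigs, threshold=1.0)))  # strict intersection
```

With `threshold=1.0` this reduces to the ordinary intersection, so a threshold option would be a generalization of the existing behavior rather than a separate code path.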