sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

add functionality to `sourmash signature` to extract common hashes? #603

Open ctb opened 5 years ago

ctb commented 5 years ago

see https://github.com/dib-lab/sourmash/pull/587#discussion_r245368621 by @taylorreiter - "It would be useful if we could have a threshold for the intersection -- like if the hash occurs in 80% of signatures, give it to me in the intersection."
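A minimal sketch of the thresholded intersection described above, assuming each signature has already been reduced to a plain Python set of integer hash values (how those are pulled out of a signature is left out here). The function name `common_hashes` and the parameter `min_fraction` are hypothetical and are not part of sourmash:

```python
from collections import Counter


def common_hashes(hash_sets, min_fraction=0.8):
    """Return the hashes present in at least `min_fraction` of `hash_sets`.

    `hash_sets` is a list of iterables of integer hash values, one per
    signature. Hypothetical helper for illustration; not a sourmash API.
    """
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    if not hash_sets:
        return set()

    counts = Counter()
    for hashes in hash_sets:
        # Deduplicate so each signature counts a given hash at most once.
        counts.update(set(hashes))

    n = len(hash_sets)
    return {h for h, c in counts.items() if c / n >= min_fraction}


# Toy example: five "signatures" with made-up hash values.
sigs = [
    {1, 2, 3},
    {1, 2, 3},
    {1, 2, 3},
    {1, 2},
    {1},
]
shared = common_hashes(sigs, min_fraction=0.8)
strict = common_hashes(sigs, min_fraction=1.0)
```

With `min_fraction=1.0` this reduces to an ordinary intersection. Counting in memory like this is fine for a handful of signatures; for large collections, an inverted index from hash to signatures avoids rescanning every signature.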

ctb commented 5 years ago

note that this functionality is not trivial, and enabling this kind of general query is in fact one of the works-in-progress that https://github.com/ctb/2017-sourmash-revindex and #604 are devoted to :)

ctb commented 5 years ago

(not trivial doesn't mean it's hard; we have several implementations of such a thing, including e.g. https://github.com/ctb/2017-sourmash-revindex/blob/master/hashes-to-numpy-2.py)

ctb commented 4 years ago

It would now (as of #946) be very easy to add a filtering option to LCA_Database to support this functionality.

ctb commented 2 years ago

similarly for #1808 and SqliteIndex: this would be quite easy to add there.

ctb commented 11 months ago

this is now generically available as a plugin: https://github.com/ctb/sourmash_plugin_commonhash

also ref https://github.com/sourmash-bio/sourmash/issues/2383