sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

documentation and support for other implementations of MinHash and FracMinHash #2284

Open ctb opened 1 year ago

ctb commented 1 year ago

revisiting a (5-year-old :) conversation about interoperability https://github.com/marbl/Mash/issues/27 after some discussions with @kescobo on the µbioinfo slack. See also a motivating question about implementations in BioJulia https://github.com/BioJulia/BioSequences.jl/issues/243 and this issue in rkmh https://github.com/edawson/rkmh/issues/3.

Here in sourmash, I think we could do more to support alternative implementations of MinHash and FracMinHash in other projects. I'll update this issue as I think through the options, but off the top of my head -

The goal is to make it easy for others to implement simple test suites to check on interoperability of the actual sketching and sketch comparison code.
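
To make the idea of an interoperability test concrete, here is a minimal pure-Python sketch of FracMinHash sketching and comparison. Everything in it is illustrative: the function names are hypothetical, and `standin_hash` uses BLAKE2b from the standard library as a placeholder, NOT the hash function sourmash or Mash actually use. A real cross-implementation test would need both sides to agree on the hash function, seed, k-mer size, and `scaled` value, and then compare the resulting hash sets on a shared test sequence.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer):
    """Reverse complement of a DNA string (ACGT only)."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, ksize):
    """Yield each k-mer as the smaller of itself and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - ksize + 1):
        kmer = seq[i:i + ksize]
        yield min(kmer, revcomp(kmer))


def standin_hash(kmer):
    # ASSUMPTION: placeholder 64-bit hash for illustration only.
    # Two implementations can only interoperate if they use the SAME
    # hash function and seed; swap in the real one for actual testing.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def fracminhash(seq, ksize, scaled):
    """Keep every hash that falls below MAX_HASH / scaled."""
    threshold = MAX_HASH // scaled
    hashes = (standin_hash(k) for k in canonical_kmers(seq, ksize))
    return {h for h in hashes if h < threshold}


def jaccard(a, b):
    """Jaccard similarity of two hash sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A test suite built on this would feed the same sequence to two implementations and assert that the hash sets are identical, and also check simple invariants: a sequence and its reverse complement give the same sketch, and a sketch at a larger `scaled` is a subset of one at a smaller `scaled`.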

ctb commented 1 year ago

on slack, luiz pointed out that we already have good Rust interoperability b/c, well, if you scratch sourmash's Python layer just a little bit, you find Rust underneath. And:

There is a C header for doing the interaction between Rust and Python, here: https://github.com/sourmash-bio/sourmash/blob/latest/include/sourmash.h

ctb commented 1 year ago

also: we should provide some simple example code for making use of some of our pre-computed sketches from wort, and converting them into mash-usable sketches.
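
A rough sketch of what the first half of that conversion could look like, using only the standard library. It reads a sourmash-style JSON signature and takes the `num` smallest hashes from each scaled sketch, which matches a bottom-`num` MinHash only when the scaled sketch already holds at least `num` hashes. The field names (`signatures`, `mins`, `ksize`, `seed`) reflect my understanding of the JSON signature layout and should be checked against a real file; the function name is hypothetical. It does not write Mash's own sketch file format, and the result is only meaningful to Mash if the hash function, seed, and k-mer size match on both sides.

```python
import json


def bottom_n_from_scaled(sig_json, num):
    """Derive bottom-`num` hash lists from scaled sketches in a JSON signature.

    ASSUMPTION: `sig_json` is a JSON list of records, each with a
    "signatures" list whose entries carry "ksize" and "mins".
    """
    results = []
    for record in json.loads(sig_json):
        for sketch in record["signatures"]:
            mins = sorted(sketch["mins"])
            if len(mins) < num:
                # Too few hashes retained: the bottom-`num` sketch
                # cannot be recovered from this scaled sketch.
                raise ValueError(
                    f"sketch has {len(mins)} hashes, need at least {num}"
                )
            results.append({
                "ksize": sketch["ksize"],
                "seed": sketch.get("seed"),  # None if the field is absent
                "hashes": mins[:num],
            })
    return results
```

The `ValueError` is deliberate: silently returning a short list would produce a sketch that looks valid but compares differently from one computed directly from the sequence.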