olgabot / nf-kmer-similarity

Add option to create MinHash LSH #24

Open. olgabot opened this issue 5 years ago

olgabot commented 5 years ago

Use datasketch (thanks, @phoenixAja!) to build either a MinHash LSH index, to find samples whose estimated Jaccard similarity is above a threshold, or a MinHash LSH Forest, to find the top-k most similar samples.

@phoenixAja and @neevor: it may be good to integrate the extract_kmers project, so that raw k-mers can be extracted and used to create the LSHs.
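To make "raw k-mers" concrete, here is a small standard-library sketch of k-mer extraction and the exact Jaccard similarity that a MinHash approximates. The functions `kmers_of` and `jaccard` and the two toy sequences are hypothetical; this is not the extract_kmers project's code or interface.

```python
def kmers_of(sequence, k):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two made-up sequences that differ by a single base.
kmers_1 = kmers_of("ACGTACGT", 4)
kmers_2 = kmers_of("ACGTTCGT", 4)

print(sorted(kmers_1))             # ['ACGT', 'CGTA', 'GTAC', 'TACG']
print(jaccard(kmers_1, kmers_2))   # 0.125 (1 shared k-mer out of 8 distinct)
```

Sets like these are what would be hashed into each sample's MinHash before it is inserted into an LSH index.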