sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

filter hashes from signatures with abundance < 1 #601

Closed taylorreiter closed 4 years ago

taylorreiter commented 5 years ago

This could be useful for signatures built from reads that have not been k-mer trimmed.

ctb commented 5 years ago

well, technically, hashes with abundance < 1 are not going to be present. Probably you mean k-mers with abundance below 2...

This is an interesting idea. A few thoughts --

anyway, those are my thoughts.

certainly I think a postprocessing command that eliminates low-abundance hashes from the sourmash signature is easy to do & worth doing. Whether it resolves this issue is a different question :)
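For illustration only, here is a rough sketch of what such a postprocessing step might look like. It assumes the signature's hashes and their abundances have already been pulled out into a plain Python dict (hash -> abundance). The function name `filter_low_abundance` and the `min_abundance` parameter are hypothetical, and none of this is the sourmash API. Reading and writing actual signatures is left out.

```python
from typing import Dict


def filter_low_abundance(hash_abunds: Dict[int, int],
                         min_abundance: int = 2) -> Dict[int, int]:
    """Return a new dict keeping only hashes seen at least `min_abundance` times.

    `hash_abunds` maps hash value -> abundance (hypothetical input format;
    extracting it from a real signature is not shown here).
    """
    if min_abundance < 1:
        # A hash that is present always has abundance >= 1,
        # so a lower threshold would be meaningless.
        raise ValueError("min_abundance must be >= 1")
    return {h: a for h, a in hash_abunds.items() if a >= min_abundance}


# Made-up example data: two singleton hashes, two repeated ones.
example = {101: 1, 202: 3, 303: 2, 404: 1}
filtered = filter_low_abundance(example)
print(filtered)  # {202: 3, 303: 2}
```

With the default threshold of 2, this drops singleton hashes (abundance 1), which is the "abundance below 2" case mentioned above. The input dict is left unmodified.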

ctb commented 5 years ago

a specific use case for this that @taylorreiter had in mind all along and I just now understood: looking at Nanopore sequences, where trim-low-abund is inappropriate (because it trims the sequences, which works fine for Illumina but is a bad idea for Nanopore!)

This would actually be addressed by https://github.com/dib-lab/khmer/issues/1615 quite nicely...