sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

calculate signature complexity #602

Open taylorreiter opened 5 years ago

taylorreiter commented 5 years ago

Would it be possible to combine size, scaled, and track-abundance info to calculate the complexity of a signature in some way? I think what I want to know is the approximate number of k-mers as a ratio of the number of input nucleotides.
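A rough sketch of the ratio being asked about, in plain Python rather than the sourmash API. It assumes a scaled signature keeps roughly 1 in `scaled` distinct k-mers, so the number of retained hashes times `scaled` approximates the number of distinct k-mers. The function name `estimate_complexity`, its arguments, and the returned keys are all hypothetical, and `input_bp` has to be supplied by the caller because it is assumed here that the signature does not record it.

```python
def estimate_complexity(hash_abundances, scaled, input_bp):
    """Estimate k-mer complexity from a scaled signature (hypothetical helper).

    hash_abundances: dict mapping retained hash value -> abundance
                     (use 1 for every hash if abundance was not tracked)
    scaled:          the scaled factor used to build the signature
    input_bp:        number of input nucleotides, supplied by the caller
    """
    if scaled <= 0 or input_bp <= 0:
        raise ValueError("scaled and input_bp must be positive")

    # Each retained hash stands in for about `scaled` distinct k-mers.
    est_distinct = len(hash_abundances) * scaled

    # With abundance tracking, summed abundances approximate the total k-mer count.
    est_total = sum(hash_abundances.values()) * scaled

    return {
        "distinct_kmers": est_distinct,
        "distinct_per_bp": est_distinct / input_bp,
        "distinct_per_total": est_distinct / est_total if est_total else 0.0,
    }


# Toy input: three retained hashes with abundances, scaled=1000, 10 kb of input.
result = estimate_complexity({11: 1, 42: 3, 97: 2}, scaled=1000, input_bp=10_000)
print(result)
```

A `distinct_per_bp` near 1 would suggest mostly unique sequence, while a low value would suggest repeats or high coverage of the same content. With so few hashes the estimate is very noisy, so this is only meaningful for signatures with many retained hashes.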

ctb commented 5 years ago

This strikes me as related to an issue that @luizirber proposed a while back. ISTR it was keeping track of abundance with HLL or some such. I can't find it in the sourmash tracker; I wonder if it's in khmer?

Anyway, a few thoughts --

Also see #246, tracking number of bp and input sequences.

luizirber commented 5 years ago

@ctb this? https://github.com/dib-lab/sourmash/pull/506

ctb commented 5 years ago

no... I'll chat about it in person!

ctb commented 2 years ago

ref https://github.com/sourmash-bio/sourmash/issues/33 too