sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

`add_many` is super slow in adding large number of kmers #2756

Open mr-eyes opened 1 year ago

mr-eyes commented 1 year ago

`add_many` is very efficient when creating average-size signatures, but it becomes very slow when adding a couple of million k-mers to a MinHash. This is probably Python<->Rust overhead, since creating the same signature manually in Python is much faster.
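To make the slowdown easier to reproduce, here is a minimal timing sketch. `ToyScaledMinHash` is a hypothetical pure-Python stand-in for the "manual" approach, not sourmash's implementation, and the `2**64 // scaled` threshold is an assumption about how a scaled sketch filters hashes. `time_add_many` only needs an object with an `add_many` method, so a sourmash MinHash could presumably be passed in its place to compare the two.

```python
import random
import time


class ToyScaledMinHash:
    """Hypothetical pure-Python stand-in; NOT sourmash's implementation."""

    def __init__(self, scaled):
        # Assumption: keep only hashes below 2**64 / scaled (64-bit hash space).
        self.max_hash = 2**64 // scaled
        self.hashes = set()

    def add_many(self, hashes):
        # Filter and insert in bulk, entirely on the Python side.
        max_hash = self.max_hash
        self.hashes.update(h for h in hashes if h < max_hash)


def time_add_many(mh, hashes):
    """Return the seconds taken by a single mh.add_many(hashes) call."""
    start = time.perf_counter()
    mh.add_many(hashes)
    return time.perf_counter() - start


def make_hashes(n, seed=42):
    """Generate n reproducible pseudo-random 64-bit hash values."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(n)]


if __name__ == "__main__":
    # Increase the input size to see how the cost of add_many grows.
    for n in (10_000, 100_000, 1_000_000):
        mh = ToyScaledMinHash(scaled=1000)
        elapsed = time_add_many(mh, make_hashes(n))
        print(f"{n:>9} hashes: {elapsed:.3f}s, kept {len(mh.hashes)}")
```

Running the same loop against both objects with identical input would show whether the gap grows with the number of hashes, which is what the overhead hypothesis predicts.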

ctb commented 1 year ago

ref https://github.com/sourmash-bio/sourmash/issues/1617 - remove_many is also very slow on occasion ;)