onecodex / finch-rs

A genomic minhashing implementation in Rust
https://www.onecodex.com
MIT License

Optimize loading sketch file #32

Closed: dtolnay closed this 5 years ago

dtolnay commented 5 years ago

Tested with the 316 MB file refseq_sketches_21_1000.sk. On the current master branch, this file takes 5.5 seconds to load on my machine. This PR implements three improvements that bring that down to 1.1 seconds, a 5x speedup.
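
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a wall-clock timing helper. It is not taken from this PR: `time_it` is a hypothetical name, and the byte-summing closure is only a stand-in for whatever call loads the sketch file in your setup.

```rust
use std::time::{Duration, Instant};

/// Runs `work` once and returns its result together with the elapsed wall-clock time.
/// `time_it` is a hypothetical helper, not part of finch-rs.
fn time_it<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}

fn main() {
    // Stand-in workload (assumption): sum a buffer instead of loading a sketch file.
    let data = vec![1u8; 1_000_000];
    let (sum, elapsed) = time_it(|| data.iter().map(|&b| u64::from(b)).sum::<u64>());
    println!("sum = {}, took {:.3} sec", sum, elapsed.as_secs_f64());
}
```

A single run like this is noisy; repeating the measurement a few times on a release build gives more trustworthy numbers.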

dtolnay commented 5 years ago

I tried some of the other example data files too:

| file | before | after | improvement |
| --- | --- | --- | --- |
| refseq_sketches_21_1000.sk | 5.5 sec | 1.1 sec | 5.0x |
| refseq_sketches_31_1000.sk | 6.4 sec | 1.3 sec | 4.9x |
| refseq_sketches_21_10000.sk | 39 sec | 7.9 sec | 4.9x |
| refseq_sketches_31_10000.sk | 43 sec | 8.9 sec | 4.8x |

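
The improvement column is the before time divided by the after time, rounded to one decimal place. A small sketch that recomputes it from the timings above (the helper names `speedup` and `format_speedup` are made up for illustration):

```rust
/// Speedup factor: how many times faster the new load time is than the old one.
fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

/// Formats the speedup with one decimal place, e.g. "5.0x".
fn format_speedup(before_secs: f64, after_secs: f64) -> String {
    format!("{:.1}x", speedup(before_secs, after_secs))
}

fn main() {
    // (file, before in seconds, after in seconds), as reported in this thread.
    let timings = [
        ("refseq_sketches_21_1000.sk", 5.5, 1.1),
        ("refseq_sketches_31_1000.sk", 6.4, 1.3),
        ("refseq_sketches_21_10000.sk", 39.0, 7.9),
        ("refseq_sketches_31_10000.sk", 43.0, 8.9),
    ];
    for &(file, before, after) in timings.iter() {
        println!("{} {}", file, format_speedup(before, after));
    }
}
```
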
bovee commented 5 years ago

Thanks for taking the time to put this together; this is really slick!