onecodex / finch-rs

A genomic minhashing implementation in Rust
https://www.onecodex.com
MIT License

Error when kmer count >65535 (per kmer) #25

Closed bovee closed 4 years ago

bovee commented 6 years ago

We store k-mer coverages as 16-bit integers so, of course, we just had some super high-depth 16S sequencing that broke Finch. We should probably just bump these to a u32 because coverage should never be higher than 4 billion (right? right? 😭).
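To make the limit concrete, here is a minimal sketch of where a 16-bit counter runs out and how much headroom a `u32` gives. The helpers `bump_u16` and `bump_u32` are hypothetical names for illustration only and are not taken from the Finch codebase.

```rust
/// Hypothetical helper: increment a 16-bit coverage counter,
/// returning None if the increment would overflow.
fn bump_u16(count: u16) -> Option<u16> {
    count.checked_add(1)
}

/// Hypothetical helper: the same increment on a 32-bit counter.
fn bump_u32(count: u32) -> Option<u32> {
    count.checked_add(1)
}

fn main() {
    // A u16 tops out at 65535, the threshold in the issue title.
    assert_eq!(u16::MAX, 65_535);
    assert_eq!(bump_u16(65_534), Some(65_535));
    // One more observation of the same k-mer no longer fits.
    assert_eq!(bump_u16(u16::MAX), None);

    // A u32 keeps counting, up to roughly 4.29 billion.
    assert_eq!(bump_u32(65_535), Some(65_536));
    assert_eq!(u32::MAX, 4_294_967_295);
}
```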

luizirber commented 6 years ago

Is it even meaningful when it gets over 65535? I would say just keep u16::max_value() and avoid using more memory 😈
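The "just keep the maximum" idea could be sketched with a saturating add, which pins the counter at the largest `u16` value instead of failing. This is only an illustration of the suggestion; `bump_saturating` and `count_observations` are hypothetical names, not Finch functions.

```rust
/// Hypothetical helper: increment a 16-bit coverage counter,
/// clamping at u16::MAX (the same value as u16::max_value()).
fn bump_saturating(count: u16) -> u16 {
    count.saturating_add(1)
}

/// Hypothetical helper: count `n` observations of one k-mer.
fn count_observations(n: u32) -> u16 {
    let mut count: u16 = 0;
    for _ in 0..n {
        count = bump_saturating(count);
    }
    count
}

fn main() {
    // Below the limit the counter is exact.
    assert_eq!(count_observations(1_000), 1_000);
    // Past the limit it stays at 65535 rather than wrapping or panicking.
    assert_eq!(count_observations(70_000), u16::MAX);
}
```

The trade-off is that any coverage above 65535 becomes indistinguishable from exactly 65535, in exchange for keeping the counters at two bytes each.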

bovee commented 6 years ago

@luizirber I mean, maybe it could be useful? It's only a matter of time before someone dumps a NovaSeq lane full of 16S data on us. Doubling the size of the counters also only adds half-a-meg of memory for a 100,000 item sketch so I think this is probably okay (unless there are embedded Finch users out there 🤨).
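A back-of-the-envelope version of that memory estimate, assuming one counter per sketch item and counting only the counter field itself: going from `u16` to `u32` adds two bytes per item. The function `extra_counter_bytes` is a hypothetical name, and the real in-memory cost depends on how Finch lays out its sketch items (struct padding and any other per-item storage are ignored here).

```rust
use std::mem::size_of;

/// Hypothetical helper: extra bytes needed when each of `sketch_size`
/// counters grows from u16 to u32, ignoring padding and other fields.
fn extra_counter_bytes(sketch_size: usize) -> usize {
    sketch_size * (size_of::<u32>() - size_of::<u16>())
}

fn main() {
    // Each counter grows from 2 bytes to 4 bytes.
    assert_eq!(size_of::<u16>(), 2);
    assert_eq!(size_of::<u32>(), 4);

    // For a 100,000 item sketch, the counters alone grow by 200,000 bytes.
    assert_eq!(extra_counter_bytes(100_000), 200_000);
}
```

Under these assumptions the increase is about 0.2 MB, the same order of magnitude as the rough figure quoted in the comment above.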