sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

reduce memory usage of Kmer objects #13

Closed sebhtml closed 12 years ago

sebhtml commented 13 years ago

Example:

Presently, for MAXKMERLENGTH=31, a Kmer is stored as 1 uint64_t integer (62 bits used out of 64). That is fine.

But if MAXKMERLENGTH=21, a Kmer is also stored as 1 uint64_t integer.

The public methods of the Kmer class would still be getNumberOfU64, getU64 and setU64, but it would save space to do something like this:

For a maximum of 21 nucleotides, we need 42 bits. So we need 6 bytes -- 5 bytes are not enough.
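
The arithmetic can be checked with a small C++ sketch. It assumes 2 bits per nucleotide, as in the numbers above; the helper names bytesForKmer and bytesForKmerInU64 are hypothetical and do not come from Ray.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Ray): bytes needed to store a k-mer of
// at most maxKmerLength nucleotides, assuming 2 bits per nucleotide.
constexpr std::size_t bytesForKmer(std::size_t maxKmerLength) {
    return (2 * maxKmerLength + 7) / 8; // round up to whole bytes
}

// Bytes used when the same k-mer is stored in whole uint64_t words.
constexpr std::size_t bytesForKmerInU64(std::size_t maxKmerLength) {
    return ((2 * maxKmerLength + 63) / 64) * sizeof(std::uint64_t);
}

// MAXKMERLENGTH=21: 42 bits, so 6 bytes in a byte array versus 8 in a uint64_t.
static_assert(bytesForKmer(21) == 6, "42 bits need 6 bytes");
static_assert(bytesForKmerInU64(21) == 8, "one uint64_t word");

// MAXKMERLENGTH=31: 62 bits, so both layouts take 8 bytes.
static_assert(bytesForKmer(31) == 8, "62 bits need 8 bytes");
static_assert(bytesForKmerInU64(31) == 8, "one uint64_t word");
```

Under these assumptions the byte array saves 2 bytes per Kmer for MAXKMERLENGTH=21 and nothing for MAXKMERLENGTH=31.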

So here, it would be better to simply use an array of bytes instead of an array of uint64_t.

The downside is that getU64 and setU64 will have to do extra work to achieve the same thing.

But anyway, m_u64 is private so the change is local to Kmer.cpp and Kmer.h.
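
A minimal sketch of what the byte-array storage could look like, to make the extra work in getU64 and setU64 concrete. This is not Ray's actual Kmer class: the class name PackedKmer, the member m_bytes, the constants, the method signatures and the least-significant-byte-first layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical compile-time setting mirroring MAXKMERLENGTH=21 from the example.
constexpr std::size_t MAX_KMER_LENGTH = 21;
constexpr std::size_t KMER_BYTES = (2 * MAX_KMER_LENGTH + 7) / 8;   // 6 bytes
constexpr std::size_t KMER_U64S = (2 * MAX_KMER_LENGTH + 63) / 64;  // 1 word

class PackedKmer {
public:
    int getNumberOfU64() const { return static_cast<int>(KMER_U64S); }

    // Rebuild the i-th 64-bit word from the stored bytes
    // (assumed layout: least significant byte first).
    std::uint64_t getU64(int i) const {
        assert(i >= 0 && i < getNumberOfU64());
        std::uint64_t value = 0;
        const std::size_t start = static_cast<std::size_t>(i) * 8;
        for (std::size_t b = 0; b < 8 && start + b < KMER_BYTES; ++b) {
            value |= static_cast<std::uint64_t>(m_bytes[start + b]) << (8 * b);
        }
        return value;
    }

    // Store the i-th 64-bit word; bits beyond the stored bytes are dropped.
    void setU64(int i, std::uint64_t value) {
        assert(i >= 0 && i < getNumberOfU64());
        const std::size_t start = static_cast<std::size_t>(i) * 8;
        for (std::size_t b = 0; b < 8 && start + b < KMER_BYTES; ++b) {
            m_bytes[start + b] = static_cast<std::uint8_t>(value >> (8 * b));
        }
    }

private:
    std::uint8_t m_bytes[KMER_BYTES] = {}; // replaces the uint64_t array
};
```

In this sketch, callers keep using the same three public methods, while each call loops over up to 8 bytes instead of reading or writing one integer. On typical compilers sizeof(PackedKmer) is 6 rather than 8, since a byte array has no alignment padding.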

sebhtml commented 12 years ago

I don't think it can be improved.