zd1 / telseq

A software for calculating telomere length
GNU General Public License v3.0

Harden telseq against integer overflows #23

Closed edawson closed 6 years ago

edawson commented 6 years ago

I ran telseq on a number of very deep (~120X) whole genomes, where I wanted to treat everything as one read group.

I found that in almost all of my samples, I got an overflow for the length estimate (which was reported as a large negative number).

This PR replaces all of the int and unsigned int types with uint64_t. That should prevent the overflow, given that it's not wild to hit 2 billion+ reads (a 32-bit signed int tops out at 2,147,483,647), and assuming the compiler is generous enough to give an int 32 bits in the first place. I don't see any values that could ever be negative, but please replace uint64_t with int64_t if I've somehow missed one.
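For anyone wondering how a count ends up as a large negative number, here is a minimal sketch of the arithmetic. It is not telseq's code: the helper names, the ~3.1 Gb genome size and the 150 bp read length are all assumptions for illustration, and the thread doesn't say which counter actually overflowed. The sketch estimates a read count for a 120X genome in 64 bits, checks it against the 32-bit signed limit, and computes the value a two's-complement 32-bit int would show for the same bits.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from telseq): rough number of reads needed to
// cover `genome_bp` bases at `coverage`X with reads of `read_len` bases.
// Everything is done in 64 bits so the intermediate product cannot overflow
// for realistic inputs.
std::uint64_t estimate_read_count(std::uint64_t genome_bp,
                                  std::uint64_t coverage,
                                  std::uint64_t read_len) {
    return genome_bp * coverage / read_len;
}

// True if `value` is too large for a 32-bit signed int (max 2,147,483,647).
bool exceeds_int32(std::uint64_t value) {
    return value >
           static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// What the low 32 bits of `value` look like when read as a two's-complement
// signed 32-bit integer. Computed arithmetically in 64 bits, so this avoids
// the undefined behaviour of actually overflowing a signed int.
std::int64_t wrapped_int32_view(std::uint64_t value) {
    const std::uint64_t two_pow_32 = 1ULL << 32;
    const std::uint64_t two_pow_31 = 1ULL << 31;
    const std::uint64_t low = value % two_pow_32;
    if (low >= two_pow_31) {
        return static_cast<std::int64_t>(low) -
               static_cast<std::int64_t>(two_pow_32);
    }
    return static_cast<std::int64_t>(low);
}
```

Under those assumed numbers, `estimate_read_count(3100000000ULL, 120, 150)` gives 2,480,000,000 reads, which is past the 32-bit signed limit, and `wrapped_int32_view` of that value is -1,814,967,296. A uint64_t counter holds the same count with plenty of headroom.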

It probably doubles the memory used by those values on most systems, though the memory usage of telseq seems to be quite low.

pdiakumis commented 6 years ago

I've noticed this issue too with the conda version. Is it possible to do a new release with this PR, so that I can update the conda package? Many thanks - Peter

zd1 commented 6 years ago

Sure, I've just created a release that includes changes by this PR. Thanks, Zhihao