Closed pb-cdunn closed 8 years ago
Hi Christopher,
Committed fixes for the problems you report above. Solved bin problem by increasing the # of bins by 1. The div-by-0 bug was introduced when I added the .dam database type for reference sequences. I presume that the bug occurred on a .dam, it should not with .db's. Anyway fixed.
Thanks! Gene
Oops! There is a duplicated statement, at line 137 and line 154:
nbin = (maxlen-1)/BIN + 1;
You fixed the second one, but not the first. So the Malloc()
arrays hist
and bsub
are still too short.
And I think there is the same problem at line 275.
Finally, if you move the new if (totlen <=0)
up to line 153, you'll avoid a few more potential div-by-zeros.
Sorry, careless. The latest commit hopefully adsresses eveything. I did not move the totlen check up as I want the program to report 0-reads of 0-length before exiting. Cheers, Gene
On 1/7/16, 4:13 PM, Christopher Dunn wrote:
Oops! There is a duplicated statement, at line 137 and line 154:
nbin = (maxlen-1)/BIN + 1; You fixed the second one, but not the first. So the |Malloc()| arrays |hist| and |bsub| are still too short.
And I think there is the same problem at line 275.
Finally, if you move the new |if (totlen <=0)| up to line 153, you'll avoid a few more potential div-by-zeros.
— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DAZZ_DB/pull/12#issuecomment-169690166.
Oops! You undid the other fix. In order to avoid the longest read going past the end of the array,
nbin = maxlen/BIN;
should be
nbin = maxlen/BIN + 1;
Div-by-0 on nreads==0
or TRIM==1 && ototal==0
is not a big deal to me, so I'm fine with the totlen check where it is.
Crickey! Done, sorry. -- G
On 1/7/16, 5:12 PM, Christopher Dunn wrote:
Oops! You undid the other fix. In order to avoid the longest read going past the end of the array,
nbin = maxlen/BIN; should be
nbin = maxlen/BIN + 1; Div-by-0 on |nreads==0| or |TRIM==1 && ototal==0| is not a big deal to me, so I'm fine with the totlen check where it is.
— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DAZZ_DB/pull/12#issuecomment-169710537.
Perfekt! Danke.
This fixes several problems:
Divide-by-zero when a bin is empty.
I don't care about 0 totlen or 0-length reads, but I do have a synthetic example with an unusual distribution of reads, such that some bins have none.
Wrong number of bins.
There was a memory error (noted by valgrind), and all reads of maximum length were ignored, both because of a miscalculation. Suppose:
But what about the longest read? Its length is 2000, but there is no bin for it!
Possible solutions:
[2000,2999]
, which seems fine.Duplicate line.