thegenemyers / DAZZ_DB

The Dazzler Data Base
Other
35 stars 33 forks source link

Fix divide-by-zero #12

Closed pb-cdunn closed 8 years ago

pb-cdunn commented 8 years ago

This fixes several problems:

Divide-by-zero when a bin is empty.

I don't care about 0 totlen or 0-length reads, but I do have a synthetic example with an unusual distribution of reads, such that some bins have none.

Wrong number of bins.

There was a memory error (noted by valgrind), and all reads of maximum length were ignored, both because of a miscalculation. Suppose:

All reads in [0, 2000]. I.e. maxlen == 2000
BIN==1000, so we have 2 bins.
  Bin 0: [0, 999]
  Bin 1: [1000, 1999]

But what about the longest read? Its length is 2000, but there is no bin for it!

Possible solutions:

thegenemyers commented 8 years ago

Hi Christopher,

Committed fixes for the problems you report above.  Solved bin problem by increasing the # of bins by 1.  The div-by-0 bug was introduced when I added the .dam database type for reference sequences.  I presume that the bug occurred on a .dam, it should not with .db's.  Anyway fixed.

Thanks! Gene

pb-cdunn commented 8 years ago

Oops! There is a duplicated statement, at line 137 and line 154:

nbin = (maxlen-1)/BIN + 1;

You fixed the second one, but not the first. So the Malloc() arrays hist and bsub are still too short.

And I think there is the same problem at line 275.

Finally, if you move the new if (totlen <=0) up to line 153, you'll avoid a few more potential div-by-zeros.

thegenemyers commented 8 years ago

Sorry, careless. The latest commit hopefully adsresses eveything. I did not move the totlen check up as I want the program to report 0-reads of 0-length before exiting. Cheers, Gene

On 1/7/16, 4:13 PM, Christopher Dunn wrote:

Oops! There is a duplicated statement, at line 137 and line 154:

nbin = (maxlen-1)/BIN + 1;

You fixed the second one, but not the first. So the |Malloc()| arrays |hist| and |bsub| are still too short.

And I think there is the same problem at line 275.

Finally, if you move the new |if (totlen <=0)| up to line 153, you'll avoid a few more potential div-by-zeros.

— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DAZZ_DB/pull/12#issuecomment-169690166.

pb-cdunn commented 8 years ago

Oops! You undid the other fix. In order to avoid the longest read going past the end of the array,

nbin  = maxlen/BIN;

should be

nbin  = maxlen/BIN + 1;

Div-by-0 on nreads==0 or TRIM==1 && ototal==0 is not a big deal to me, so I'm fine with the totlen check where it is.

thegenemyers commented 8 years ago

Crickey! Done, sorry. -- G

On 1/7/16, 5:12 PM, Christopher Dunn wrote:

Oops! You undid the other fix. In order to avoid the longest read going past the end of the array,

nbin = maxlen/BIN;

should be

nbin = maxlen/BIN + 1;

Div-by-0 on |nreads==0| or |TRIM==1 && ototal==0| is not a big deal to me, so I'm fine with the totlen check where it is.

— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DAZZ_DB/pull/12#issuecomment-169710537.

pb-cdunn commented 8 years ago

Perfekt! Danke.