sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Sometimes there are k-mers with a coverage of 1 with the Bloom filter #114

Closed sebhtml closed 10 years ago

sebhtml commented 11 years ago

This is mathematically impossible, so it can only be caused by a bug.

1 6578
2 4338184732
3 505075486
4 224525240
5 101545792
6 72389788
7 49458414
8 36914982
9 27724466
10 21409730
11 16863130
12 13687720
13 11383878
14 9755248

sebhtml commented 11 years ago

How can this possibly happen?

With the Bloom filter, every object starts at a coverage of 2; none starts at 1. Overflow is also impossible, because the maximum is 2^32-1, which is too large to be reached.
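A minimal sketch of that reasoning, assuming the usual Bloom-filter gating scheme: the first sighting of a k-mer only sets bits in the filter, and the counted object is created on the second sighting with its coverage initialized to 2. The names `BloomFilter` and `count_kmers` and the filter sizes are hypothetical; this is not Ray's actual code (Ray is written in C++).

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter for illustration; the sizes are arbitrary."""

    def __init__(self, bits=1 << 16, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item):
        # Derive one bit position per hash function from a seeded digest.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        """Set the item's bits; return True if all of them were already set."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.array[byte] & mask:
                seen = False
                self.array[byte] |= mask
        return seen


def count_kmers(kmers):
    """Count k-mers, creating an object only once the filter reports a repeat."""
    bloom = BloomFilter()
    coverage = {}
    for kmer in kmers:
        if kmer in coverage:
            coverage[kmer] += 1
        elif bloom.add(kmer):
            # Second sighting (or a false positive): the object starts at 2.
            coverage[kmer] = 2
        # Otherwise this is the first sighting: only the filter remembers it.
    return coverage


counts = count_kmers(["ACG", "ACG", "ACG", "TTT", "GGA", "GGA"])
```

Under this scheme a stored coverage is either initialized to 2 or incremented from there, so `counts["ACG"]` ends at 3 and no stored value can be 1. A false positive in the filter can only make an object appear early at 2; it cannot produce a 1.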

sebhtml commented 11 years ago

reproducible on large datasets

sebhtml commented 11 years ago

obviously something's wrong with construction of these objects

sebhtml commented 11 years ago

Reproducible:

/mnt/scratch_mp2/corbeil/corbeil_group/projects/eel

[boisver1@ip03-mp2 eel]$ head eel-99/CoverageDistribution.txt

KmerCoverage Frequency

Any frequency is a even number because of odd k-mer length

1 36488
2 2189970184
3 329227244
4 186610214
5 93763110
6 71038480
7 51383590
8 40115556
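The "even number" remark in the file header presumably follows from the fact that, for an odd k, no k-mer can equal its own reverse complement (the middle base would have to be its own complement), so every k-mer has a distinct reverse-complement partner. The sketch below checks that property by brute force; the function names are hypothetical and it says nothing about how Ray stores the pairs internally.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string over ACGT."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def self_complementary_kmers(k):
    """List every k-mer that is identical to its own reverse complement."""
    result = []
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        if reverse_complement(kmer) == kmer:
            result.append(kmer)
    return result


odd_case = self_complementary_kmers(3)   # empty: odd k has no such k-mer
even_case = self_complementary_kmers(4)  # 16 palindromic 4-mers, e.g. "ACGT"
```

With an odd k the list is empty, so k-mers always come in pairs of two distinct strands. With an even k some k-mers are their own partner, which is why the evenness argument needs the odd length.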

sebhtml commented 11 years ago

Evaluation: 5 human-hours

sebhtml commented 11 years ago

  1. Is it reproducible with v2.2.0?
  2. Can we reproduce it with a small dataset?

sebhtml commented 10 years ago

with current code:

KmerCoverage Frequency

Any frequency is a even number because of odd k-mer length

2 238815590
3 3405804
4 625630
5 203138
6 93238
7 54866
8 37556
9 31756
10 29200
11 29004
12 30032
13 32874
14 36024
15 41298
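A check along these lines could confirm both invariants discussed in this thread on a coverage distribution: no entry with a coverage below 2, and only even frequencies. It is a sketch that assumes the file holds one whitespace-separated coverage/frequency pair per line and that any other line is a header to skip; `check_distribution` is a hypothetical helper, not part of Ray.

```python
def check_distribution(lines):
    """Return a list of human-readable problems found in the distribution."""
    problems = []
    for line in lines:
        fields = line.split()
        # Skip headers, comments, and blank lines: keep only "<int> <int>".
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue
        coverage, frequency = int(fields[0]), int(fields[1])
        if coverage < 2:
            problems.append(f"coverage {coverage} should not exist")
        if frequency % 2 != 0:
            problems.append(f"coverage {coverage} has odd frequency {frequency}")
    return problems


# First rows of the two distributions quoted in this thread.
before_fix = ["KmerCoverage Frequency", "1 36488", "2 2189970184", "3 329227244"]
after_fix = ["KmerCoverage Frequency", "2 238815590", "3 3405804", "4 625630"]

problems_before = check_distribution(before_fix)
problems_after = check_distribution(after_fix)
```

On the rows above, only the coverage-1 entry from the earlier run is flagged; the rows produced with the current code pass both checks.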

sebhtml commented 10 years ago

fixed https://github.com/sebhtml/ray/commit/e8658d6bec62e2c18748c54aa5b992578f2a101e