sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Adaptive Bloom filter #139

Closed sebhtml closed 11 years ago

sebhtml commented 11 years ago

Found out why memory usage was ridiculously high for large graphs: the Bloom filter is far too full (see the log below).

The graph:

$ head SRA056234-Ray-colosse-2013-01-21-1/GraphPartition.txt

Rank NumberOfKmers IdealNumberOfKmers Difference RelativeDifference

TotalKmers: 115319304694

Ranks: 2025

IdealNumberOfKmers: 56947804

0 56972082 56947804 24278 0.042632%
1 56947250 56947804 -554 -0.000972821%
2 56943242 56947804 -4562 -0.00801084%
3 56932526 56947804 -15278 -0.0268281%
4 56930970 56947804 -16834 -0.0295604%
5 56962934 56947804 15130 0.0265682%
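The Difference and RelativeDifference columns can be rechecked from the header values. This is only a sketch: it assumes IdealNumberOfKmers is the integer quotient TotalKmers / Ranks (which matches the file), and the names `partition_row` and `observed` are made up for illustration, not taken from Ray.

```python
# Values copied from the GraphPartition.txt header above.
total_kmers = 115319304694
ranks = 2025

# Assumption: the ideal share per rank is the integer quotient.
ideal = total_kmers // ranks

# First six rows of the table: rank -> NumberOfKmers.
observed = {
    0: 56972082,
    1: 56947250,
    2: 56943242,
    3: 56932526,
    4: 56930970,
    5: 56962934,
}


def partition_row(rank, kmers, ideal_kmers):
    """Return (rank, kmers, ideal, difference, relative difference in %)."""
    difference = kmers - ideal_kmers
    relative = 100.0 * difference / ideal_kmers
    return rank, kmers, ideal_kmers, difference, relative


rows = [partition_row(rank, kmers, ideal) for rank, kmers in observed.items()]
for rank, kmers, ideal_kmers, difference, relative in rows:
    print(f"{rank} {kmers} {ideal_kmers} {difference} {relative:.6g}%")
```

The partition is balanced to well under 0.1% per rank, so the imbalance between ranks is not what drives the memory usage.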

In the log:

Rank 251 number of set bits in the Bloom filter: 239880699 / 268435456
Warning: the oracle is half full.

Fix:

Increase the value passed to -bloom-filter-bits.
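A rough way to guess how much larger the filter would have to be, sketched under the textbook model of a Bloom filter with independent, uniform hash functions (an assumption; Ray's actual hashing is not described in this thread). In that model the fill fraction is 1 - exp(-k*n/m), so for the same inserted items and the same number of hash functions, the required size scales as ln(1 - fill) / ln(1 - target). The function names are hypothetical.

```python
import math


def fill_fraction(set_bits, total_bits):
    """Fraction of bits set in the filter."""
    return set_bits / total_bits


def growth_factor_for_target_fill(current_fill, target_fill=0.5):
    """Factor by which the bit count must grow to reach target_fill.

    Assumes fill = 1 - exp(-k*n/m) (uniform hashing), with the number of
    hash functions k and inserted items n held fixed, so k cancels out.
    """
    return math.log(1.0 - current_fill) / math.log(1.0 - target_fill)


# Rank 251, numbers copied from the log above.
set_bits, total_bits = 239880699, 268435456

fill = fill_fraction(set_bits, total_bits)
factor = growth_factor_for_target_fill(fill)
needed_bits = math.ceil(total_bits * factor)

print(f"fill: {100.0 * fill:.2f}%")
print(f"growth factor for 50% fill: {factor:.2f}")
print(f"bits needed under this model: {needed_bits}")
```

Note that 268435456 is 2^28 bits, i.e. 32 MiB per rank. Under this model the filter on rank 251 would need to be a bit more than three times larger just to get back to half full.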

sebhtml commented 11 years ago

Case 1

512 MPI ranks, human genome (3 Gb)

Rank 320 number of set bits in the Bloom filter: 103542215 / 268435456

$ head HiSeq-2500-NA12878-demo-2x150-2013-01-18-2/GraphPartition.txt

Rank NumberOfKmers IdealNumberOfKmers Difference RelativeDifference

TotalKmers: 6127003074

Ranks: 512

IdealNumberOfKmers: 11966802

0 11981318 11966802 14516 0.121302%
1 11964730 11966802 -2072 -0.0173146%
2 11968022 11966802 1220 0.0101949%
3 11963112 11966802 -3690 -0.0308353%
4 11966168 11966802 -634 -0.00529799%
5 11969386 11966802 2584 0.0215931%
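To compare the two jobs, one can look at how many Bloom filter bits are available per k-mer stored on a rank. This is a sketch with a caveat: the k-mer counts below are the IdealNumberOfKmers values from the two GraphPartition.txt files, and the number of k-mers actually inserted into the filter may be different (it is not reported in this thread). The dictionary labels are just shorthand for the two jobs.

```python
# Denominator reported in the Bloom filter log lines of both jobs.
BLOOM_BITS = 268435456

# IdealNumberOfKmers per rank, copied from the two GraphPartition.txt files.
kmers_per_rank = {
    "SRA056234 (2025 ranks)": 56947804,
    "NA12878 (512 ranks)": 11966802,
}

# Bits of Bloom filter available per graph k-mer on a rank.
bits_per_kmer = {
    job: BLOOM_BITS / kmers for job, kmers in kmers_per_rank.items()
}

for job, bits in bits_per_kmer.items():
    print(f"{job}: {bits:.2f} bits per k-mer")
```

With the same filter size, the large job has fewer than 5 bits per graph k-mer, while the human job has more than 22, which is consistent with the much lower fill reported for rank 320.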

see http://bioinformatics.oxfordjournals.org/content/27/4/479.full

sebhtml commented 11 years ago

Rank 4 number of set bits in the Bloom filter: [ 228681811 / 268435456 ] (85.1906%) Warning: the oracle is half full.
Rank 31 number of set bits in the Bloom filter: [ 228693168 / 268435456 ] (85.1948%) Warning: the oracle is half full.
Rank 5 number of set bits in the Bloom filter: [ 228685267 / 268435456 ] (85.1919%) Warning: the oracle is half full.
Rank 29 number of set bits in the Bloom filter: [ 228672137 / 268435456 ] (85.187%) Warning: the oracle is half full.
Rank 1710 number of set bits in the Bloom filter: [ 228672298 / 268435456 ] (85.1871%) Warning: the oracle is half full.
Rank 855 number of set bits in the Bloom filter: [ 228676672 / 268435456 ] (85.1887%) Warning: the oracle is half full.
Rank 0 number of set bits in the Bloom filter: [ 228669609 / 268435456 ] (85.1861%) Warning: the oracle is half full.
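To summarize lines like these across many ranks, a small parser is enough. This is a sketch: the regular expression is written against the log lines quoted in this thread only (with or without the square brackets), and `parse_fill` is a hypothetical helper, not part of Ray.

```python
import re

# Matches both "... filter: 239880699 / 268435456" and
# "... filter: [ 228681811 / 268435456 ] (85.1906%)".
LOG_PATTERN = re.compile(
    r"Rank (\d+) number of set bits in the Bloom filter: "
    r"(?:\[ )?(\d+) / (\d+)"
)


def parse_fill(lines):
    """Map rank -> fill percentage for every matching log line."""
    fills = {}
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            rank, set_bits, total_bits = map(int, match.groups())
            fills[rank] = 100.0 * set_bits / total_bits
    return fills


# Three of the lines quoted above.
log_lines = [
    "Rank 4 number of set bits in the Bloom filter: [ 228681811 / 268435456 ] (85.1906%) Warning: the oracle is half full.",
    "Rank 31 number of set bits in the Bloom filter: [ 228693168 / 268435456 ] (85.1948%) Warning: the oracle is half full.",
    "Rank 0 number of set bits in the Bloom filter: [ 228669609 / 268435456 ] (85.1861%) Warning: the oracle is half full.",
]

fills = parse_fill(log_lines)
for rank in sorted(fills):
    print(f"rank {rank}: {fills[rank]:.4f}% full")
print(f"min {min(fills.values()):.4f}%, max {max(fills.values()):.4f}%")
```

The fill is nearly identical on every rank, so the problem is the filter size itself and not a few overloaded ranks.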

sebhtml commented 11 years ago

Bloom profile from job # SRA056234-k111-Picea-glauca-mp2-2025-2013-01-16-13

sebhtml commented 11 years ago

Other jobs probably have the same problem, even with -k 21 (probably job revision #9, i.e. suffix -9, or something like that).

sebhtml commented 11 years ago

ae58c3776c6a86dc3a60bbc87d826d35d926b3f3