sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Possible infinite parallel loop #58

Closed sebhtml closed 12 years ago

sebhtml commented 12 years ago

Reported by: Egon Ozer e-ozer@fsm.northwestern.edu
Operating system: OSX 10.6.8
K-mer length: 31
Version: v2.0.0-rc8
Reads: 101 bp, paired-end, 3.27 M pairs per genome

There seems to be an intermittent problem with the "Bidirectional extension of seeds" step.

I am performing de novo assembly of several bacterial genomes sequenced on Illumina HiSeq (101 bp, paired-end, 3.27 M pairs per genome) using Ray v2.0.0-rc8. Most of the assemblies finish without trouble, but every now and then one gets stuck in what looks like a loop during the bidirectional extension of seeds step. An assembly usually takes about 1 h 50 min, but one that started around midnight was still running at 9:30 this morning. I killed it; these were the final lines of output:

...
Speed RAY_SLAVE_MODE_EXTENSION 2803 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76880500 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 3150 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76880600 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2778 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76880700 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2674 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76880800 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2680 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76880900 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2769 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76881000 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2813 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76881100 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2794 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76881200 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2844 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76881300 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2718 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76881400 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2926 units/second
Rank 2: assembler memory usage: 0 KiB
Rank 2 reached 76881500 vertices from seed 285, flow 1
Speed RAY_SLAVE_MODE_EXTENSION 2720 units/second
Rank 2: assembler memory usage: 0 KiB
15 total processes killed (some possibly by mpiexec during cleanup)
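For anyone who wants to catch this before a run burns a whole night: the log above shows rank 2 passing 76 million vertices from a single seed, far more than a bacterial genome (typically a few million base pairs) should need. Here is a minimal Python sketch that scans saved Ray output for such runaway extensions. The function name and the `max_vertices` threshold are hypothetical choices for illustration, not anything Ray provides, and the regular expression only assumes the "Rank N reached ..." line format shown in the log.

```python
import re

# Matches progress lines such as:
#   "Rank 2 reached 76880500 vertices from seed 285, flow 1"
PROGRESS = re.compile(
    r"Rank (\d+) reached (\d+) vertices from seed (\d+), flow (\d+)"
)


def find_runaway_extensions(text, max_vertices=20_000_000):
    """Return {(rank, seed, flow): highest vertex count seen} for every
    extension whose vertex count exceeds max_vertices.

    max_vertices is a hypothetical cutoff (not a Ray setting); pick a value
    comfortably above the expected genome size.
    """
    worst = {}
    # finditer works whether or not the log still has its line breaks.
    for match in PROGRESS.finditer(text):
        rank, vertices, seed, flow = (int(g) for g in match.groups())
        if vertices > max_vertices:
            key = (rank, seed, flow)
            worst[key] = max(worst.get(key, 0), vertices)
    return worst
```

Calling `find_runaway_extensions(open("ray.log").read())` on captured output (the file name is only an example) returns an empty dict for a healthy run and one entry per suspicious rank/seed/flow otherwise.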

I've been experimenting with k-mer sizes and read counts, and ran all of last night's assemblies with k = 31 on subsets of the reads. Previously I had assembled these same data sets using v2.0.0-rc5 at k = 25 with all 9.5 M pairs (about 3x the reads). Back then the set that hung last night assembled successfully, but I had to kill two others for this same problem.

Has anyone else seen this problem, or is it just me? I'm running Ray on OSX 10.6.8, if that's any help.

Thanks,

sebhtml commented 12 years ago

Add an option called -disable-recomb-pair-algorithms in plugin_SeedExtender/SeedExtender.cpp to check whether the bug is caused by those algorithms.

sebhtml commented 12 years ago

Rank 2 starts on seed 285, length is 93, flow 0 [285/292] Current peak coverage -> 163

I'll start it up again and see if the same thing happens.

sebhtml commented 12 years ago

Running with -disable-recycling solves the issue on Mac OS X.
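To try the workaround, the flag just needs to be appended to the usual Ray command line. A small Python sketch that builds such a command is below. Only -disable-recycling comes from this thread; the -k, -p and -o options are assumed from Ray's usual paired-end invocation, and the helper name, file names and rank count are hypothetical.

```python
def ray_command(left_reads, right_reads, output_dir, kmer=31, ranks=4,
                disable_recycling=True):
    """Build an mpiexec argument list for a paired-end Ray run.

    -k, -p and -o are assumed from Ray's usual command line; check them
    against the help output of your Ray version before relying on this.
    """
    cmd = [
        "mpiexec", "-n", str(ranks), "Ray",
        "-k", str(kmer),
        "-p", left_reads, right_reads,
        "-o", output_dir,
    ]
    if disable_recycling:
        # Workaround reported in this issue for the endless extension loop.
        cmd.append("-disable-recycling")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, for example `ray_command("reads_1.fastq", "reads_2.fastq", "RayOutput")`.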