sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

seed registration fails at -k 51 but not -k 49 #222

Open sebhtml opened 10 years ago

sebhtml commented 10 years ago

Original message:

-------- Original Message --------
Subject: Re: [Denovoassembler-users] largest kmer-value?
Date: Sun, 17 Nov 2013 09:14:54 -0500
From: Hornung, Bastian <bastian.hornung@wur.nl>
To: denovoassembler-users@lists.sourceforge.net

Hi Sebastien,

the error message is just a segmentation fault, output below:

Rank 0 registered 0/14207
Rank 0 registered 1000/14207
Rank 0 registered 2000/14207
Rank 0 registered 3000/14207
Rank 0 registered 4000/14207
Rank 0 registered 5000/14207
Rank 0 registered 6000/14207
Rank 0 registered 7000/14207
Rank 0 registered 8000/14207
Rank 0 registered 9000/14207
Rank 0 registered 10000/14207
Rank 0 registered 11000/14207
Rank 0 registered 12000/14207
Rank 0 registered 13000/14207
Rank 0 registered 14000/14207
[ssb3:10762] * Process received signal *
[ssb3:10762] Signal: Segmentation fault (11)
[ssb3:10762] Signal code: Address not mapped (1)
[ssb3:10762] Failing at address: (nil)
Rank 0 registered 14206/14207
Rank 0 registered its seeds
[ssb3:10762] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fbd8d667cb0]
[ssb3:10762] [ 1] Ray(_ZN21SeedFilteringWorkflow14finalizeMethodEv+0x221) [0x56ef11]
[ssb3:10762] [ 2] Ray(_ZN11TaskCreator8mainLoopEv+0xbe) [0x5cc3fe]
[ssb3:10762] [ 3] Ray(_ZN11ComputeCore15runWithProfilerEv+0x382) [0x5cfcd2]
[ssb3:10762] [ 4] Ray(_ZN11ComputeCore3runEv+0xbc) [0x5d3b8c]
[ssb3:10762] [ 5] Ray(_ZN7Machine5startEv+0x19a6) [0x479416]
[ssb3:10762] [ 6] Ray(_ZN11RankProcessI7MachineE3runEv+0x9f) [0x47699f]
[ssb3:10762] [ 7] Ray(main+0xc7) [0x4724d7]
[ssb3:10762] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fbd8d2b976d]
[ssb3:10762] [ 9] Ray() [0x473f01]
[ssb3:10762] * End of error message *

(Only Rank 0 appears because the run used a single core.)

I first suspected a memory or general power problem (we had other hardware problems, which have since been resolved), but the crash now occurs on my local machine as well as on our server, which has 256 GB of RAM. That should be more than enough for a 5 Mb prokaryotic genome.

The cutoff seems to be at k = 49, which still works; the crash begins at k = 51, and I have no clue why. If anyone has any idea, I'd be very happy to hear it.

Best regards,

Bastian
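Note on reading the backtrace in the message above: the frames inside the Ray binary use mangled C++ symbol names. The sketch below is only an illustration of how they can be read. It pulls the symbol and offset out of a frame line and decodes the simplest mangled form, a nested name with no templates and an empty parameter list. The names FRAME_RE and demangle_simple are hypothetical helpers written for this note and are not part of Ray. For real work, a full demangler such as c++filt from binutils is the better tool.

```python
import re

# Two frames copied verbatim from the backtrace in the message above.
FRAMES = [
    "[ssb3:10762] [ 1] Ray(_ZN21SeedFilteringWorkflow14finalizeMethodEv+0x221) [0x56ef11]",
    "[ssb3:10762] [ 2] Ray(_ZN11TaskCreator8mainLoopEv+0xbe) [0x5cc3fe]",
]

# Hypothetical helper: frame index, module, symbol, offset, return address.
FRAME_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\S+?)\((\w*)(?:\+(0x[0-9a-f]+))?\)\s+\[(0x[0-9a-f]+)\]"
)


def demangle_simple(symbol):
    """Decode _ZN<len><name>...Ev into A::B() form.

    Assumption: only plain nested names with a void parameter list are
    handled. Templates and other encodings return None.
    """
    if not symbol.startswith("_ZN"):
        return None
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])          # length prefix of this component
        parts.append(symbol[end:end + length])
        pos = end + length
    # 'E' closes the nested name; 'v' encodes an empty parameter list.
    if not parts or symbol[pos:] != "Ev":
        return None
    return "::".join(parts) + "()"


for line in FRAMES:
    match = FRAME_RE.search(line)
    index, module, symbol, offset, address = match.groups()
    print(index, module, demangle_simple(symbol), offset, address)
```

Read this way, frame 1 places the failing address in SeedFilteringWorkflow::finalizeMethod(), called from TaskCreator::mainLoop() in frame 2.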

sebhtml commented 10 years ago

http://article.gmane.org/gmane.science.biology.ray-genome-assembler/689

sebhtml commented 10 years ago

see https://www.mail-archive.com/denovoassembler-users@lists.sourceforge.net/msg00716.html

sebhtml commented 10 years ago

see http://permalink.gmane.org/gmane.science.biology.ray-genome-assembler/701

sebhtml commented 10 years ago

gmane thread:

http://thread.gmane.org/gmane.science.biology.ray-genome-assembler/685/focus=689

sebhtml commented 10 years ago

I have sent a message to the end user because I could not reproduce the issue.

sebhtml commented 10 years ago

The problem seems to depend on the number of cores too:

http://thread.gmane.org/gmane.science.biology.ray-genome-assembler/685/focus=689

sebhtml commented 10 years ago

This seems to be related to checkpointing. Vertex.cpp:176 is called from MessageProcessor.cpp (TAG_START_SEEDING ...)