Closed by sebhtml 11 years ago
reproducible
Ray: code/plugin_MessageProcessor/MessageProcessor.cpp:1637: void MessageProcessor::call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION(Message): Assertion `node!=null' failed.
Error: vertex does not exist: AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
[cp2061:01397] *** Process received signal ***
[cp2061:01397] Signal: Aborted (6)
[cp2061:01397] Signal code: (-6)
[cp2061:01397] [ 0] /lib64/libpthread.so.0() [0x305840f490]
[cp2061:01397] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3058032945]
[cp2061:01397] [ 2] /lib64/libc.so.6(abort+0x175) [0x3058034125]
[cp2061:01397] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x305802b955]
[cp2061:01397] [ 4] Ray(_ZN16MessageProcessor38call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSIONEP7Message+0x304) [0x4c5464]
[cp2061:01397] [ 5] Ray(_ZN52Adapter_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION_WITH_REPLY4callEP7Message+0x19) [0x4c55b9]
[cp2061:01397] [ 6] Ray(_ZN11ComputeCore10runVanillaEv+0xf4) [0x58edb4]
[cp2061:01397] [ 7] Ray(_ZN11ComputeCore3runEv+0x5c) [0x59359c]
[cp2061:01397] [ 8] Ray(_ZN7Machine5startEv+0x135e) [0x4725fe]
[cp2061:01397] [ 9] Ray(_ZN11RankProcessI7MachineE3runEv+0x24a) [0x4700ea]
[cp2061:01397] [10] Ray(main+0xc7) [0x470387]
[cp2061:01397] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x305801ec9d]
[cp2061:01397] [12] Ray() [0x46c231]
[cp2061:01397] *** End of error message ***
reproducible from checkpoints in 5 min on 24 cores!
Error: vertex does not exist: AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
[cp2061:04959] Signal: Aborted (6)
[cp2061:04959] Signal code: (-6)
[cp2061:04959] [ 0] /lib64/libpthread.so.0() [0x305840f490]
[cp2061:04959] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3058032945]
[cp2061:04959] [ 2] /lib64/libc.so.6(abort+0x175) [0x3058034125]
[cp2061:04959] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x305802b955]
[cp2061:04959] [ 4] Ray(_ZN16MessageProcessor38call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSIONEP7Message+0x304) [0x4c5464]
[cp2061:04959] [ 5] Ray(_ZN52Adapter_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION_WITH_REPLY4callEP7Message+0x19) [0x4c55b9]
[cp2061:04959] [ 6] Ray(_ZN11ComputeCore10runVanillaEv+0xf4) [0x58edb4]
[cp2061:04959] [ 7] Ray(_ZN11ComputeCore3runEv+0x5c) [0x59359c]
[cp2061:04959] [ 8] Ray(_ZN7Machine5startEv+0x135e) [0x4725fe]
[cp2061:04959] [ 9] Ray(_ZN11RankProcessI7MachineE3runEv+0x24a) [0x4700ea]
[cp2061:04959] [10] Ray(main+0xc7) [0x470387]
[cp2061:04959] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x305801ec9d]
[cp2061:04959] [12] Ray() [0x46c231]
checkpoint restart with default storage CONFIG_PATH_STORAGE_DEFAULT (Sample_CQDM2-3-Ray-11): PASS
checkpoint restart with block storage CONFIG_PATH_STORAGE_BLOCK (Sample_CQDM2-3-Ray-15): FAIL
with not-found object AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
[boisver1@cp0869 boisver1]$ grep AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC Sample_CQDM2-3-Ray-16.1.*
Sample_CQDM2-3-Ray-16.1.09:Error: vertex does not exist: AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
Sample_CQDM2-3-Ray-16.1.12:[GraphPath::readObjectInBlock] returns AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
so the object is read
[GraphPath::readObjectInBlock] returns AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
position: 120945
blocks: 173
vertices: 176284
kmerLength: 51
blockNumber: 118
positionInBlock: 113
the object is inside 1 block.
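The logged numbers are consistent with a simple division of the path position by a fixed number of objects per block. A minimal sketch of that arithmetic, assuming 1024 objects per block: this value is only inferred from the log (118 * 1024 + 113 = 120945, and 176284 vertices need 173 blocks); it is not taken from the Ray source, and how it relates to CONFIG_PATH_BLOCK_SIZE is not shown here. The names `locate` and `block_count` are hypothetical.

```python
# Assumption: 1024 objects per block, inferred only from the logged values
# (blockNumber 118, positionInBlock 113, position 120945, blocks 173).
OBJECTS_PER_BLOCK = 1024  # hypothetical constant, not taken from the Ray source


def locate(position, objects_per_block=OBJECTS_PER_BLOCK):
    """Return (block_number, position_in_block) for a 0-based path position."""
    return divmod(position, objects_per_block)


def block_count(vertices, objects_per_block=OBJECTS_PER_BLOCK):
    """Number of blocks needed to hold `vertices` objects (ceiling division)."""
    return -(-vertices // objects_per_block)


print(locate(120945))       # (118, 113)
print(block_count(176284))  # 173
```

Both results match the values printed by GraphPath::readObjectInBlock, which supports the reading that the lookup side computes the right block and offset.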
problem is in write, not read
Ray: code/pluginSeedingData/GraphPath.cpp:391: void GraphPath::writeObjectInBlock(Kmer): Assertion `(_a)==addedObject' failed.
Ray: code/pluginSeedingData/GraphPath.cpp:397: void GraphPath::writeObjectInBlock(Kmer): Assertion `(_a)==addedObject' failed.
Error: expected: GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA
actual: TGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGA
at position120935
kmerLength: 51
CONFIG_PATH_BLOCK_SIZE 4096
15
Error: expected: ATGTTTATAAGTACTCAGAAATTCTCTGTCTAGGAGTCCGTCTTCCCAGCT
actual: CCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGCGGCT
at position120750
kmerLength: 51
CONFIG_PATH_BLOCK_SIZE 4096
Ray: code/pluginSeedingData/GraphPath.cpp:397: void GraphPath::writeObjectInBlock(Kmer): Assertion `(_a)==addedObject' failed.
[cp0869:18033] *** Process received signal ***
[cp0869:18033] Signal: Aborted (6)
[cp0869:18033] Signal code: (-6)
[cp0869:18033] [ 0] /lib64/libpthread.so.0() [0x3a0b40f490]
[cp0869:18033] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3a0b032945]
[cp0869:18033] [ 2] /lib64/libc.so.6(abort+0x175) [0x3a0b034125]
[cp0869:18033] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x3a0b02b955]
[cp0869:18033] [ 4] Ray(_ZN9GraphPath18writeObjectInBlockEP4Kmer+0x74e) [0x55134e]
[cp0869:18033] [ 5] Ray(_ZN12JoinerWorker4workEv+0x32a0) [0x4a8200]
[cp0869:18033] [ 6] Ray(_ZN16VirtualProcessor3runEv+0xcb) [0x5a058b]
[cp0869:18033] [ 7] Ray(_ZN11TaskCreator8mainLoopEv+0x25) [0x58dfa5]
[cp0869:18033] [ 8] Ray(_ZN11ComputeCore10runVanillaEv+0x133) [0x58f933]
[cp0869:18033] [ 9] Ray(_ZN11ComputeCore3runEv+0x5c) [0x5940dc]
[cp0869:18033] [10] Ray(_ZN7Machine5startEv+0x135e) [0x47261e]
[cp0869:18033] [11] Ray(_ZN11RankProcessI7MachineE3runEv+0x24a) [0x47010a]
[cp0869:18033] [12] Ray(main+0xc7) [0x4703a7]
[cp0869:18033] [13] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a0b01ec9d]
[cp0869:18033] [14] Ray() [0x46c249]
[cp0869:18033] *** End of error message ***
4
Ray: code/pluginSeedingData/GraphPath.cpp:397: void GraphPath::writeObjectInBlock(Kmer): Assertion `(_a)==addedObject' failed.
Error: expected: GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA
actual: TGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGA
at position120935
kmerLength: 51
CONFIG_PATH_BLOCK_SIZE 4096
[cp0869:18022] *** Process received signal ***
[cp0869:18022] Signal: Aborted (6)
[cp0869:18022] Signal code: (-6)
[cp0869:18022] [ 0] /lib64/libpthread.so.0() [0x3a0b40f490]
[cp0869:18022] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3a0b032945]
[cp0869:18022] [ 2] /lib64/libc.so.6(abort+0x175) [0x3a0b034125]
[cp0869:18022] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x3a0b02b955]
[cp0869:18022] [ 4] Ray(_ZN9GraphPath18writeObjectInBlockEP4Kmer+0x74e) [0x55134e]
[cp0869:18022] [ 5] Ray(_ZN12JoinerWorker4workEv+0x32a0) [0x4a8200]
[cp0869:18022] [ 6] Ray(_ZN16VirtualProcessor3runEv+0xcb) [0x5a058b]
[cp0869:18022] [ 7] Ray(_ZN11TaskCreator8mainLoopEv+0x25) [0x58dfa5]
[cp0869:18022] [ 8] Ray(_ZN11ComputeCore10runVanillaEv+0x133) [0x58f933]
[cp0869:18022] [ 9] Ray(_ZN11ComputeCore3runEv+0x5c) [0x5940dc]
[cp0869:18022] [10] Ray(_ZN7Machine5startEv+0x135e) [0x47261e]
[cp0869:18022] [11] Ray(_ZN11RankProcessI7MachineE3runEv+0x24a) [0x47010a]
[cp0869:18022] [12] Ray(main+0xc7) [0x4703a7]
[cp0869:18022] [13] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a0b01ec9d]
[cp0869:18022] [14] Ray() [0x46c249]
[cp0869:18022] *** End of error message ***
lol this is caused by the fact that this path is wrong...
the code is fine; the de Bruijn graph property (each k-mer overlaps the next one by k-1 nucleotides) is not respected in this path.
this is cool because the new block storage enforces de Bruijn links, thus even fewer assembly oddities
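To make the failure concrete, here is a small sketch of the link check, using the two k-mers from the log (positions 120934 and 120935, k = 51). It assumes the block storage keeps only the trailing nucleotide of each appended object and rebuilds the object from its predecessor on read-back; that is an inference from the logged "actual" value, not something confirmed from GraphPath.cpp. The function names are hypothetical.

```python
def is_de_bruijn_link(previous, added):
    """True if the last k-1 nucleotides of `previous` equal the first k-1 of `added`."""
    return len(previous) == len(added) and previous[1:] == added[:-1]


def read_back_after_append(previous, added):
    """Object rebuilt from `previous` if only the last nucleotide of `added` is stored.

    Assumption: one stored nucleotide per appended object (not confirmed from the source).
    """
    return previous[1:] + added[-1]


# Values copied from the log above.
previous = "TTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGG"  # [120934]
added = "GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA"     # "expected"

print(is_de_bruijn_link(previous, added))        # False
print(read_back_after_append(previous, added))   # the "actual" k-mer from the log
```

Under that assumption the read-back value is TGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGA, which is exactly the "actual" k-mer reported by the assertion, so the assertion fires because the joined path itself breaks the k-1 overlap.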
#4 Error: can not add GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA
last objects:
[120919] ------> ACTGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGA
[120920] ------> CTGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAA
[120921] ------> TGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAG
[120922] ------> GCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGG
[120923] ------> CTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGT
[120924] ------> TGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTC
[120925] ------> GCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCA
[120926] ------> CGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAA
[120927] ------> GGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAAC
[120928] ------> GAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACT
[120929] ------> AGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTG
[120930] ------> GACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGC
[120931] ------> ACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCC
[120932] ------> CCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCT
[120933] ------> CTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTG
[120934] ------> TTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGG
#12 Error: can not add GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA
last objects:
[120919] ------> ACTGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGA
[120920] ------> CTGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAA
[120921] ------> TGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAG
[120922] ------> GCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGG
[120923] ------> CTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGT
[120924] ------> TGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTC
[120925] ------> GCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCA
[120926] ------> CGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAA
[120927] ------> GGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAAC
[120928] ------> GAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACT
[120929] ------> AGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTG
[120930] ------> GACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGC
[120931] ------> ACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCC
[120932] ------> CCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCT
[120933] ------> CTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTG
[120934] ------> TTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGG
#15 Error: can not add ATGTTTATAAGTACTCAGAAATTCTCTGTCTAGGAGTCCGTCTTCCCAGCT
last objects:
[120734] ------> CTGTTATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATG
[120735] ------> TGTTATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGT
[120736] ------> GTTATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTA
[120737] ------> TTATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTAT
[120738] ------> TATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATT
[120739] ------> ATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTC
[120740] ------> TTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCA
[120741] ------> TCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAG
[120742] ------> CAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGA
[120743] ------> AGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAG
[120744] ------> GTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGA
[120745] ------> TAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAG
[120746] ------> AAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGC
[120747] ------> AGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGCG
[120748] ------> GTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGCGG
[120749] ------> TCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGCGGC
with this fix, #141 is a duplicate
Scaffolds >= 500 nt
Number: 44
Total length: 4213877
Average: 95769
N50: 176987
Median: 69430
Largest: 337729
ef8ff04e9ea6df75d06ec6408a23cc8125aa5835
[r107-n70:06763] 31 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
[r107-n70:06763] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Ray: code/plugin_MessageProcessor/MessageProcessor.cpp:1632: void MessageProcessor::call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION(Message): Assertion `node!=null' failed.
[r105-n87:04965] *** Process received signal ***
[r105-n87:04965] Signal: Aborted (6)
[r105-n87:04965] Signal code: (-6)
[r105-n87:04965] [ 0] /lib64/libpthread.so.0 [0x7f5e3ca98be0]
[r105-n87:04965] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x7f5e3c763285]
[r105-n87:04965] [ 2] /lib64/libc.so.6(abort+0x110) [0x7f5e3c764d30]
[r105-n87:04965] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x7f5e3c75c706]
[r105-n87:04965] [ 4] Ray(_ZN16MessageProcessor38call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSIONEP7Message+0x2b3) [0x4c0453]
[r105-n87:04965] [ 5] Ray(_ZN52Adapter_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION_WITH_REPLY4callEP7Message+0x27) [0x4c0577]
[r105-n87:04965] [ 6] Ray(_ZN11ComputeCore10runVanillaEv+0xf4) [0x582754]
[r105-n87:04965] [ 7] Ray(_ZN11ComputeCore3runEv+0x6b) [0x586dbb]
[r105-n87:04965] [ 8] Ray(_ZN7Machine5startEv+0x135e) [0x4705ee]
[r105-n87:04965] [ 9] Ray(_ZN11RankProcessI7MachineE3runEv+0x24b) [0x46e27b]
[r105-n87:04965] [10] Ray(main+0xc7) [0x46e4f7]
[r105-n87:04965] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x7f5e3c750994]
[r105-n87:04965] [12] Ray(_ZNSt8ios_base4InitD1Ev+0x51) [0x46b569]
[r105-n87:04965] *** End of error message ***
mpiexec noticed that process rank 51 with PID 4965 on node r105-n87 exited on signal 6 (Aborted).