sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

path storage using blocks has a bug (not in production) #142

Closed sebhtml closed 11 years ago

sebhtml commented 11 years ago

[r107-n70:06763] 31 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
[r107-n70:06763] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Ray: code/plugin_MessageProcessor/MessageProcessor.cpp:1632: void MessageProcessor::call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION(Message*): Assertion `node!=null' failed.
[r105-n87:04965] *** Process received signal ***
[r105-n87:04965] Signal: Aborted (6)
[r105-n87:04965] Signal code: (-6)
[r105-n87:04965] [ 0] /lib64/libpthread.so.0 [0x7f5e3ca98be0]
[r105-n87:04965] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x7f5e3c763285]
[r105-n87:04965] [ 2] /lib64/libc.so.6(abort+0x110) [0x7f5e3c764d30]
[r105-n87:04965] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x7f5e3c75c706]
[r105-n87:04965] [ 4] Ray(_ZN16MessageProcessor38call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSIONEP7Message+0x2b3) [0x4c0453]
[r105-n87:04965] [ 5] Ray(_ZN52Adapter_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION_WITH_REPLY4callEP7Message+0x27) [0x4c0577]
[r105-n87:04965] [ 6] Ray(_ZN11ComputeCore10runVanillaEv+0xf4) [0x582754]
[r105-n87:04965] [ 7] Ray(_ZN11ComputeCore3runEv+0x6b) [0x586dbb]
[r105-n87:04965] [ 8] Ray(_ZN7Machine5startEv+0x135e) [0x4705ee]
[r105-n87:04965] [ 9] Ray(_ZN11RankProcessI7MachineE3runEv+0x24b) [0x46e27b]
[r105-n87:04965] [10] Ray(main+0xc7) [0x46e4f7]
[r105-n87:04965] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x7f5e3c750994]
[r105-n87:04965] [12] Ray(_ZNSt8ios_base4InitD1Ev+0x51) [0x46b569]
[r105-n87:04965] *** End of error message ***

mpiexec noticed that process rank 51 with PID 4965 on node r105-n87 exited on signal 6 (Aborted).

sebhtml commented 11 years ago

reproducible

Ray: code/plugin_MessageProcessor/MessageProcessor.cpp:1637: void MessageProcessor::call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION(Message*): Assertion `node!=null' failed.
Error: vertex does not exist: AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
[cp2061:01397] *** Process received signal ***
[cp2061:01397] Signal: Aborted (6)
[cp2061:01397] Signal code: (-6)
[cp2061:01397] [ 0] /lib64/libpthread.so.0() [0x305840f490]
[cp2061:01397] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3058032945]
[cp2061:01397] [ 2] /lib64/libc.so.6(abort+0x175) [0x3058034125]
[cp2061:01397] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x305802b955]
[cp2061:01397] [ 4] Ray(_ZN16MessageProcessor38call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSIONEP7Message+0x304) [0x4c5464]
[cp2061:01397] [ 5] Ray(_ZN52Adapter_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION_WITH_REPLY4callEP7Message+0x19) [0x4c55b9]
[cp2061:01397] [ 6] Ray(_ZN11ComputeCore10runVanillaEv+0xf4) [0x58edb4]
[cp2061:01397] [ 7] Ray(_ZN11ComputeCore3runEv+0x5c) [0x59359c]
[cp2061:01397] [ 8] Ray(_ZN7Machine5startEv+0x135e) [0x4725fe]
[cp2061:01397] [ 9] Ray(_ZN11RankProcessI7MachineE3runEv+0x24a) [0x4700ea]
[cp2061:01397] [10] Ray(main+0xc7) [0x470387]
[cp2061:01397] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x305801ec9d]
[cp2061:01397] [12] Ray() [0x46c231]
[cp2061:01397] *** End of error message ***

sebhtml commented 11 years ago

reproducible from checkpoints in 5 min on 24 cores!

Error: vertex does not exist: AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
[cp2061:04959] Signal: Aborted (6)
[cp2061:04959] Signal code: (-6)
[cp2061:04959] [ 0] /lib64/libpthread.so.0() [0x305840f490]
[cp2061:04959] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3058032945]
[cp2061:04959] [ 2] /lib64/libc.so.6(abort+0x175) [0x3058034125]
[cp2061:04959] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x305802b955]
[cp2061:04959] [ 4] Ray(_ZN16MessageProcessor38call_RAY_MPI_TAG_SAVE_WAVE_PROGRESSIONEP7Message+0x304) [0x4c5464]
[cp2061:04959] [ 5] Ray(_ZN52Adapter_RAY_MPI_TAG_SAVE_WAVE_PROGRESSION_WITH_REPLY4callEP7Message+0x19) [0x4c55b9]
[cp2061:04959] [ 6] Ray(_ZN11ComputeCore10runVanillaEv+0xf4) [0x58edb4]
[cp2061:04959] [ 7] Ray(_ZN11ComputeCore3runEv+0x5c) [0x59359c]
[cp2061:04959] [ 8] Ray(_ZN7Machine5startEv+0x135e) [0x4725fe]
[cp2061:04959] [ 9] Ray(_ZN11RankProcessI7MachineE3runEv+0x24a) [0x4700ea]
[cp2061:04959] [10] Ray(main+0xc7) [0x470387]
[cp2061:04959] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x305801ec9d]
[cp2061:04959] [12] Ray() [0x46c231]
[cp2061:04959] *** End of error message ***

sebhtml commented 11 years ago

checkpoint restart with default storage CONFIG_PATH_STORAGE_DEFAULT (Sample_CQDM2-3-Ray-11): PASS
checkpoint restart with block storage CONFIG_PATH_STORAGE_BLOCK (Sample_CQDM2-3-Ray-15): FAIL

with not-found object AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC

sebhtml commented 11 years ago

[boisver1@cp0869 boisver1]$ grep AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC Sample_CQDM2-3-Ray-16.1.*
Sample_CQDM2-3-Ray-16.1.09:Error: vertex does not exist: AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
Sample_CQDM2-3-Ray-16.1.12:[GraphPath::readObjectInBlock] returns AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC

so the object is read

sebhtml commented 11 years ago

[GraphPath::readObjectInBlock] returns AGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGACTGTAATACC
position: 120945 blocks: 173 vertices: 176284 kmerLength: 51 blockNumber: 118 positionInBlock: 113

the object is inside 1 block.
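
The logged numbers are consistent with plain integer division. A minimal Python sketch of that arithmetic, assuming 1024 path positions per block: that value is inferred from the log line above (position 120945, blockNumber 118, positionInBlock 113), not read from Ray's source, and `POSITIONS_PER_BLOCK`, `locate` and `block_count` are hypothetical names.

```python
import math

# Assumption: 1024 positions per block, inferred from the logged values
# (position 120945 -> blockNumber 118, positionInBlock 113).
POSITIONS_PER_BLOCK = 1024


def locate(position):
    """Return (block number, position inside that block) for a path position."""
    return divmod(position, POSITIONS_PER_BLOCK)


def block_count(vertices):
    """Number of blocks needed to hold the given number of path vertices."""
    return math.ceil(vertices / POSITIONS_PER_BLOCK)


print(locate(120945))       # (118, 113)
print(block_count(176284))  # 173
```

Both results match the log (blockNumber 118, positionInBlock 113, blocks 173), so the lookup side of the block storage looks sane for this position.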

sebhtml commented 11 years ago

problem is in write, not read

Ray: code/pluginSeedingData/GraphPath.cpp:391: void GraphPath::writeObjectInBlock(Kmer*): Assertion `(_a)==addedObject' failed.

sebhtml commented 11 years ago

Ray: code/pluginSeedingData/GraphPath.cpp:397: void GraphPath::writeObjectInBlock(Kmer*): Assertion `(_a)==addedObject' failed.
Error: expected: GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA
actual: TGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGA
at position120935 kmerLength: 51 CONFIG_PATH_BLOCK_SIZE 4096
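
One reading that fits the "actual" value: if the block storage keeps only one new nucleotide per added k-mer, then reading a position back rebuilds the k-mer from the previous one. The Python sketch below illustrates that idea only; `CompactPath` is a hypothetical class, not Ray's `GraphPath`, and the one-nucleotide-per-position layout is an assumption. The `previous` k-mer is the one logged at position 120934 elsewhere in this thread.

```python
K = 51  # kmerLength from the log


class CompactPath:
    """Hypothetical compact path: one stored nucleotide per position after the first k-mer."""

    def __init__(self):
        self.sequence = ""

    def add(self, kmer):
        # The first k-mer is stored whole; later ones contribute only their last nucleotide.
        self.sequence += kmer if not self.sequence else kmer[-1]

    def read(self, position):
        # Rebuild the k-mer at a position from the stored nucleotides.
        return self.sequence[position:position + K]


# K-mer logged at position 120934, and the k-mer the caller tried to add next.
previous = "TTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGG"
expected = "GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA"

path = CompactPath()
path.add(previous)  # local position 0 stands in for 120934
path.add(expected)  # local position 1 stands in for 120935
actual = path.read(1)

print(actual)              # k-mer read back after the write
print(actual == expected)  # False: the write-then-read check fails
```

Under this assumption the read-back k-mer is the previous k-mer shifted by one plus the final `A` of the added k-mer, which is exactly the "actual" string in the error message.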

sebhtml commented 11 years ago

#15

Error: expected: ATGTTTATAAGTACTCAGAAATTCTCTGTCTAGGAGTCCGTCTTCCCAGCT
actual: CCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGCGGCT
at position120750 kmerLength: 51 CONFIG_PATH_BLOCK_SIZE 4096
Ray: code/pluginSeedingData/GraphPath.cpp:397: void GraphPath::writeObjectInBlock(Kmer*): Assertion `(_a)==addedObject' failed.
[cp0869:18033] *** Process received signal ***
[cp0869:18033] Signal: Aborted (6)
[cp0869:18033] Signal code: (-6)
[cp0869:18033] [ 0] /lib64/libpthread.so.0() [0x3a0b40f490]
[cp0869:18033] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3a0b032945]
[cp0869:18033] [ 2] /lib64/libc.so.6(abort+0x175) [0x3a0b034125]
[cp0869:18033] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x3a0b02b955]
[cp0869:18033] [ 4] Ray(_ZN9GraphPath18writeObjectInBlockEP4Kmer+0x74e) [0x55134e]
[cp0869:18033] [ 5] Ray(_ZN12JoinerWorker4workEv+0x32a0) [0x4a8200]
[cp0869:18033] [ 6] Ray(_ZN16VirtualProcessor3runEv+0xcb) [0x5a058b]
[cp0869:18033] [ 7] Ray(_ZN11TaskCreator8mainLoopEv+0x25) [0x58dfa5]
[cp0869:18033] [ 8] Ray(_ZN11ComputeCore10runVanillaEv+0x133) [0x58f933]
[cp0869:18033] [ 9] Ray(_ZN11ComputeCore3runEv+0x5c) [0x5940dc]
[cp0869:18033] [10] Ray(_ZN7Machine5startEv+0x135e) [0x47261e]
[cp0869:18033] [11] Ray(_ZN11RankProcessI7MachineE3runEv+0x24a) [0x47010a]
[cp0869:18033] [12] Ray(main+0xc7) [0x4703a7]
[cp0869:18033] [13] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a0b01ec9d]
[cp0869:18033] [14] Ray() [0x46c249]
[cp0869:18033] *** End of error message ***

#4

Ray: code/pluginSeedingData/GraphPath.cpp:397: void GraphPath::writeObjectInBlock(Kmer*): Assertion `(_a)==addedObject' failed.
Error: expected: GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA
actual: TGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGGA
at position120935 kmerLength: 51 CONFIG_PATH_BLOCK_SIZE 4096
[cp0869:18022] *** Process received signal ***
[cp0869:18022] Signal: Aborted (6)
[cp0869:18022] Signal code: (-6)
[cp0869:18022] [ 0] /lib64/libpthread.so.0() [0x3a0b40f490]
[cp0869:18022] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3a0b032945]
[cp0869:18022] [ 2] /lib64/libc.so.6(abort+0x175) [0x3a0b034125]
[cp0869:18022] [ 3] /lib64/libc.so.6(__assert_fail+0xf5) [0x3a0b02b955]
[cp0869:18022] [ 4] Ray(_ZN9GraphPath18writeObjectInBlockEP4Kmer+0x74e) [0x55134e]
[cp0869:18022] [ 5] Ray(_ZN12JoinerWorker4workEv+0x32a0) [0x4a8200]
[cp0869:18022] [ 6] Ray(_ZN16VirtualProcessor3runEv+0xcb) [0x5a058b]
[cp0869:18022] [ 7] Ray(_ZN11TaskCreator8mainLoopEv+0x25) [0x58dfa5]
[cp0869:18022] [ 8] Ray(_ZN11ComputeCore10runVanillaEv+0x133) [0x58f933]
[cp0869:18022] [ 9] Ray(_ZN11ComputeCore3runEv+0x5c) [0x5940dc]
[cp0869:18022] [10] Ray(_ZN7Machine5startEv+0x135e) [0x47261e]
[cp0869:18022] [11] Ray(_ZN11RankProcessI7MachineE3runEv+0x24a) [0x47010a]
[cp0869:18022] [12] Ray(main+0xc7) [0x4703a7]
[cp0869:18022] [13] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a0b01ec9d]
[cp0869:18022] [14] Ray() [0x46c249]
[cp0869:18022] *** End of error message ***

sebhtml commented 11 years ago

lol this is caused by the fact that this path is wrong...

sebhtml commented 11 years ago

code is fine, de Bruijn graph property is not respected here.

sebhtml commented 11 years ago

this is cool because the new block storage enforces de Bruijn links, thus even less assembly oddities
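
For reference, the de Bruijn link property says that two consecutive k-mers in a path overlap by k-1 nucleotides. A small Python sketch of such a check, run on k-mers taken from the logs in this thread; `is_de_bruijn_link` and `first_broken_link` are hypothetical helper names, not functions from Ray.

```python
def is_de_bruijn_link(a, b):
    """True if k-mer b can follow k-mer a in a de Bruijn path (k-1 overlap)."""
    return len(a) == len(b) and a[1:] == b[:-1]


def first_broken_link(kmers):
    """Index i of the first pair (kmers[i], kmers[i + 1]) that breaks the property, or None."""
    for i in range(len(kmers) - 1):
        if not is_de_bruijn_link(kmers[i], kmers[i + 1]):
            return i
    return None


path = [
    "CTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTG",  # logged at position 120933
    "TTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGG",  # logged at position 120934
    "GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA",  # k-mer that could not be added
]

print(first_broken_link(path))  # 1
```

The first pair is a valid link, while the second pair shares no k-1 overlap, which matches the observation that the path itself is wrong rather than the storage code.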

sebhtml commented 11 years ago

#4
Error: can not add GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA
last objects:
 [120919] ------> ACTGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGA
 [120920] ------> CTGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAA
 [120921] ------> TGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAG
 [120922] ------> GCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGG
 [120923] ------> CTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGT
 [120924] ------> TGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTC
 [120925] ------> GCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCA
 [120926] ------> CGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAA
 [120927] ------> GGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAAC
 [120928] ------> GAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACT
 [120929] ------> AGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTG
 [120930] ------> GACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGC
 [120931] ------> ACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCC
 [120932] ------> CCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCT
 [120933] ------> CTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTG
 [120934] ------> TTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGG
#12
Error: can not add GACGGAATGGGAGACTTCCTCGGAAACATCCGCAAGATGGTTCTGGAAGAA
last objects:
 [120919] ------> ACTGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGA
 [120920] ------> CTGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAA
 [120921] ------> TGCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAG
 [120922] ------> GCTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGG
 [120923] ------> CTGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGT
 [120924] ------> TGCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTC
 [120925] ------> GCGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCA
 [120926] ------> CGGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAA
 [120927] ------> GGAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAAC
 [120928] ------> GAGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACT
 [120929] ------> AGACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTG
 [120930] ------> GACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGC
 [120931] ------> ACCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCC
 [120932] ------> CCTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCT
 [120933] ------> CTTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTG
 [120934] ------> TTGAGGACTGCAGGAAGAACCTGCTGAGGAACAAGAAGGTCAACTGCCTGG
#15
Error: can not add ATGTTTATAAGTACTCAGAAATTCTCTGTCTAGGAGTCCGTCTTCCCAGCT
last objects:
 [120734] ------> CTGTTATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATG
 [120735] ------> TGTTATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGT
 [120736] ------> GTTATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTA
 [120737] ------> TTATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTAT
 [120738] ------> TATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATT
 [120739] ------> ATTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTC
 [120740] ------> TTCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCA
 [120741] ------> TCAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAG
 [120742] ------> CAGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGA
 [120743] ------> AGTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAG
 [120744] ------> GTAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGA
 [120745] ------> TAAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAG
 [120746] ------> AAGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGC
 [120747] ------> AGTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGCG
 [120748] ------> GTCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGCGG
 [120749] ------> TCCGCCGCTGCCAGTGACATCCGTCTTCGCCTGATGTATTCAGAGAGCGGC

sebhtml commented 11 years ago

with this fix, #141 is a duplicate

Scaffolds >= 500 nt
 Number: 44
 Total length: 4213877
 Average: 95769
 N50: 176987
 Median: 69430
 Largest: 337729

sebhtml commented 11 years ago

ef8ff04e9ea6df75d06ec6408a23cc8125aa5835