sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

MPI I/O is sometimes buggy (maybe it is the Ray code?) #206

Closed: sebhtml closed this 10 years ago

sebhtml commented 10 years ago

On corbeil-mp2.rqchp.ca:

sebhtml commented 10 years ago

(gdb) bt
#0  0x0000003d036cf3a7 in sched_yield () from /lib64/libc.so.6
#1  0x00002acd0f472fff in opal_progress () at runtime/opal_progress.c:220
#2  0x00002acd0f3bf765 in opal_condition_wait (count=2, requests=0x7fff5c6fd0d0, statuses=0x0) at ../opal/threads/condition.h:99
#3  ompi_request_default_wait_all (count=2, requests=0x7fff5c6fd0d0, statuses=0x0) at request/req_wait.c:263
#4  0x00002acd145f511f in ompi_coll_tuned_allreduce_intra_recursivedoubling (sbuf=<value optimized out>, rbuf=0x7fff5c6fd1bc, count=1, 
    dtype=0x2acd0f6dd420, op=0x2acd0f6f9800, comm=0x861880, module=0x1bd9c00) at coll_tuned_allreduce.c:223
#5  0x00002acd0f3b3b3c in ompi_comm_nextcid (newcomm=0x27998a0, comm=0x861880, bridgecomm=0x0, local_leader=0x0, remote_leader=<value optimized out>, 
    mode=<value optimized out>, send_first=-1) at communicator/comm_cid.c:234
#6  0x00002acd0f3b2406 in ompi_comm_dup (comm=0x861880, newcomm=0x7fff5c6fd2e0) at communicator/comm.c:675
#7  0x00002acd0f3cf3c0 in PMPI_Comm_dup (comm=0x861880, newcomm=0x7fff5c6fd2e0) at pcomm_dup.c:62
#8  0x00002acd300bc551 in mca_io_romio_dist_MPI_File_open (comm=0x861880, filename=0x30641a0 "NormalStool-7/Contigs.fasta", amode=9, info=0x8625e0, 
    fh=0x1dfae60) at open.c:108
#9  0x00002acd0f3f14f6 in module_init (file=0x2f82d40, preferred=<value optimized out>) at base/io_base_file_select.c:442
#10 mca_io_base_file_select (file=0x2f82d40, preferred=<value optimized out>) at base/io_base_file_select.c:214
#11 0x00002acd0f3b9f26 in ompi_file_open (comm=<value optimized out>, filename=0x2d45818 "NormalStool-7/Contigs.fasta", amode=9, info=0x8625e0, 
    fh=0x7fff5c6fd778) at file/file.c:128
#12 0x00002acd0f3e8bd8 in PMPI_File_open (comm=0x861880, filename=0x2d45818 "NormalStool-7/Contigs.fasta", amode=9, info=<value optimized out>, 
    fh=0x7fff5c6fd778) at pfile_open.c:96
#13 0x00000000004d0c43 in MachineHelper::call_RAY_SLAVE_MODE_SEND_EXTENSION_DATA() ()
#14 0x00000000005d84da in ComputeCore::runWithProfiler() ()
#15 0x00000000005d9948 in ComputeCore::run() ()
#16 0x00000000004881c7 in Machine::start() ()
#17 0x0000000000485baa in RankProcess<Machine>::run() ()
#18 0x0000000000485e47 in main ()

sebhtml commented 10 years ago

Rebuilding with MPI_IO=n to check whether it works without MPI I/O.

sebhtml commented 10 years ago

It works without MPI I/O!

sebhtml commented 10 years ago

616d2a26cc1e39f59325a0e632af46262edaa12c