zer011b closed this issue 6 years ago
How do you compile?
Recompile with -DPRINT_MESSAGE=ON; you may then get an assert message with additional information.
Since you compiled with the xyz topology, it looks like a non-xyz virtual topology was chosen at execution time. You can specify the topology manually to avoid these issues: --manual-topology --topology-sizex 2 --topology-sizey 2 --topology-sizez 2
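To illustrate what a manual 2x2x2 topology implies for the grid above, here is a small sketch of how a 100x100x110 grid divides across the processes. This is only an illustration under the usual nearly-equal-chunks assumption, not fdtd3d's actual decomposition code.

```python
# Illustrative only: how a 100x100x110 grid divides across a 2x2x2
# process topology (this is not fdtd3d's actual decomposition code).

def split_axis(size, parts):
    """Split `size` cells into `parts` nearly equal chunks."""
    base, rem = divmod(size, parts)
    # The first `rem` chunks get one extra cell each.
    return [base + (1 if i < rem else 0) for i in range(parts)]

grid = (100, 100, 110)
topology = (2, 2, 2)  # --topology-sizex 2 --topology-sizey 2 --topology-sizez 2

chunks = [split_axis(s, p) for s, p in zip(grid, topology)]
print(chunks)  # [[50, 50], [50, 50], [55, 55]]
```

Each process then owns a 50x50x55 sub-grid, so the sizes divide evenly in this case.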
This is my cmake command: cmake .. -DCMAKE_BUILD_TYPE=Release -DVALUE_TYPE=d -DCOMPLEX_FIELD_VALUES=OFF -DTIME_STEPS=2 -DPARALLEL_GRID_DIMENSION=3 -DPRINT_MESSAGE=ON -DPARALLEL_GRID=ON -DPARALLEL_BUFFER_DIMENSION=xyz -DCXX11_ENABLED=ON -DCUDA_ENABLED=OFF -DCUDA_ARCH_SM_TYPE=sm_50
output message:
cem@ubuntu:~/Desktop/fdtd3d/build/Source$ mpiexec -n 8 ./fdtd3d --3d --parallel-grid --time-steps 100 --sizex 100 --sizey 100 --sizez 110
Nodes' grid (OPTIMAL): 2x2x2.
NOTE: you use OPTIMAL virtual topology (2x2x2). OPTIMAL virtual topology has some requirements in order for it to be optimal:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x1be3280, count=8250, MPI_DOUBLE, src=1, tag=1, comm=0x84000004, status=0x7ffe7aab74b0) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x7f0240, count=8250, MPI_DOUBLE, src=3, tag=3, comm=0x84000002, status=0x7ffd79339860) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x1d2a2d0, count=8250, MPI_DOUBLE, src=5, tag=5, comm=0x84000002, status=0x7fff4f7eb820) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x19ff1f0, count=8250, MPI_DOUBLE, src=7, tag=7, comm=0x84000002, status=0x7ffeb08d5d50) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
I have tested this build and execution, and it works fine for me. Could you post information about the version of your MPI library and compiler?
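For reference, these are common ways to query that information; the exact commands vary by MPI implementation, so treat this as a sketch rather than an exhaustive list.

```shell
# Print the MPI runtime version (MPICH, Intel MPI, and Open MPI all support this):
mpiexec --version

# Show which underlying compiler and flags the MPI wrapper uses:
mpicc -show       # MPICH / Intel MPI
mpicc --showme    # Open MPI
```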
I have tested with OpenMPI, and it works now. My previous MPI was Intel MPI; maybe fdtd3d is not compatible with Intel MPI.
After adding --manual-topology --topology-sizex 2 --topology-sizey 2 --topology-sizez 2, Intel MPI is working fine now. Thank you indeed.
Thank you for this information, I will check the build with Intel MPI.
I am running fdtd3d with command mpiexec -n 8 ./fdtd3d --3d --parallel-grid --time-steps 100 --sizex 100 --sizey 100 --sizez 110
The output message is shown below:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x1fd72c0, count=8250, MPI_DOUBLE, src=1, tag=1, comm=0x84000004, status=0x7ffe15d46ee0) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x778290, count=8250, MPI_DOUBLE, src=3, tag=3, comm=0x84000002, status=0x7ffe5bfe26f0) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x2517270, count=8250, MPI_DOUBLE, src=5, tag=5, comm=0x84000002, status=0x7ffc28808580) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x24de260, count=8250, MPI_DOUBLE, src=7, tag=7, comm=0x84000002, status=0x7ffe92f4a090) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
I have tested that it works with this command: mpiexec -n 8 ./fdtd3d --3d --parallel-grid --time-steps 100 --sizex 100 --sizey 100 --sizez 100
With the same grid size in all three dimensions it works, so I think it may be an issue with the MPI implementation.
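As a toy model of how an "optimal" virtual topology might be chosen, one common idea is to pick, among all factorizations of the process count into px*py*pz, the one that minimizes the total area of inter-process boundaries (and hence communication volume). This is only an illustration of that idea, not fdtd3d's actual heuristic, but it does reproduce the 2x2x2 choice reported for the 100x100x110 grid:

```python
# Toy model of "optimal" virtual-topology selection: among all ways to
# factor the process count into px*py*pz, pick the triple minimizing the
# total cross-sectional area of the cuts. Not fdtd3d's actual heuristic.
from itertools import product

def factor_triples(n):
    """All (px, py, pz) with px*py*pz == n."""
    return [(px, py, pz)
            for px, py, pz in product(range(1, n + 1), repeat=3)
            if px * py * pz == n]

def halo_area(grid, topo):
    nx, ny, nz = grid
    px, py, pz = topo
    # Each cut perpendicular to an axis contributes one full cross-section.
    return (px - 1) * ny * nz + (py - 1) * nx * nz + (pz - 1) * nx * ny

grid = (100, 100, 110)
best = min(factor_triples(8), key=lambda t: halo_area(grid, t))
print(best)  # (2, 2, 2) -- matches the "Nodes' grid (OPTIMAL): 2x2x2" output
```

Since 2x2x2 splits 100x100x110 evenly into 50x50x55 blocks, the uneven z size alone should not break the decomposition, which supports the suspicion that the failure lies in the MPI implementation rather than the grid sizes.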