zer011b / fdtd3d

fdtd3d is an open-source 1D, 2D, and 3D FDTD electromagnetics solver with MPI, OpenMP, and CUDA support for the x64, ARM, ARM64, RISC-V, and PowerPC architectures.
GNU General Public License v2.0

Fix some memory issues #94

Closed zer011b closed 6 years ago

EMinsight commented 6 years ago

I am running fdtd3d with the command: mpiexec -n 8 ./fdtd3d --3d --parallel-grid --time-steps 100 --sizex 100 --sizey 100 --sizez 110

The output message is shown below:

Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x1fd72c0, count=8250, MPI_DOUBLE, src=1, tag=1, comm=0x84000004, status=0x7ffe15d46ee0) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x778290, count=8250, MPI_DOUBLE, src=3, tag=3, comm=0x84000002, status=0x7ffe5bfe26f0) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x2517270, count=8250, MPI_DOUBLE, src=5, tag=5, comm=0x84000002, status=0x7ffc28808580) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x24de260, count=8250, MPI_DOUBLE, src=7, tag=7, comm=0x84000002, status=0x7ffe92f4a090) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process

I have tested that it works with this command: mpiexec -n 8 ./fdtd3d --3d --parallel-grid --time-steps 100 --sizex 100 --sizey 100 --sizez 100

With the same grid size in all three dimensions it works fine, so I think it may be an issue with the MPI implementation.

zer011b commented 6 years ago

How do you compile?

Recompile with -DPRINT_MESSAGE=ON; there may be an assert message with additional information.

In case you used the xyz topology during compilation, it seems that a non-xyz virtual topology was chosen during execution. You can specify the topology manually with --manual-topology --topology-sizex 2 --topology-sizey 2 --topology-sizez 2 to avoid these issues.
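For reference, a minimal sketch of the complete invocation with these flags, assuming the 8-process run and 100x100x110 grid from the report above:

```sh
# Sketch: original run command plus an explicit 2x2x2 virtual topology,
# so the number of MPI processes (8) matches 2*2*2.
mpiexec -n 8 ./fdtd3d --3d --parallel-grid --time-steps 100 \
  --sizex 100 --sizey 100 --sizez 110 \
  --manual-topology --topology-sizex 2 --topology-sizey 2 --topology-sizez 2
```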

EMinsight commented 6 years ago

This is my cmake command: cmake .. -DCMAKE_BUILD_TYPE=Release -DVALUE_TYPE=d -DCOMPLEX_FIELD_VALUES=OFF -DTIME_STEPS=2 -DPARALLEL_GRID_DIMENSION=3 -DPRINT_MESSAGE=ON -DPARALLEL_GRID=ON -DPARALLEL_BUFFER_DIMENSION=xyz -DCXX11_ENABLED=ON -DCUDA_ENABLED=OFF -DCUDA_ARCH_SM_TYPE=sm_50
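For readability, here is the same configuration as a sketch of the full build-and-run sequence (the build directory layout and the -j value are assumptions based on the shell prompt shown below, not part of the original report):

```sh
# Sketch: configure, build, and run with the options from this comment.
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DVALUE_TYPE=d -DCOMPLEX_FIELD_VALUES=OFF \
  -DTIME_STEPS=2 -DPARALLEL_GRID_DIMENSION=3 -DPRINT_MESSAGE=ON -DPARALLEL_GRID=ON \
  -DPARALLEL_BUFFER_DIMENSION=xyz -DCXX11_ENABLED=ON -DCUDA_ENABLED=OFF \
  -DCUDA_ARCH_SM_TYPE=sm_50
make -j4
cd Source && mpiexec -n 8 ./fdtd3d --3d --parallel-grid --time-steps 100 \
  --sizex 100 --sizey 100 --sizez 110
```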

output message:

cem@ubuntu:~/Desktop/fdtd3d/build/Source$ mpiexec -n 8 ./fdtd3d --3d --parallel-grid --time-steps 100 --sizex 100 --sizey 100 --sizez 110
Nodes' grid (OPTIMAL): 2x2x2.

NOTE: you use OPTIMAL virtual topology (2x2x2). OPTIMAL virtual topology has some requirements in order for it to be optimal:

zer011b commented 6 years ago

I have tested this build and execution, and it works fine for me. Could you post information about the versions of your MPI library and compiler?
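The requested versions could be collected with standard commands like the following (these are common MPI wrapper/launcher flags, not commands taken from this thread; Intel MPI's mpiexec also accepts -V):

```sh
# Sketch: report MPI library and compiler versions
mpiexec --version      # MPI launcher / library version
mpicxx --version       # C++ compiler behind the MPI wrapper, e.g. g++ or icpc
```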

EMinsight commented 6 years ago

I have tested with OpenMPI, and it works now. My previous MPI was Intel MPI; maybe it is not compatible with Intel MPI.

EMinsight commented 6 years ago

After adding --manual-topology --topology-sizex 2 --topology-sizey 2 --topology-sizez 2, Intel MPI is working fine now. Thank you very much.

zer011b commented 6 years ago

Thank you for this information, I will check the build with Intel MPI.