qzan9 / osu-micro-benchmarks

Add HIP support to test GPUDirect capability of Hygon DCU

latency D2D error #1

Open leonf88 opened 5 years ago

leonf88 commented 5 years ago

Hi, when running osu_bw and osu_latency, I get the errors below when the buffers are on the Device; they do not occur when the buffers are on the Host. What should I do?

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
        ./get_local_rank ./mpi/pt2pt/osu_bw D D
# OSU MPI-CUDA Bandwidth Test v5.5-azq
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)
[dell-gpu141:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[dell-gpu141:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[dell-gpu141:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[dell-gpu141:mpispawn_0][child_handler] MPI process (rank: 0, pid: 12814) terminated with signal 11 -> abort job
[dell-gpu141:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node 10.2.5.141 aborted: Error while reading a PMI socket (4)
» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.143 \
        ./get_local_rank ./mpi/pt2pt/osu_latency D D

# OSU MPI-CUDA Latency Test v5.5-azq
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size          Latency (us)
0                       1.33
[dell-gpu141:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[dell-gpu141:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[dell-gpu141:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[dell-gpu141:mpispawn_0][child_handler] MPI process (rank: 0, pid: 12308) terminated with signal 11 -> abort job
[dell-gpu141:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node 10.2.5.141 aborted: Error while reading a PMI socket (4)
[dell-gpu143:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 5. MPI process died?
[dell-gpu143:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 5. MPI process died?
[dell-gpu143:mpispawn_1][handle_mt_peer] Error while reading PMI socket. MPI process died?

qzan9 commented 4 years ago

> Hi, when running osu_bw and osu_latency, I get the errors below when the buffers are on the Device; they do not occur when the buffers are on the Host. What should I do?

Which vendor's GPU is this?