hpcg-benchmark / hpcg

Official HPCG benchmark source code
http://www.hpcg-benchmark.org/
BSD 3-Clause "New" or "Revised" License

HPCG Cuda Binary with MPI support not working properly for multiple hosts #65

Status: open. Opened by Pl4tiNuM 4 years ago.

Pl4tiNuM commented 4 years ago

Hello,

I am trying to run HPCG with CUDA support using MPI on multiple hosts. Specifically, I am using the precompiled binary from the website (https://www.hpcg-benchmark.org/software/view.html?id=267).

I have set up my cluster with all the required libraries and can run the benchmark on a single node. The problem appears when I try to use multiple MPI hosts. The instructions say that to run on multiple hosts (e.g. 2 hosts with 2 GPUs each), we have to issue a command like the one below:

mpirun -np 4 -hostfile hosts2 ./xhpcg-3.1_gcc_485_cuda-10.0.130_ompi-3.1.0_sm_35_sm_50_sm_60_sm_70_sm_75_ver_10_9_18

where hosts2 looks like this:

mpi-worker-0
mpi-worker-1

However, when I issue the command above, all 4 processes are launched on the first host listed in the hosts2 file (mpi-worker-0 in this case) and none on the second one.

Is there anything I can do?

Thanks in advance, Dimosthenis

viniciusferrao commented 3 years ago

Just add slots=2 at the end of each line in the hosts file (e.g. mpi-worker-0 slots=2).
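A minimal sketch of that fix, assuming Open MPI (the binary name mentions ompi-3.1.0) and the two host names from this issue. It writes hosts2 with slots=2 on each line, so ranks should fill mpi-worker-0 before moving on to mpi-worker-1 (ranks 0-1 on the first host, ranks 2-3 on the second). It also sums the slot counts so you can confirm they match the -np value. The variable name total is only illustrative.

```shell
#!/bin/sh
# Sketch, assuming Open MPI and the host names from this issue;
# adjust both for your own cluster.

# One line per host; slots=2 tells Open MPI it may place 2 ranks
# on that host (one rank per GPU on a 2-GPU node).
cat > hosts2 <<'EOF'
mpi-worker-0 slots=2
mpi-worker-1 slots=2
EOF

# Sum the slots=N values in the hostfile; this should equal -np.
total=$(awk '{
  for (i = 2; i <= NF; i++)
    if ($i ~ /^slots=/) { sub(/^slots=/, "", $i); s += $i }
} END { print s + 0 }' hosts2)

echo "total slots: $total"
```

Before starting the full benchmark, a quick placement check such as mpirun -np 4 -hostfile hosts2 hostname should print each worker name twice; then rerun the original xhpcg command with the same hostfile.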