Hello,
I am trying to run HPCG with CUDA support using MPI on multiple hosts. Specifically, I use the binary from the HPCG website (https://www.hpcg-benchmark.org/software/view.html?id=267).
I have set up my cluster with all the required libraries and am able to run the benchmark on one node. The problem arises when I try to use multiple MPI hosts. The instructions say that in order to run on multiple hosts (e.g. 2 hosts with 2 GPUs each), we have to issue a command like the one below:
where hosts2 looks like this:
However, when I issue the command above, all 4 processes are deployed on the first MPI host found in the hosts2 file (i.e. mpi-worker-0 in this case) and none is deployed on the second one. Is there anything I can do?
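To make the symptom concrete, here is a minimal sketch of fill-first ("by slot") rank placement, a policy comparable to Open MPI's by-slot mapping. It is an illustration under assumptions, not a diagnosis: the helper `place_by_slot`, the second host name `mpi-worker-1`, and the slot counts are all hypothetical and are not taken from the actual hosts2 file or launcher.

```python
def place_by_slot(hosts, nprocs):
    """Assign ranks fill-first: use up one host's slots before moving to the next.

    hosts  -- list of (hostname, slots) pairs, in hostfile order (hypothetical values)
    nprocs -- number of MPI ranks to place
    """
    placement = []
    for name, slots in hosts:
        for _ in range(slots):
            if len(placement) == nprocs:
                return placement
            placement.append(name)
    if len(placement) < nprocs:
        raise ValueError("not enough slots for the requested ranks")
    return placement


# Assumed case 1: the first host advertises 4 slots, so it absorbs all 4 ranks.
print(place_by_slot([("mpi-worker-0", 4), ("mpi-worker-1", 4)], 4))

# Assumed case 2: 2 slots per host (one per GPU), so the ranks split 2 + 2.
print(place_by_slot([("mpi-worker-0", 2), ("mpi-worker-1", 2)], 4))
```

Under these assumptions, the first case reproduces the behaviour I see (every rank on mpi-worker-0), while the second gives the placement I expected. Whether the launcher used here actually applies this policy is something I have not confirmed.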
Thanks in advance, Dimosthenis