Closed by TommyUW 1 year ago
What exactly does "However, this command doesn't work and the terminal shows that only 2 GPUs are available. How should I change my command?" mean?
Based on your description, each of your MPI nodes should indeed have 2 GPUs.
As for your second question, I would advise first trying without setting any environment variable, and with more than 1 process.
When using MPI, you have 1 StarPU process running on each node, and each StarPU process only sees the GPU devices on the node it is running on. So what you are asking is not possible.
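To make the visibility rule concrete, here is a minimal shell sketch using the numbers from this thread (2 nodes with 2 GPUs each). The variable names are illustrative only and are not StarPU settings; the sketch only does the arithmetic and makes no StarPU or CUDA call.

```shell
# Numbers from this thread; the variable names are hypothetical, not StarPU options.
NODES=2            # node1 and node2
GPUS_PER_NODE=2    # GPUs physically installed in each node
REQUESTED_NCUDA=4  # value passed as STARPU_NCUDA in the original command

# A process only sees the GPUs of the node it runs on.
VISIBLE_PER_PROCESS=$GPUS_PER_NODE
# With at least one process on every node, the job as a whole reaches all GPUs.
TOTAL_GPUS=$((NODES * GPUS_PER_NODE))

if [ "$REQUESTED_NCUDA" -gt "$VISIBLE_PER_PROCESS" ]; then
    echo "requested ${REQUESTED_NCUDA} GPUs, but a process can see at most ${VISIBLE_PER_PROCESS}"
fi
echo "GPUs reachable by the whole MPI job: ${TOTAL_GPUS}"
```

Under these assumptions, a single process never sees 4 GPUs, which would be consistent with the terminal reporting only 2, while the two processes together still cover all 4.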
I have changed the content of mpi_config_file. It is now node1:1, node2:1, which means each machine uses 1 CPU core. Now I am able to use one GPU from each machine to run the LU program. However, the performance is low and not all of the GPUs are used. In short, how can I change my command to use all four GPUs on these two machines to run the program?
Also, I just checked the GPU usage. It is very low. Is it because of core oversubscription when running LU on multiple machines? Moreover, even if I set NCUDA to 0, the performance becomes very low again.
Do not set any environment variable; StarPU will use all the GPUs on each node.
And please make sure your text is consistent with the image you put below. You said you set NCUDA to 0, but the image shows STARPU_NCUDA=2.
You should also use all the CPUs on the nodes, not just one.
I am so sorry about the mistake. I did not set any variables this time. Yes, StarPU has used all the GPUs. However, the performance is still very low. I realize that my problem is not the GPUs but running MPI on multiple machines. As shown in the first picture, when I run the MPI StarPU LU on a single machine with the command I entered, the performance increases. However, in the second picture, whether I set no variables or use the same command, the performance becomes significantly lower. The reason I added these variables was to keep StarPU from using all the CPU cores. I remember you said that a program run with MPI will experience core oversubscription, as StarPU uses all the cores automatically. So, to increase the performance of the LU program on multiple machines, what command should I enter exactly? Thank you very much.
Also, please take a look at this picture. I set NCUDA=0. The performance of this program on a single machine with 2 processes is higher than on multiple machines.
As I said before, when running MPI, you will get the best performance when running 1 process per node, assuming the nodes are connected through a high-bandwidth network. You should talk to the people managing your cluster and see how to get the best performance with MPI.
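A minimal sketch of a one-process-per-node launch, under several assumptions: the host names are the node1 and node2 from this thread, the `mpirun -f` host file uses the same host:count form already shown here, and the `-p`/`-q` values (a 2 x 1 process grid to match 2 processes) are a guess rather than a confirmed requirement of the example. The `-size` and `-nblocks` values are taken from the original command. The launch line is only printed, not executed, and no StarPU environment variable is set, as advised above.

```shell
# Host names from this thread; replace them with the real ones.
NODES="node1 node2"

# Write one "host:1" line per node, i.e. one MPI process per node.
: > mpi_config_file
NPROCS=0
for node in $NODES; do
    echo "${node}:1" >> mpi_config_file
    NPROCS=$((NPROCS + 1))
done

# Assumption: a process grid of NPROCS x 1, so that p * q equals the process count.
CMD="mpirun -n ${NPROCS} -f mpi_config_file ./plu_example_double -size 4096 -nblocks 16 -p ${NPROCS} -q 1"

# Print the command for review instead of running it.
echo "$CMD"
```

With the two nodes above, this writes a two-line host file and prints a launch line with `-n 2`; whether that layout actually performs well still depends on the network between the nodes.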
So in short, I can only use two processes with MPI in order to use all four GPUs, correct? To get the best performance, each process runs on one machine and uses all the CPU cores through StarPU. Besides, it is impossible for me to add more processes on the two machines to improve scalability, right?
The number of MPI processes has nothing to do with the GPUs. I just said the GPUs are only visible to the process running on the machine. I will close the issue as your problems with MPI are not related to StarPU.
Hello, thank you for your previous help. Currently the example code of LU MPI with StarPU and CUDA is able to run on two machines, each equipped with two GPUs. I am trying to run this program on these two machines simultaneously so that four GPUs can be used. Here is my command:
STARPU_SCHED=dmda STARPU_NCPU=1 OPENBLAS_NUM_THREADS=1 STARPU_WORKERS_NOBIND=1 STARPU_NCUDA=4 STARPU_NOPENCL=0 mpirun -n 4 -f mpi_config_file ./plu_example_double 8 -size 4096 -nblocks 16 -p 2 -q 2
The mpi_config_file contains: node1:2 node2:2. However, this command doesn't work and the terminal shows that only 2 GPUs are available. How should I change my command?
Besides, on my laptop, I have encountered another interesting thing: as shown in the picture, the program is able to start. However, it seems to be stuck there. I waited for five minutes but still got no results. My CUDA version is 9.1 and the driver version is 530. Is it because the driver version is too recent?
Thank you very much