pecos / tps

Torch Plasma Simulator

BSD 3-Clause "New" or "Revised" License

Gpu mpi tests #39

Closed marc-85 closed 3 years ago

marc-85 commented 3 years ago

Parallel face integration for gradient computation has been fully moved to the GPU. An additional structure holding all arrays related to face integration on the GPU has been created.

marc-85 commented 3 years ago

Should we assume that the MPI test will always fail on marvin until we get a second GPU?

trevilo commented 3 years ago

I'm not sure. At the moment the problem is the hardcoded number of GPUs (const int NUM_GPUS_NODE = 4 in main.cpp), so the code doesn't run. It might pass if we fixed that, although based on our previous conversations I guess we expect it to fail. So, for now, I can add logic to skip the test if fewer than two GPUs are available.
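For illustration, here is a minimal sketch of how a rank-to-device mapping and the skip rule could be written without a hardcoded GPU count. It assumes the constant is used to assign MPI ranks to devices round-robin, which this thread does not confirm. The names device_for_rank and should_skip_mpi_test are hypothetical and are not taken from the tps sources.

```cpp
#include <stdexcept>

// Hypothetical helper (not from tps): pick a device index for an MPI rank,
// given the number of GPUs actually present on the node.
// Assumes a round-robin assignment; ranks share a device when there are
// more ranks than GPUs.
int device_for_rank(int local_rank, int num_gpus) {
  if (num_gpus <= 0) throw std::invalid_argument("no GPUs available");
  if (local_rank < 0) throw std::invalid_argument("negative rank");
  return local_rank % num_gpus;
}

// Hypothetical predicate mirroring the skip rule proposed above:
// skip the MPI test when fewer than min_gpus devices are available.
bool should_skip_mpi_test(int num_gpus, int min_gpus = 2) {
  return num_gpus < min_gpus;
}
```

With a mapping like this, a node with a single GPU would place every rank on device 0 instead of requesting a device that does not exist.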

trevilo commented 3 years ago

In 1679204, I added logic to skip the MPI test if fewer than two GPUs are available, and all tests are now passing, as expected.

One interesting observation: just for kicks, I changed the number of GPUs assumed in main.cpp to const int NUM_GPUS_NODE = 1; locally on marvin and ran make check (with the skip logic disabled), and test 8 in cyl3d.gpu.test passed. I don't know if that is expected or luck or what, but thought you guys might want to know. If it is expected, we can remove the skip logic once we add something to main.cpp to properly detect the number of GPUs.
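A rough sketch of what runtime detection could look like follows. It assumes a CUDA build, where the real call cudaGetDeviceCount(int*) returns cudaSuccess (0) on success. To keep the sketch compilable without the CUDA toolkit, the query is passed in as a callable. DeviceCountQuery and detect_num_gpus are hypothetical names, not part of tps.

```cpp
#include <functional>

// Hypothetical type: a device-count query that writes the count through the
// pointer and returns 0 on success. In a CUDA build this could wrap the real
// runtime call, e.g.
//   [](int* n) { return static_cast<int>(cudaGetDeviceCount(n)); }
using DeviceCountQuery = std::function<int(int*)>;

// Hypothetical helper: return the number of usable GPUs, or 0 if the query
// reports an error or a negative count.
int detect_num_gpus(const DeviceCountQuery& query) {
  int count = 0;
  const int status = query(&count);
  if (status != 0 || count < 0) return 0;
  return count;
}
```

The detected value could then replace the fixed NUM_GPUS_NODE wherever ranks are assigned to devices, so the same binary would run on marvin and on a four-GPU node.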

marc-85 commented 3 years ago

Thanks Todd. Detecting the number of GPUs would be ideal. Maybe that is something we could do during the config step, adding NUM_GPUS_NODE to the configuration.hpp file?
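As a sketch of the configure-time variant, a generated header might carry the count as a macro with a conservative fallback. The layout below is an assumption, not the actual contents of configuration.hpp, and kNumGpusNode is a hypothetical name. If the macro were adopted, the const int NUM_GPUS_NODE in main.cpp would have to go, since the two names would collide.

```cpp
// Hypothetical excerpt of a generated configuration.hpp (assumed layout).
// The configure step would define NUM_GPUS_NODE to the detected count;
// the fallback of 1 applies when nothing was detected.
#ifndef NUM_GPUS_NODE
#define NUM_GPUS_NODE 1
#endif

// Reject a nonsensical configured value at compile time.
static_assert(NUM_GPUS_NODE >= 1, "NUM_GPUS_NODE must be at least 1");

// Typed constant for use in code instead of the raw macro.
constexpr int kNumGpusNode = NUM_GPUS_NODE;
```

One trade-off to keep in mind: a value fixed at configure time describes the machine where the build was configured, so a binary built on one node and run on another could still carry the wrong count.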