lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Communicate tunecache during runs when tuning is active in Multi-GPU runs #199

Open · mathiaswagner opened this issue 9 years ago

mathiaswagner commented 9 years ago

When tuning is active during multi-GPU runs, each GPU independently tunes each kernel. This results in different GPUs using different launch configurations for the final kernel launch, which ultimately makes binary reproducibility impossible. This was first discovered in #182.
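A minimal sketch of the reproducibility fix, not QUDA's actual API: after rank 0 finishes tuning a kernel, broadcast its winning launch configuration so every rank launches the final kernel identically. `LaunchParam` here is a hypothetical POD stand-in for the tuned block/grid geometry.

```cpp
#include <mpi.h>

// Hypothetical stand-in for a tuned launch configuration.
struct LaunchParam {
  int block_x, block_y, block_z;  // thread-block dimensions
  int grid_x,  grid_y,  grid_z;   // grid dimensions
  int shared_bytes;               // dynamic shared memory per block
};

void synchronize_launch_param(LaunchParam &p, MPI_Comm comm) {
  // Treat the POD struct as raw bytes; rank 0's tuned result wins, so every
  // rank uses the same configuration (and hence the same floating-point
  // reduction order) for the final launch.
  MPI_Bcast(&p, sizeof(LaunchParam), MPI_BYTE, /*root=*/0, comm);
}
```

Note that this broadcast is a blocking collective, which is exactly what the next paragraph rules out for asynchronous algorithms.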

While a simple global reduction over the elapsed times measured during tuning would help in synchronous runs, it would cause hangs with asynchronous algorithms such as domain decomposition (DD), where each GPU works on a local problem and may never even enter the tuning path for a given kernel.
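A sketch of that "simple global reduction" idea for synchronous runs (assuming the ranks agree on, say, the worst-case time so they all pick the same winner):

```cpp
#include <mpi.h>

// Every rank times the candidate configuration locally, then all ranks
// agree on the maximum elapsed time. Because the result is identical on
// all ranks, they all select the same tuned configuration. This is a
// blocking collective, so it deadlocks if any rank never enters tuning
// for this kernel -- the DD scenario described above.
double globally_agreed_time(double local_elapsed, MPI_Comm comm) {
  double global_elapsed = 0.0;
  MPI_Allreduce(&local_elapsed, &global_elapsed, 1, MPI_DOUBLE, MPI_MAX, comm);
  return global_elapsed;
}
```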

This also relates to the FIXME noted in tune.cpp:

```cpp
// FIXME: We should really check to see if any nodes have tuned a kernel that was not also tuned on node 0, since as things
//        stand, the corresponding launch parameters would never get cached to disk in this situation.  This will come up if we
//        ever support different sub volumes per GPU (as might be convenient for lattice volumes that don't divide evenly).
```
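A hypothetical sketch of the check the FIXME asks for: before the tunecache is written to disk, rank 0 gathers the entries tuned on other ranks and merges any keys it has not seen itself. The `key=value` text serialization and the `TuneCache` type are simplifying assumptions; QUDA's real cache format differs.

```cpp
#include <mpi.h>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using TuneCache = std::map<std::string, std::string>;

// Naive text serialization: one "key=value" entry per line.
std::string serialize(const TuneCache &c) {
  std::ostringstream os;
  for (const auto &kv : c) os << kv.first << '=' << kv.second << '\n';
  return os.str();
}

void merge_remote_entries(TuneCache &cache, MPI_Comm comm) {
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  std::string blob = serialize(cache);
  int len = static_cast<int>(blob.size());

  // Gather per-rank blob sizes, then the blobs themselves, onto rank 0.
  std::vector<int> lens(size), displs(size);
  MPI_Gather(&len, 1, MPI_INT, lens.data(), 1, MPI_INT, 0, comm);

  std::vector<char> all;
  if (rank == 0) {
    int total = 0;
    for (int i = 0; i < size; i++) { displs[i] = total; total += lens[i]; }
    all.resize(total);
  }
  MPI_Gatherv(blob.data(), len, MPI_CHAR, all.data(), lens.data(),
              displs.data(), MPI_CHAR, 0, comm);

  if (rank == 0) {
    // Union of all entries: rank 0's own entry wins on conflict, and keys
    // it never tuned itself are added so they reach the on-disk cache.
    std::istringstream is(std::string(all.begin(), all.end()));
    std::string line;
    while (std::getline(is, line)) {
      auto eq = line.find('=');
      if (eq == std::string::npos) continue;
      cache.emplace(line.substr(0, eq), line.substr(eq + 1));
    }
  }
}
```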

We need a non-blocking solution to this.
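One possible non-blocking pattern, sketched under the assumption that the cache exchange can be deferred to a later, safe point: each rank posts an `MPI_Ibarrier` once it is ready to exchange, keeps working on its local DD problem, and polls the request. The blocking merge above runs only after every rank has reached the exchange point, so no rank ever blocks inside a tuning path its neighbors may never execute.

```cpp
#include <mpi.h>

struct DeferredExchange {
  MPI_Request req = MPI_REQUEST_NULL;
  bool posted = false;

  // Called when this rank is ready to exchange tunecache entries; returns
  // immediately, so local work can continue.
  void post(MPI_Comm comm) {
    if (!posted) { MPI_Ibarrier(comm, &req); posted = true; }
  }

  // Polled periodically from the main work loop; returns true exactly once
  // all ranks have posted, i.e. it is now safe to run blocking collectives
  // such as the tunecache merge sketched above.
  bool ready() {
    if (!posted) return false;
    int done = 0;
    MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    return done != 0;
  }
};
```

The design choice here is to trade immediacy for safety: tuned parameters propagate late (at the deferred exchange point) rather than at tune time, but no collective is ever issued on a path that only some ranks execute.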

weinbe2 commented 1 month ago

Update: this has been addressed in the non-DD case, but it is still relevant for DD.