lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Auto-tuning framework changes #145

Open maddyscientist opened 10 years ago

maddyscientist commented 10 years ago

The auto-tuning framework has to be modified for CUDA 6.5 to allow for compatibility with future GPUs. The present autotuner tests all launch configurations, regardless of whether they are valid. This appears to cause problems on an unreleased GPU, so I am going to modify the framework so that invalid launch configurations are skipped.
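As a rough illustration of what "skipping invalid launches" can mean, the check below compares a candidate configuration against per-device limits before the tuner benchmarks it. This is a sketch under stated assumptions: `LaunchConfig`, `DeviceLimits` and `isValidLaunch` are hypothetical names, and in a real build the limit fields would be filled from the corresponding `cudaDeviceProp` members (`maxThreadsPerBlock`, `maxThreadsDim`, `sharedMemPerBlock`). It does not capture per-kernel limits such as register pressure, which is what the occupancy API discussed below addresses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one launch configuration considered by the tuner.
struct LaunchConfig {
  int blockX, blockY, blockZ;  // threads per block in each dimension
  std::size_t sharedBytes;     // dynamic shared memory per block, in bytes
};

// Hypothetical per-device limits; in a CUDA build these would be copied from
// cudaDeviceProp::maxThreadsPerBlock, ::maxThreadsDim[3] and ::sharedMemPerBlock.
struct DeviceLimits {
  int maxThreadsPerBlock;
  int maxThreadsDim[3];
  std::size_t sharedMemPerBlock;
};

// Returns true only if the configuration respects every device limit, so the
// tuner can skip it instead of attempting a launch that is bound to fail.
bool isValidLaunch(const LaunchConfig &c, const DeviceLimits &d) {
  assert(c.blockX > 0 && c.blockY > 0 && c.blockZ > 0);
  if (c.blockX > d.maxThreadsDim[0] || c.blockY > d.maxThreadsDim[1] ||
      c.blockZ > d.maxThreadsDim[2])
    return false;
  // Widen before multiplying to rule out int overflow.
  long threads = static_cast<long>(c.blockX) * c.blockY * c.blockZ;
  if (threads > d.maxThreadsPerBlock) return false;
  return c.sharedBytes <= d.sharedMemPerBlock;
}
```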

To do so, we will use the occupancy calculator API included with CUDA 6.5: the function cudaOccupancyMaxPotentialBlockSize computes, for a given kernel, the block size that achieves the maximum potential occupancy, taking into account the resources the kernel consumes (e.g., the number of registers). For the moment, I think the easiest approach is to have every class derived from Tunable define a new method that computes this value; the tuner will then use it to ensure this limit is not exceeded. For example, here is the code I presently use for blasKernel:

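 // Block-size limit for this kernel, from the CUDA 6.5 occupancy calculator
 int maxThreadsPerBlock() const {
   int minGridSize, maxBlockSize;
   // Arguments: no dynamic shared memory (0), no caller-imposed block size limit (0)
   cudaOccupancyMaxPotentialBlockSize(&minGridSize, &maxBlockSize, blasKernel<FloatN,M,SpinorX,SpinorY,SpinorZ,SpinorW,Functor>, 0, 0);
   return maxBlockSize;
 }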
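Note that cudaOccupancyMaxPotentialBlockSize returns the block size that achieves the highest potential occupancy, which can be smaller than the hard per-kernel limit; the latter is reported by `cudaFuncGetAttributes` in `cudaFuncAttributes::maxThreadsPerBlock`. To show why register usage constrains the block size at all, here is a deliberately simplified estimate of the register-imposed limit. It assumes 65536 registers are available to a block (the `cudaDeviceProp::regsPerBlock` value on many recent GPUs) and ignores register allocation granularity and shared memory, so the numbers are illustrative only; `registerLimitedBlockSize` is a hypothetical helper, not part of CUDA or QUDA.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified estimate of the largest block size a kernel can use
// given its register count. Ignores register allocation granularity and other
// resources (e.g. shared memory), so real limits can be lower.
int registerLimitedBlockSize(int regsPerThread, int regsPerBlock = 65536,
                             int maxThreadsPerBlock = 1024, int warpSize = 32) {
  assert(regsPerThread > 0 && warpSize > 0);
  int byRegisters = regsPerBlock / regsPerThread;         // threads the register file can hold
  int limit = std::min(byRegisters, maxThreadsPerBlock);  // respect the hardware cap
  return (limit / warpSize) * warpSize;                   // round down to whole warps
}
```

For instance, a kernel using 80 registers per thread is limited to 819 threads by registers alone, which rounds down to 800 (25 warps).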

Next week I will make a global edit implementing this in all classes derived from Tunable, at which point all branches related to the quda-0.7 branch should be updated. This will become a design rule that all such classes must obey.
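One way to make this design rule hard to forget is to declare the method pure virtual in the base class, so that a derived class that does not implement it fails to compile rather than tuning silently past its limit. The sketch below assumes that shape; `TunableBase`, `tuneBlockSizes`, `ExampleKernel` and the 32-thread step are hypothetical and do not reflect QUDA's actual Tunable interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal base class: every tunable kernel must report its limit.
class TunableBase {
public:
  virtual ~TunableBase() = default;
  // Per-kernel block-size limit, e.g. obtained from the CUDA occupancy API.
  virtual int maxThreadsPerBlock() const = 0;
};

// Hypothetical tuning sweep: returns the block sizes that would be benchmarked,
// stepping by 'step' threads and never exceeding the kernel's limit.
std::vector<int> tuneBlockSizes(const TunableBase &t, int step = 32) {
  assert(step > 0);
  std::vector<int> candidates;
  const int limit = t.maxThreadsPerBlock();
  for (int block = step; block <= limit; block += step)
    candidates.push_back(block);  // a real tuner would time a launch here
  return candidates;
}

// Example derived class; the value 256 stands in for an occupancy-API result.
class ExampleKernel : public TunableBase {
public:
  int maxThreadsPerBlock() const override { return 256; }
};
```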

maddyscientist commented 10 years ago

I note that ideally we would do this query in the parent Tunable class, but to do so the parent needs access to the kernel function itself. Given that kernel launches are currently split between C++ code and C macros, this will have to wait for a subsequent cleanup.
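If the kernel were available to the parent, the query could be written once instead of in every derived class. A minimal sketch of that idea, assuming a hypothetical class template `TunableKernel` parameterized on the kernel's function-pointer type: the occupancy call is compiled only under nvcc (`__CUDACC__` defined); otherwise a placeholder limit of 1024 is returned so the structure can be built and tested as plain C++. It does not address the macro-based launches mentioned above.

```cpp
#include <cassert>

// Mark the demo kernel __global__ only when compiling with nvcc.
#ifdef __CUDACC__
#define DEMO_KERNEL __global__
#else
#define DEMO_KERNEL
#endif

// Queries the block-size limit for any kernel type via the occupancy calculator.
template <typename Kernel>
int occupancyBlockSize(Kernel kernel) {
#ifdef __CUDACC__
  int minGridSize = 0, blockSize = 0;
  cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, kernel, 0, 0);
  return blockSize;
#else
  (void)kernel;
  return 1024;  // placeholder so the sketch builds without the CUDA toolkit
#endif
}

// Hypothetical parent class that owns the kernel pointer and performs the query
// itself, so derived classes no longer need their own maxThreadsPerBlock().
template <typename Kernel>
class TunableKernel {
  Kernel kernel_;

public:
  explicit TunableKernel(Kernel kernel) : kernel_(kernel) { assert(kernel != nullptr); }
  int maxThreadsPerBlock() const { return occupancyBlockSize(kernel_); }
};

// Hypothetical empty kernel used only to instantiate the template.
DEMO_KERNEL void demoKernel(float *, int) {}
```

Usage would look like `TunableKernel<void (*)(float *, int)> t(demoKernel); int limit = t.maxThreadsPerBlock();`.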