Compact Exponential Cloverterm on GPU

This patch speeds up the construction of the exponential Clover term in the compact layout on GPU architectures. The exponentiation is now performed on the accelerator, and the inverse is obtained by computing $\exp(-Clover)$ instead of performing an explicit matrix inversion. In all test cases I looked at, the constructor's run time is now dominated by calls to fillCloverYZ, etc.
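To illustrate why no explicit inversion is needed: since $A$ commutes with $-A$, we have $\exp(A)\exp(-A) = 1$, so the inverse of the exponential comes from the same exponentiation routine with the sign flipped. The following is a minimal standalone sketch of that identity, not the code in this patch. The block size `N = 6`, the truncated Taylor series, and all function names (`exp_taylor`, `inverse_residual`, ...) are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

// Illustrative block size (assumption, not taken from the patch).
constexpr std::size_t N = 6;
using Cplx = std::complex<double>;
using Mat  = std::array<std::array<Cplx, N>, N>;

// N x N identity matrix.
Mat identity() {
  Mat I{};  // value-initialised: all entries are zero
  for (std::size_t i = 0; i < N; ++i) I[i][i] = 1.0;
  return I;
}

// Dense matrix product c = a * b.
Mat multiply(const Mat& a, const Mat& b) {
  Mat c{};
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t k = 0; k < N; ++k)
      for (std::size_t j = 0; j < N; ++j) c[i][j] += a[i][k] * b[k][j];
  return c;
}

// exp(sign * a) via a truncated Taylor series (hypothetical helper;
// adequate only for matrices of small norm, as in this demo).
Mat exp_taylor(const Mat& a, double sign, int order = 30) {
  Mat result = identity();
  Mat term   = identity();
  for (int n = 1; n <= order; ++n) {
    term = multiply(term, a);  // term becomes (sign * a)^n / n!
    for (auto& row : term)
      for (auto& x : row) x *= sign / n;
    for (std::size_t i = 0; i < N; ++i)
      for (std::size_t j = 0; j < N; ++j) result[i][j] += term[i][j];
  }
  return result;
}

// Builds a small Hermitian test matrix A and returns
// max_ij | (exp(A) * exp(-A) - 1)_ij |.
double inverse_residual() {
  Mat a{};
  for (std::size_t i = 0; i < N; ++i) {
    a[i][i] = 0.1 * static_cast<double>(i + 1);
    for (std::size_t j = i + 1; j < N; ++j) {
      a[i][j] = Cplx(0.05 * static_cast<double>(i + 1),
                     0.02 * static_cast<double>(j - i));
      a[j][i] = std::conj(a[i][j]);  // enforce Hermiticity
    }
  }
  const Mat prod = multiply(exp_taylor(a, +1.0), exp_taylor(a, -1.0));
  const Mat I    = identity();
  double dev = 0.0;
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      dev = std::max(dev, std::abs(prod[i][j] - I[i][j]));
  return dev;
}
```

Calling `inverse_residual()` should return a value at the level of double-precision rounding, which is the sense in which $\exp(-Clover)$ can stand in for the explicit inverse.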

For the standard clover term, and for the exponential clover term with non-periodic boundary conditions, the inverse is still computed explicitly on the CPU. We could consider using Eigen on the GPU for this operation, as discussed yesterday, but I have not yet looked into the compatibility of Eigen 3.4 with the various CUDA versions.

paboyle / Grid

Compact Exponential Cloverterm on GPU #414