maddyscientist closed this 1 year ago.
This PR primarily serves to remove the induced stack frame in a number of kernels:
- Use `__force_inline__`
- Use `SharedMemoryCache` to act as virtual registers (Symanzik-improved Wilson flow and STOUT kernels)

Some other minor changes:
- `Matrix` functions

This doesn't appear to break anything in the tmLQCD HMC on 8xP100.
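The following is a minimal host-side C++ sketch that illustrates what using a shared-memory cache "as virtual registers" can mean. It assumes the motivation is the usual one: a per-thread array indexed at run time typically gets placed in local memory, which produces a stack frame. The sketch emulates a block-wide buffer in which each thread owns a strided set of slots. All names (`VirtualRegisterCache`, `at`, `simulate_block`) are hypothetical. This is not QUDA's `SharedMemoryCache` interface and not the code in this PR.

```cpp
// Host-side sketch of a per-block buffer used as "virtual registers".
// VirtualRegisterCache and simulate_block are hypothetical names for
// illustration; they are NOT QUDA's SharedMemoryCache API.
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class VirtualRegisterCache {
public:
  VirtualRegisterCache(std::size_t block_size, std::size_t n_slots)
      : block_size_(block_size), n_slots_(n_slots), buf_(block_size * n_slots) {}

  // Element (tid, slot) lives at slot * block_size + tid: for a fixed slot,
  // consecutive threads touch consecutive elements of the buffer.
  T &at(std::size_t tid, std::size_t slot) {
    assert(tid < block_size_ && slot < n_slots_);
    return buf_[slot * block_size_ + tid];
  }

private:
  std::size_t block_size_;
  std::size_t n_slots_;
  std::vector<T> buf_;  // value-initialized, so every slot starts at zero
};

// Emulates one thread block. Each "thread" accumulates into a slot chosen at
// run time. Written as a per-thread local array `double acc[N]`, that runtime
// index would typically force the array into local memory (a stack frame) on
// the GPU; here the values live in the block-wide buffer instead.
double simulate_block(std::size_t block_size, std::size_t n_slots) {
  VirtualRegisterCache<double> cache(block_size, n_slots);
  for (std::size_t tid = 0; tid < block_size; ++tid) {  // emulated threads
    for (std::size_t k = 0; k < 3 * n_slots; ++k) {
      std::size_t slot = (tid + k) % n_slots;  // index known only at run time
      cache.at(tid, slot) += 1.0;
    }
  }
  // Reduce over all threads and slots so the result can be checked.
  double total = 0.0;
  for (std::size_t tid = 0; tid < block_size; ++tid)
    for (std::size_t slot = 0; slot < n_slots; ++slot)
      total += cache.at(tid, slot);
  return total;
}
```

On a device, the buffer would presumably be a `__shared__` array sized for the block, and `tid` would come from `threadIdx.x`. The trade-off is shared-memory capacity, which can limit occupancy, against the cost of local-memory traffic from a stack frame.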