This PR adds the ability to handle the sink projection part of the Laph workflow:
take a set of quark fields, a set of Laplace eigenvectors, and compute the 3-d spatial projection between these
presently expects the quark fields to be Wilson-like (nSpin=4) and the eigenvectors to be 3-d Laplace (nSpin=1)
rudimentary tiling optimization has been implemented: both the host->device vector download and the kernel itself are tiled
a fully OMP-parallel test code has been included, with multiple predefined tests
some changes have been made to the colorspinor::FloatNOrder accessor to better support multi-RHS: as with the fine-grained accessor, it can now be enabled without ghost-zone support to reduce the kernel argument footprint
a bug was found in the block size settings of TunableMultiReduction, where an invalid grid size could be chosen if the number of items being reduced was too large
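
For reviewers less familiar with the workflow, here is a minimal host-side sketch of the projection and tiling described above. It is not the QUDA implementation: the function name `sinkProject`, the data layouts, the tile parameters, and the restriction to a single time slice are all assumptions made for illustration. It assumes the projection is the colour inner product of the conjugated eigenvector with each spin component of the quark field, summed over the 3-d spatial sites.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

constexpr int nSpin = 4;  // Wilson-like quark fields
constexpr int nColor = 3; // eigenvectors carry colour only (nSpin=1)

// Hypothetical layouts, chosen for readability (not QUDA's field orders):
//   quark[(x * nSpin + s) * nColor + c]   for one time slice
//   evec[x * nColor + c]
//   out[(iq * nEvec + iv) * nSpin + s]
std::vector<cplx> sinkProject(const std::vector<std::vector<cplx>> &quarks,
                              const std::vector<std::vector<cplx>> &evecs,
                              std::size_t volume3d, std::size_t tileQ, std::size_t tileV)
{
  const std::size_t nQ = quarks.size();
  const std::size_t nV = evecs.size();
  tileQ = std::max<std::size_t>(tileQ, 1); // guard against a zero tile size
  tileV = std::max<std::size_t>(tileV, 1);

  std::vector<cplx> out(nQ * nV * nSpin, cplx(0.0, 0.0));

  // Outer loops walk over tiles of quark fields and eigenvectors; in a GPU
  // version each tile would correspond to one batch of downloads / one launch.
  for (std::size_t q0 = 0; q0 < nQ; q0 += tileQ) {
    const std::size_t q1 = std::min(q0 + tileQ, nQ);
    for (std::size_t v0 = 0; v0 < nV; v0 += tileV) {
      const std::size_t v1 = std::min(v0 + tileV, nV);

      for (std::size_t iq = q0; iq < q1; iq++)
        for (std::size_t iv = v0; iv < v1; iv++)
          for (std::size_t x = 0; x < volume3d; x++)
            for (int s = 0; s < nSpin; s++)
              for (int c = 0; c < nColor; c++)
                out[(iq * nV + iv) * nSpin + s] +=
                  std::conj(evecs[iv][x * nColor + c]) * quarks[iq][(x * nSpin + s) * nColor + c];
    }
  }
  return out;
}
```

The tile sizes only change the order in which (quark, eigenvector) pairs are visited, so the result should be independent of them; that makes an untiled run a convenient reference when checking a tiled one.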
This PR is ready for review. @walkloud and @andrewhanlon, can you confirm that this code is working as expected from the calling code? @weinbe2 can you review from the QUDA point of view?