Kernel fusion for 4-d preconditioned domain-wall and Mobius kernels

lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.

https://lattice.github.io/quda


Kernel fusion for 4-d preconditioned domain-wall and Mobius kernels #183

Open maddyscientist opened 9 years ago

maddyscientist commented 9 years ago

The 4-d preconditioned chiral fermion dslashes presently use multiple kernels to apply the preconditioned dslash. This issue is to explore fusing these kernels. The motivation is twofold:

1. Fusing will improve single-node performance, since it reduces memory traffic.
2. If the 4-d dslash part is fused with other operations, strong scaling will improve, since there will be more compute to overlap with communication.
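
To illustrate the memory-traffic argument, here is a minimal host-side C++ sketch, not QUDA code. The names `stage_a`, `stage_b`, `Traffic`, `apply_unfused` and `apply_fused` are hypothetical, and the two stages are simple pointwise maps standing in for the separate kernels (the real 4-d dslash is a stencil, so actual fusion is more involved). It only counts how many element loads and stores each variant issues.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for two stages of a preconditioned operator.
// These are NOT QUDA functions; they are pointwise maps chosen for illustration.
inline double stage_a(double x) { return 2.0 * x + 1.0; }
inline double stage_b(double x) { return 0.5 * x - 3.0; }

// Counts element loads from and stores to the field arrays.
struct Traffic {
  std::size_t loads = 0;
  std::size_t stores = 0;
};

// Unfused: two passes ("kernels"). The intermediate field is written to
// memory by the first pass and read back by the second.
std::vector<double> apply_unfused(const std::vector<double>& in, Traffic& t) {
  std::vector<double> tmp(in.size()), out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    tmp[i] = stage_a(in[i]);
    ++t.loads;
    ++t.stores;
  }
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = stage_b(tmp[i]);
    ++t.loads;
    ++t.stores;
  }
  return out;
}

// Fused: one pass. The intermediate value stays in a local variable
// (a register on a GPU) and never touches the field arrays.
std::vector<double> apply_fused(const std::vector<double>& in, Traffic& t) {
  std::vector<double> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    const double a = stage_a(in[i]);
    out[i] = stage_b(a);
    ++t.loads;
    ++t.stores;
  }
  return out;
}
```

In this toy model the fused variant produces the same result with half the loads and stores. For a bandwidth-bound kernel that reduction is where the single-node gain would be expected to come from.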

weinbe2 commented 2 weeks ago

Mobius is covered by https://github.com/lattice/quda/pull/1163. I guess 4-d preconditioned Shamir is still outstanding.