lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Staggered fermion domain decomposition improvements #14

Closed. maddyscientist closed this issue 10 years ago.

maddyscientist commented 13 years ago

The following additions are required to allow testing of new preconditioners and domain-decomposition (DD) algorithms for improved-staggered actions:

1.) Fix half precision for asqtad fermions; the multi-GPU code is currently broken. Half precision is ideal for a DD preconditioner, since low accuracy (~0.1) is sufficient there.
2.) Naive staggered fermion kernels, i.e., the nearest-neighbour operator only. The idea is to use the naive staggered action with the fat-link gauge field to precondition the asqtad operator. This should yield much better multi-GPU scaling.
3.) An option to switch off communications for the dslash, i.e., apply the dslash operator to all sites on a node but omit the contributions from neighbouring GPUs. Ideally, the ability to switch off both the 1-hop and 3-hop communications, or just the 1-hop term, would give maximum flexibility.
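To make items 2 and 3 concrete, here is a toy sketch, not QUDA code: a 1D periodic lattice of real numbers split into per-GPU domains, with a staggered-like operator built from a 1-hop (fat-link) term and a 3-hop (long-link, Naik) term. The names `applyDslash` and `CommFlags` are hypothetical, spin/colour structure and staggered phases are omitted, and "comms off" is modelled simply by dropping any hop whose target site lies outside the local domain.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-call comm switches (toy model, not QUDA's API).
struct CommFlags {
  bool oneHop;    // include 1-hop contributions from other domains
  bool threeHop;  // include 3-hop contributions from other domains
};

// Toy staggered-like operator on a 1D periodic lattice of N real sites.
// Only the sites [begin, end) owned by this "GPU" are computed.
// fat[x] links x to x+1 (1-hop); lng[x] links x to x+3 (3-hop, Naik-like).
// useNaik == false gives the naive nearest-neighbour operator.
// A hop whose target lies outside [begin, end) is dropped unless the
// corresponding comm flag is on, mimicking "comms off" for the dslash.
std::vector<double> applyDslash(const std::vector<double>& psi,
                                const std::vector<double>& fat,
                                const std::vector<double>& lng,
                                int begin, int end, bool useNaik,
                                CommFlags comms) {
  const int N = static_cast<int>(psi.size());
  assert(0 <= begin && begin < end && end <= N);
  assert(fat.size() == psi.size() && lng.size() == psi.size());
  auto wrap = [N](int x) { return ((x % N) + N) % N; };
  auto isLocal = [begin, end](int x) { return x >= begin && x < end; };

  std::vector<double> out(end - begin, 0.0);
  for (int x = begin; x < end; ++x) {
    double acc = 0.0;
    // Forward hop uses link[x]; backward hop uses the link ending at x.
    auto hop = [&](int d, const std::vector<double>& link, bool commOn) {
      const int fwd = wrap(x + d);
      const int bwd = wrap(x - d);
      if (isLocal(fwd) || commOn) acc += 0.5 * link[x] * psi[fwd];
      if (isLocal(bwd) || commOn) acc -= 0.5 * link[bwd] * psi[bwd];
    };
    hop(1, fat, comms.oneHop);
    if (useNaik) hop(3, lng, comms.threeHop);
    out[x - begin] = acc;
  }
  return out;
}
```

When the whole lattice is a single domain, the comms-off result equals the comms-on result, because no hop ever leaves the domain; the two only differ near domain boundaries.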

gshi commented 13 years ago

Update:
1) is done now.
2) is not yet supported.
3) is supported, but we need to define the interface to use it. Switching the 1-hop and 3-hop comms on/off separately is not supported for the moment. Again, how do you plan to call that? I can shape the code according to how you plan to use it.

maddyscientist commented 13 years ago

Thanks for implementing 1.), and 2.) can wait anyway.

For 3.) I want to be able to have override switches in each Dirac operator instance that we create. For Wilson, in my DD branch, I have a commDim[4] array that can be used to override whether spinor comms are enabled in each dimension. For asqtad we'd want two arrays: one that specifies whether the comms for the 1-hop spinor are included, and one that specifies whether both the 1- and 3-hop comms are done. Can this be done at runtime currently, or would it require too much work? For a first pass we could have just one array that specifies whether both the 1- and 3-hop comms are done.
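One possible shape for these per-instance switches, as a sketch only: apart from `commDim[4]`, which is mentioned above, the struct and member names are hypothetical. The semantics (the 3-hop exchange follows `commDim` alone, while the 1-hop exchange needs both arrays) is an assumption read from item 3, which asks to switch off either both terms or just the 1-hop term.

```cpp
#include <array>
#include <cassert>

// Hypothetical per-instance override switches for an improved-staggered
// Dirac operator (illustrative names, not QUDA's API). Index d = 0..3 is the
// lattice dimension.
struct StaggeredCommOverride {
  // commDim[d] == false switches off BOTH 1-hop and 3-hop spinor comms in d.
  // On its own, this array is the "first pass" discussed in the thread.
  std::array<bool, 4> commDim{{true, true, true, true}};
  // commDimOneHop[d] == false switches off only the 1-hop spinor comms in d.
  std::array<bool, 4> commDimOneHop{{true, true, true, true}};

  // Assumed semantics: the 1-hop halo needs both switches, the 3-hop halo
  // only needs commDim.
  bool exchangeOneHop(int d) const {
    assert(d >= 0 && d < 4);
    return commDim[d] && commDimOneHop[d];
  }
  bool exchangeThreeHop(int d) const {
    assert(d >= 0 && d < 4);
    return commDim[d];
  }

  // Convenience for a "Preconditioner" instance: all comms disabled.
  static StaggeredCommOverride allCommsOff() {
    StaggeredCommOverride o;
    o.commDim.fill(false);
    return o;
  }
};
```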

The current solvers have two Dirac operator instances: the default one (high precision) and the sloppy one (low precision). For the DD algorithms we introduce a third Dirac operator instance, a "Preconditioner", which disables the comms and is used as an inner solver.
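As a toy illustration of how a comms-off "Preconditioner" instance can serve as an inner solver, the sketch below uses defect correction (preconditioned Richardson iteration). The outer loop computes the residual in double precision with full comms, and the inner solve applies a few single-precision Jacobi sweeps to the comms-off, block-diagonal operator, which amounts to a block-Jacobi (non-overlapping Schwarz) preconditioner. Everything here is an assumption for illustration: a 1D real-valued lattice, A = m*I + D with a naive nearest-neighbour D, `float` standing in for low precision, and hypothetical function names (`applyA`, `innerSolve`, `ddSolve`). The sloppy instance is omitted, and this is not necessarily the algorithm used in the DD branch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy model, not QUDA code. A 1D periodic lattice of N real sites is split
// into equal blocks ("GPUs"); A = mass*I + D with D a naive nearest-neighbour
// hop. With commsOn == false, hops that leave a site's block are dropped, so
// A becomes block-diagonal: the hypothetical "Preconditioner" instance.
template <typename T>
std::vector<T> applyA(const std::vector<T>& psi, T mass, int blockSize,
                      bool commsOn) {
  const int N = static_cast<int>(psi.size());
  assert(blockSize > 0 && N % blockSize == 0);
  std::vector<T> out(psi.size());
  for (int x = 0; x < N; ++x) {
    const int fwd = (x + 1) % N;
    const int bwd = (x - 1 + N) % N;
    const bool fwdLocal = fwd / blockSize == x / blockSize;
    const bool bwdLocal = bwd / blockSize == x / blockSize;
    T acc = mass * psi[x];
    if (commsOn || fwdLocal) acc += T(0.5) * psi[fwd];
    if (commsOn || bwdLocal) acc -= T(0.5) * psi[bwd];
    out[x] = acc;
  }
  return out;
}

// Inner solver on the comms-off operator, in single precision (standing in
// for a low-precision preconditioner): a fixed number of Jacobi sweeps
// z <- (r - D_local z) / mass, starting from z = 0.
std::vector<float> innerSolve(const std::vector<float>& r, float mass,
                              int blockSize, int sweeps) {
  std::vector<float> z(r.size(), 0.0f);
  for (int k = 0; k < sweeps; ++k) {
    // With mass = 0, applyA returns D_local z.
    const std::vector<float> Dz = applyA(z, 0.0f, blockSize, false);
    for (std::size_t i = 0; i < z.size(); ++i) z[i] = (r[i] - Dz[i]) / mass;
  }
  return z;
}

// Outer defect correction in double precision with full comms:
// x <- x + M^{-1} (b - A x), where M^{-1} is the inexact inner solve.
// Returns the number of outer iterations, or maxIter if not converged.
int ddSolve(const std::vector<double>& b, std::vector<double>& x, double mass,
            int blockSize, int innerSweeps, double tol, int maxIter) {
  assert(x.size() == b.size());
  double bnorm2 = 0.0;
  for (double v : b) bnorm2 += v * v;
  for (int it = 0; it < maxIter; ++it) {
    const std::vector<double> Ax = applyA(x, mass, blockSize, true);
    std::vector<float> r(b.size());
    double rnorm2 = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i) {
      const double ri = b[i] - Ax[i];
      rnorm2 += ri * ri;
      r[i] = static_cast<float>(ri);  // residual handed to the low-precision solve
    }
    if (std::sqrt(rnorm2) <= tol * std::sqrt(bnorm2)) return it;
    const std::vector<float> z =
        innerSolve(r, static_cast<float>(mass), blockSize, innerSweeps);
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += z[i];
  }
  return maxIter;
}
```

In this toy, each inner solve only needs to be roughly accurate (in the spirit of the ~0.1 target in item 1), so a handful of low-precision sweeps per outer iteration is enough for the outer loop to converge. Because the residual is always recomputed in double precision with full comms, the final accuracy is not limited by the precision of the inner solve.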

gshi commented 13 years ago

The first pass should be pretty easy to do; separating the 1- and 3-hop terms requires a little more work but should be doable too.

maddyscientist commented 10 years ago

Closing this issue, as it is long since completed.