maddyscientist opened this issue 11 years ago
Note that this has been partially implemented: the logic for a partially pipelined BiCGstab solver is in place (enabled by setting QudaInvertParam.pipeline=1 via the solver interface). However, this is only a reformulation that reduces the number of reductions per iteration from three to two. Moreover, the kernels for this form have not been implemented yet; the BLAS generators first need to be extended to handle more vector streams.
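One standard way to go from three reductions to two is to fold the reduction for the next $\rho$ (and the residual norm) into the one that computes $\omega$. This is offered as a sketch assuming real arithmetic; it is not necessarily what the `pipeline=1` path does. With $v_i = A p_i$, $s_i = r_i - \alpha_i v_i$, $t_i = A s_i$ and $r_{i+1} = s_i - \omega_i t_i$:

```latex
\begin{aligned}
\omega_i &= \frac{(t_i, s_i)}{(t_i, t_i)}, \\
\rho_{i+1} &= (\tilde r, r_{i+1}) = (\tilde r, s_i) - \omega_i\,(\tilde r, t_i), \\
\|r_{i+1}\|^2 &= (s_i, s_i) - 2\,\omega_i\,(t_i, s_i) + \omega_i^2\,(t_i, t_i).
\end{aligned}
```

So $(t_i, s_i)$, $(t_i, t_i)$, $(\tilde r, s_i)$, $(\tilde r, t_i)$ and $(s_i, s_i)$ can all be summed in one pass, which leaves the reduction for $\alpha_i = \rho_i / (\tilde r, v_i)$ as the only other one. The fused pass reads two extra vectors, which is presumably where the extra vector streams come in.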
Moving to QUDA 0.7.
The I-BiCGstab solver is a reformulation of BiCGstab that needs only a single global reduction per iteration. Since every global reduction is a synchronization point, this gives much better parallel scalability.
Chroma's implementation of I-BiCGstab is here: http://git.jlab.org/cgi-bin/gitweb/gitweb.cgi?p=chroma.git;a=blob;f=lib/actions/ferm/invert/reliable_ibicgstab.cc;h=231e14d4534e9db30651cea861b2238d821b81ac;hb=HEAD
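To illustrate how a single reduction per iteration is possible, here is a minimal serial C++ sketch. It assumes real arithmetic, a small dense matrix and the shadow residual r̃ = r_0. The recurrences are derived directly from standard BiCGstab. They are not copied from the Chroma code or from QUDA, and `Matrix`, `matvec`, `matvec_t`, `dot` and `ibicgstab` are hypothetical names. The key trick is the extra vector f = Aᵀr̃, formed once at setup, so that (r̃, A y) = (f, y). Together with v = A p and q = A v, this lets ρ and (r̃, v) be updated by scalar recurrences. Each iteration then does two matrix-vector products and one pass that accumulates seven sums. In a distributed code, those partial sums would be combined in a single all-reduce.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical dense, row-major n x n matrix used only for illustration.
struct Matrix {
  std::size_t n;
  std::vector<double> a;
  double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

Vec matvec(const Matrix& A, const Vec& x) {  // y = A x
  Vec y(A.n, 0.0);
  for (std::size_t i = 0; i < A.n; ++i)
    for (std::size_t j = 0; j < A.n; ++j) y[i] += A(i, j) * x[j];
  return y;
}

Vec matvec_t(const Matrix& A, const Vec& x) {  // y = A^T x, used once at setup
  Vec y(A.n, 0.0);
  for (std::size_t i = 0; i < A.n; ++i)
    for (std::size_t j = 0; j < A.n; ++j) y[j] += A(i, j) * x[i];
  return y;
}

double dot(const Vec& x, const Vec& y) {
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
  return s;
}

// Solves A x = b for real, nonsymmetric A starting from x; returns iterations used.
int ibicgstab(const Matrix& A, const Vec& b, Vec& x, double tol, int max_iter) {
  const std::size_t n = A.n;
  assert(b.size() == n && x.size() == n);
  Vec r = matvec(A, x);
  for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - r[i];  // r_0 = b - A x_0
  const Vec rt = r;               // shadow residual r~ = r_0
  const Vec f = matvec_t(A, rt);  // f = A^T r~, so (r~, A y) = (f, y)
  Vec u = matvec(A, r);           // u = A r
  Vec p(n, 0.0), v(n, 0.0), q(n, 0.0), s(n), t(n);  // invariants: v = A p, q = A v
  // Setup reductions, done once before the loop.
  double phi = dot(rt, r), sigma = dot(f, r), r2 = dot(r, r);
  double sigma_prev = 0.0, pi = 0.0, tau = 0.0, rho_prev = 1.0, alpha = 1.0, omega = 1.0;
  const double b_norm = std::sqrt(dot(b, b));

  for (int k = 1; k <= max_iter; ++k) {
    if (std::sqrt(r2) <= tol * b_norm) return k - 1;
    // Scalar recurrences only: no reduction needed for rho or (r~, v).
    const double rho = phi - omega * sigma_prev + omega * alpha * pi;  // (r~, r)
    const double beta = (rho / rho_prev) * (alpha / omega);
    const double delta = beta * omega;
    tau = sigma + beta * tau - delta * pi;  // (r~, v) for the new v
    alpha = rho / tau;
    for (std::size_t i = 0; i < n; ++i) {
      p[i] = r[i] + beta * p[i] - delta * v[i];  // uses the old v
      v[i] = u[i] + beta * v[i] - delta * q[i];  // new v = A p, no matvec needed
    }
    q = matvec(A, v);  // matvec 1
    for (std::size_t i = 0; i < n; ++i) {
      s[i] = r[i] - alpha * v[i];
      t[i] = u[i] - alpha * q[i];  // t = A s, no matvec needed
    }
    // The single fused reduction: seven sums accumulated in one pass.
    double phi_n = 0, pi_n = 0, gamma = 0, eta = 0, theta = 0, kappa = 0, ss = 0;
    for (std::size_t i = 0; i < n; ++i) {
      phi_n += rt[i] * s[i];  pi_n += rt[i] * q[i];
      gamma += f[i] * s[i];   eta += f[i] * t[i];
      theta += t[i] * s[i];   kappa += t[i] * t[i];  ss += s[i] * s[i];
    }
    if (kappa == 0.0) {  // t = A s = 0 means s = 0 for nonsingular A
      for (std::size_t i = 0; i < n; ++i) x[i] += alpha * p[i];
      return k;
    }
    omega = theta / kappa;
    sigma_prev = sigma;
    sigma = gamma - omega * eta;  // (f, r) for the new r
    phi = phi_n;
    pi = pi_n;
    rho_prev = rho;
    r2 = std::max(0.0, ss - 2.0 * omega * theta + omega * omega * kappa);  // ||r||^2
    for (std::size_t i = 0; i < n; ++i) {
      x[i] += alpha * p[i] + omega * s[i];
      r[i] = s[i] - omega * t[i];
    }
    u = matvec(A, r);  // matvec 2
  }
  return max_iter;
}
```

In this sketch, ρ, (r̃, v) and (f, r) are all updated from scalars saved by the previous fused reduction, so nothing blocks between the two matrix-vector products. The price is extra vector updates and the setup-time transpose product. Because the scalars are recurred rather than recomputed, they can drift from their true values in finite precision. A production solver would likely need to recompute the true residual periodically.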