Solo precision GCR does not work properly when preconditioned with GCR. This was first detected during staggered multigrid tests when the fine-level pre/post-smoothers were set to GCR, but it can be reproduced in the develop branch with the Wilson operator; please see the details below.
1. How to reproduce the bug
The fastest way is to run the multigrid solver. Note that the standalone GCR solver does not allow one to choose GCR as a preconditioner, but the MG preconditioner can be used with GCR smoothers, which essentially enables this combination. Execute, e.g., the following commands:
./multigrid_invert_test --prec single --prec-sloppy single --prec-precondition single --mg-smoother gcr --sdim 16 --tdim 16 --mg-nu-pre 6 --mg-nu-post 6
and
./multigrid_invert_test --prec double --prec-sloppy single --prec-precondition single --mg-smoother gcr --sdim 16 --tdim 16 --mg-nu-pre 6 --mg-nu-post 6
The first run (solo single precision) will give something like:
Reconstructed: CUDA solution = 589344, CPU copy = 589344
Residuals: (L2 relative) tol 1e-07, QUDA = 1.00691, host = 1.28086
while the second run (mixed precision) gives:
Reconstructed: CUDA solution = 6.61448e+08, CPU copy = 6.61448e+08
Residuals: (L2 relative) tol 1e-07, QUDA = 5.74684e-08, host = 7.31042e-08
Remark 1: The same behavior can be seen for a solo double precision run. To enable double precision MG one needs to set HOST_DEBUG=yes. In the current version the compiler will not produce the double precision code correctly, because the preprocessor flag GPU_MULTIGRID_DOUBLE defined in multigrid.h is not visible in all the necessary source files. To bypass this problem you may enforce the flag as a compiler option in make.inc, e.g. as sketched below.
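For illustration only (the variable names below are assumptions, not a copy of the actual make.inc), the idea is simply to pass the define to both the host and the device compiler:

# hypothetical make.inc addition: force the multigrid double precision flag globally
NVCCFLAGS += -DGPU_MULTIGRID_DOUBLE
CXXFLAGS  += -DGPU_MULTIGRID_DOUBLE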
Remark 2: during execution you may see the following error message:
GCR: 1 iterations, <r,r> = nan, |r|/|b| = nan
ERROR: Solver appears to have diverged (rank 0, host gtx-titan.fnal.gov, solver.cpp:160 in PrintStats())
To fix this issue you may need to change the field creation type on line 241 of inv_gcr_quda.cpp to QUDA_ZERO_FIELD_CREATE.
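A minimal sketch of what that change amounts to; the surrounding code is paraphrased and the variable names (csParam, rPre) are assumptions rather than a verbatim copy of that line:

// Sketch only: request zero-initialized storage for the preconditioner's field,
// so the inner GCR never starts from uninitialized memory (the source of the NaNs above).
ColorSpinorParam csParam(rSloppy);         // assumed: clone the parameters of an existing field
csParam.create = QUDA_ZERO_FIELD_CREATE;   // instead of e.g. QUDA_NULL_FIELD_CREATE
rPre = ColorSpinorField::Create(csParam);  // assumed allocation call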
Remark 3: you may get a segfault when exiting the MG solver due to the GCR destructor (presumably because in solo precision the sloppy fields alias the full-precision ones and must not be deleted unconditionally). To fix this problem, add a guarded deallocation in the GCR destructor, i.e.:
if (init) {
  // only delete the sloppy fields if they were allocated separately,
  // i.e. when the sloppy precision differs from the full precision
  if (param.precision_sloppy != param.precision) {
    delete x_sloppy;
    delete r_sloppy;
  }
  ...
}
Remark 4: you may disable the coarse-level solver altogether, e.g., by commenting out the corresponding part of the multigrid preconditioner.
2. What happens?
In solo precision, the GCR search vector p[0] is set to zero after the first preconditioned iteration. You can observe this by monitoring the norm of p[0] before and after the application of the preconditioner in the top-level solver, e.g.:
printfQuda("\np[0] value before precond %1.15e : address : %p\n", blas::norm2(*p[0]), p[0]);
pushVerbosity(param.verbosity_precondition);
if ((parity+m)%2 == 0 || param.schwarz_type == QUDA_ADDITIVE_SCHWARZ) (*K)(pPre, rPre);
else blas::copy(pPre, rPre);
popVerbosity();
printfQuda("\np[0] value after precond %1.15e : address : %p\n", blas::norm2(*p[0]), p[0]);
3. Very dirty hack
For solo precision GCR, create an intermediate buffer and keep the value of p[0] in it. After each application of the preconditioner, reconstruct p[0] from the intermediate buffer, as sketched below.
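A minimal sketch of this workaround, assuming an extra work field p_backup allocated with the same parameters as p[0]; the name and exact placement are illustrative, not the actual patch:

// Sketch only: protect the search vector across the preconditioner call in solo precision.
blas::copy(*p_backup, *p[0]);  // keep the current value of p[0] in the intermediate buffer

(*K)(pPre, rPre);              // apply the preconditioner as before

blas::copy(*p[0], *p_backup);  // reconstruct p[0], which the inner solver has zeroed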
Forgot to add my environment:
gcc version: 4.8.4
nvcc version: 8.0
OS kernel: 2.6.32-642.4.2.el6.x86_64 (Scientific Linux 6.3)
driver version: 367.44