lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

solo precision preconditioned GCR is broken #505

Closed alexstrel closed 2 months ago

alexstrel commented 8 years ago

Solo precision GCR does not work properly when preconditioned with GCR. This was detected during staggered multigrid tests when the fine-level pre/post-smoothers were set to GCR. However, it can be reproduced in the develop branch with the Wilson operator; see details below.

  1. How to reproduce the bug. The fastest way is to run the multigrid solver. Note that the original GCR solver does not allow one to choose GCR as a preconditioner, but the MG preconditioner can be used with GCR smoothers, which essentially enables this option. Execute, e.g., the following commands:

./multigrid_invert_test --prec single --prec-sloppy single --prec-precondition single --mg-smoother gcr --sdim 16 --tdim 16 --mg-nu-pre 6 --mg-nu-post 6

and

./multigrid_invert_test --prec double --prec-sloppy single --prec-precondition single --mg-smoother gcr --sdim 16 --tdim 16 --mg-nu-pre 6 --mg-nu-post 6

The first run (solo single precision) will give something like:

Reconstructed: CUDA solution = 589344, CPU copy = 589344
Residuals: (L2 relative) tol 1e-07, QUDA = 1.00691, host = 1.28086

while the second run (mixed precision) gives:

Reconstructed: CUDA solution = 6.61448e+08, CPU copy = 6.61448e+08
Residuals: (L2 relative) tol 1e-07, QUDA = 5.74684e-08, host = 7.31042e-08

Remark 1: This behavior can also be seen for solo double precision execution. To enable double precision MG one needs to set HOST_DEBUG=yes. In the current version, the compiler won't produce the double precision code correctly because the preprocessor flag GPU_MULTIGRID_DOUBLE defined in multigrid.h is not visible in all the necessary source files. To bypass this problem, you may enforce the flag as a compiler option in make.inc:

ifeq ($(strip $(BUILD_MULTIGRID)), yes)
  NVCCOPT += -DGPU_MULTIGRID
  COPT += -DGPU_MULTIGRID
  ifeq ($(strip $(HOST_DEBUG)), yes)
    NVCCOPT += -DGPU_MULTIGRID_DOUBLE
    COPT += -DGPU_MULTIGRID_DOUBLE
  endif
endif

Remark 2: during execution you may see the following error message:

GCR: 1 iterations, <r,r> = nan, |r|/|b| = nan
ERROR: Solver appears to have diverged (rank 0, host gtx-titan.fnal.gov, solver.cpp:160 in PrintStats())

To fix this issue you may need to change the field create type on line 241 of inv_gcr_quda.cpp to QUDA_ZERO_FIELD_CREATE.
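A minimal sketch of what that change amounts to, assuming line 241 is where the preconditioner work field is created from a ColorSpinorParam; the variable names (rSloppy, rPre) are illustrative, not the exact source:

// Sketch only: zero-initialize the preconditioner work field instead of
// leaving it undefined, so the first smoother application does not read garbage.
ColorSpinorParam csParam(rSloppy);        // copy geometry/precision from the sloppy residual (assumed context)
csParam.create = QUDA_ZERO_FIELD_CREATE;  // previously a non-zero create type (assumption)
rPre = new cudaColorSpinorField(csParam); // preconditioner source/residual buffer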

Remark 3: you may get a segfault when exiting the MG solver due to the GCR destructor; to fix this problem, add conditional deallocation in the GCR destructor, i.e.:

   if (init) {
     if (param.precision_sloppy != param.precision) {
       delete x_sloppy;
       delete r_sloppy;
     }
     ...
   }

Remark 4: you may disable the coarse-level solver entirely, e.g., by commenting out the relevant part of the multigrid preconditioner.
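For illustration only, a rough sketch of what commenting out the coarse-level part of the MG cycle means; the names used here (presmoother, postsmoother, transfer, coarse_solver, e) are assumptions about the multigrid code, not verified lines:

// Inside the MG preconditioner's apply method (names are assumptions):
(*presmoother)(x, b);                     // fine-level pre-smoothing stays
// --- coarse-level correction disabled for this test ---
// transfer->R(r_coarse, r);              // restrict the residual
// (*coarse_solver)(x_coarse, r_coarse);  // solve on the coarse level
// transfer->P(e, x_coarse);              // prolongate the correction
// blas::xpy(e, x);                       // accumulate it into the solution
(*postsmoother)(x, b);                    // fine-level post-smoothing stays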

  2. What happens?

In solo precision, the GCR solver's search vector p[0] is set to zero after the first preconditioned iteration. You can observe this by monitoring the norm of p[0] before and after the application of the preconditioner in the top-level solver, e.g.:

printfQuda("\np[0] value before precond %1.15e : address : %p\n", blas::norm2(*p[0]), p[0]);    
      pushVerbosity(param.verbosity_precondition);
      if ((parity+m)%2 == 0 || param.schwarz_type == QUDA_ADDITIVE_SCHWARZ) (*K)(pPre, rPre);
      else blas::copy(pPre, rPre);
      popVerbosity();
printfQuda("\np[0] value after precond %1.15e (%1.15le) : address : %p\n", blas::norm2(*p[0]), p[0]);
  3. Very dirty hack

For solo precision GCR, create an intermediate buffer and keep the value of p[0] in the buffer. After each application of the preconditioner, reconstruct p[0] from the intermediate buffer, as sketched below.
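A minimal sketch of that work-around; the buffer p_backup is a new name introduced here (not an existing solver member), and the preconditioner application mirrors the snippet above:

// One-time setup, alongside the other GCR work fields (assumed placement):
ColorSpinorParam csParam(*p[0]);
csParam.create = QUDA_ZERO_FIELD_CREATE;
ColorSpinorField *p_backup = new cudaColorSpinorField(csParam);

// Around each preconditioner application:
blas::copy(*p_backup, *p[0]);   // stash the current search direction
(*K)(pPre, rPre);               // apply the preconditioner as before
blas::copy(*p[0], *p_backup);   // restore p[0], which the call otherwise zeroes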

alexstrel commented 8 years ago

Forgot to add my environment: gcc version 4.8.4, nvcc version 8.0, OS kernel 2.6.32-642.4.2.el6.x86_64 (Scientific Linux 6.3), driver version 367.44.

weinbe2 commented 2 months ago

I think we fixed this at some point or another :)