lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

--recon-precondition/sloppy 8 failing for twisted clover with multigrid #934

Open cpviolator opened 4 years ago

cpviolator commented 4 years ago

While running tests on Summit with the large L=64, T=128 twisted-clover lattice, I saw that the initial CG solve used to construct the null vectors diverges when --recon-precondition 8 or --recon-sloppy 8 is passed. Here is a list of the combinations I have tested and how the initial CG behaved:

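| recon | recon-sloppy | recon-precondition | converged |
|-------|--------------|--------------------|-----------|
| 18 | 18 | 18 | YES |
| 18 | 18 | 12 | YES |
| 18 | 12 | 12 | YES |
| 12 | 12 | 12 | YES |
| 18 | 18 | 8 | NO |
| 12 | 12 | 8 | NO |
| 18 | 8 | 18 | NO |
| 12 | 8 | 12 | NO |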

Is this expected? Here is the rest of the command:

export EXE="./multigrid_invert_test"

export ARGS="--recon 18 --recon-sloppy 12 --prec double --nsrc 16 --dslash-type twisted-clover --compute-clover true \
--niter 30000 --verify true --dim 64 16 16 16 --gridsize 1 4 4 8 --load-gauge ${CONFIG} --kappa 0.1394265 --mu 0.00072 --clover-coeff 0.235630785 \
--rank-order row --verbosity verbose --tol 1e-9"

export MG_ARGS_COMMON="--prec-sloppy single --prec-precondition half --prec-null half --recon-precondition 12 \
--mg-levels 3 --mg-block-size 0 4 4 4 4 --mg-block-size 1 2 2 2 2 --mg-setup-tol 0 5e-7 --mg-setup-tol 1 5e-7 --mg-setup-inv 0 cg --mg-setup-inv 1 cg \
--mg-nvec 0 24 --mg-nvec 1 24 --mg-coarse-solver 1 gcr --mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose --pipeline 16 --reliable-delta 1e-5 \
--ngcrkrylov 30"

export MG_ARGS="--mg-mu-factor 2 0.0 --mg-smoother 0 ca-gcr --mg-smoother 1 ca-gcr --mg-nu-pre 0 0 --mg-nu-post 0 4 --mg-nu-pre 1 2 --mg-nu-post 1 2 \
--mg-coarse-solver 2 ca-gcr --mg-coarse-solver-ca-basis-size 2 10 --mg-coarse-solver-maxiter 1 8 --mg-coarse-solver-maxiter 2 8 \
--mg-coarse-solver-tol 1 0.25 --mg-coarse-solver-tol 2 0.1 \
--mg-nvec 2 1024 --mg-eig 2 true --mg-eig-type 2 trlm --mg-eig-nEv 2 1024 --mg-eig-nKr 2 1536 --mg-eig-tol 2 1e-4 --mg-eig-poly-deg 2 50 --mg-eig-amin 2 8e-1 \
--mg-eig-amax 2 8.0 --mg-eig-max-restarts 2 25 --mg-eig-use-dagger 2 false --mg-eig-use-normop 2 true"

export ARGS="${ARGS} ${MG_ARGS_COMMON} ${MG_ARGS}"

command="jsrun -n32 -r1 -g4 -a4 -c4 -l gpu-gpu ${EXE} ${ARGS}"

cpviolator commented 4 years ago

Same story with the current release 1.0 branch.

maddyscientist commented 4 years ago

@cpviolator can you confirm whether this arises on small Wilson lattices as well (e.g., the 16^3x64 anisotropic lattices), or whether it is confined to twisted-clover only? You could also run twisted-clover on these smaller lattices to check whether the issue arises only with multi-GPU or also occurs on a single GPU.

cpviolator commented 4 years ago

@maddyscientist I tested the Wilson L=16, T=64 lattice and a small twisted-clover L=32, T=64 lattice, both on a single node and with various MPI splittings, and both with recon-sloppy/precondition = 8. None of the tests failed to converge in the CG setup of multigrid.

One thing looked fishy though. The computed plaquette for the Wilson was:

Computed plaquette is 7.301271e-02 (spatial = 1.312530e-02, temporal = 1.329001e-01)

which doesn't look right. Is that how you remember it!?

maddyscientist commented 4 years ago

The plaquette isn't computed correctly on anisotropic lattices. It's on my eternal to-do list... So yes, this is expected.
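
For what it's worth, the three numbers in the printed line are at least consistent with each other: the total matches the mean of the spatial and temporal parts, so the oddity is in the spatial/temporal split rather than in the averaging. A quick Python check using only the values quoted in this thread (the helper name `mean_plaquette` is made up for illustration and is not a QUDA function):

```python
def mean_plaquette(spatial, temporal):
    """Average of the spatial and temporal plaquette values."""
    return 0.5 * (spatial + temporal)


# Values printed for the anisotropic Wilson lattice in this thread
total, spatial, temporal = 7.301271e-02, 1.312530e-02, 1.329001e-01

# True if the printed total agrees with the mean to the printed precision
print(abs(mean_plaquette(spatial, temporal) - total) < 1e-7)
```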

maddyscientist commented 4 years ago

This is a weird bug, then. One thing to try is to raise the process count to 64 but run with these smaller volumes. This should be runnable on nvsocal with something like

export QUDA_ENABLE_MPS=1 # allow multiple processes per GPU
export QUDA_ENABLE_MANAGED_MEMORY=1 # might not be needed, but will prevent out of memory error
mpirun -np 64 multigrid_invert_test --gridsize 1 4 4 8

which will allow you to run multiple processes per GPU. This might help isolate whether the issue only occurs at higher process counts.

cpviolator commented 4 years ago

I'll give it a whirl with the 32,64 twisted clover.

cpviolator commented 4 years ago

No joy with the above strategy either.

maddyscientist commented 4 years ago

Ok, I guess we should test this on a different machine. Do you have access to Piz Daint, or should I do this?

cpviolator commented 4 years ago

I don't have access to that anymore, sorry.

cpviolator commented 4 years ago

Attached: CMakeCache.txt from the release build on Summit.

kostrzewa commented 4 years ago

I'm running this test right now as it might be related to the issues that I see in testing for #941

kostrzewa commented 4 years ago

@cpviolator ~note that in your example command above, you don't set a setup solver for the coarsest level. Doesn't that mean that BiCGstab would be used for that? (which won't converge, as we know)~ sorry, I'm being a doofus

kostrzewa commented 4 years ago

sorry, being a doofus in https://github.com/lattice/quda/issues/934#issuecomment-577206276 ...

kostrzewa commented 4 years ago

I cannot confirm your observations on PizDaint. For all test cases below I get similar convergence (lvl0: <r,r>~5e-5, lvl1:<r,r> ~ 3e-4) over the 500 iterations that the CG runs for to get the null vectors. In particular, I don't observe divergence at all.

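| recon | recon-sloppy | recon-precondition | converged |
|-------|--------------|--------------------|-----------|
| 18 | 18 | 18 | YES |
| 18 | 18 | 12 | YES |
| 18 | 18 | 8 | YES |
| 18 | 8 | 18 | YES |
| 12 | 8 | 12 | YES |
| 12 | 12 | 8 | YES |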

test command (32 PizDaint nodes)

[...]
export CRAY_CUDA_MPS=1

gdr=0
p2p=0
async=0
mempool=0

machine_id=PizDaint
quda_label=build_test-quda_develop-dynamic_clover-with_tests-with_qio
quda_commit=53e85c521f11d3a94166b951e14bf8640540ec24
gpu_arch=sm_60

export QUDA_RESOURCE_PATH=${HOME}/local/quda_resources/${machine_id}-${quda_label}-${quda_commit}-${gpu_arch}_gdr${gdr}_p2p${p2p}
if [ ! -d ${QUDA_RESOURCE_PATH} ]; then
  mkdir -p ${QUDA_RESOURCE_PATH}
fi

scratch_dir=$SCRATCH/multigrid_invert_test_runs/64c128_32n
mkdir -p ${scratch_dir}/logs
cd ${scratch_dir}

recon=12
recon_sloppy=8
recon_precondition=12

meta="recon${recon}_recon-sloppy${recon_sloppy}_recon-precondition${recon_precondition}"

export ARGS="--recon ${recon} --recon-sloppy ${recon_sloppy} \
--prec double --nsrc 16 --dslash-type twisted-clover --compute-clover true \
--niter 30000 --verify true --dim 64 32 32 16 --gridsize 1 2 2 8 \
--load-gauge ${CONFIG} --kappa 0.1394265 \
--mu 0.00072 --clover-coeff 0.235630785 \
--rank-order row --verbosity verbose --tol 1e-9"

export MG_ARGS_COMMON="--prec-sloppy single --prec-precondition half --prec-null half \
--recon-precondition ${recon_precondition} \
--mg-levels 3 --mg-block-size 0 4 4 4 4 --mg-block-size 1 2 2 2 2 \
--mg-setup-tol 0 5e-7 --mg-setup-tol 1 5e-7 --mg-setup-inv 0 cg --mg-setup-inv 1 cg \
--mg-nvec 0 24 --mg-nvec 1 24 --mg-coarse-solver 1 gcr \
--mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose \
--pipeline 16 --reliable-delta 1e-5 --ngcrkrylov 30"

export MG_ARGS="--mg-mu-factor 2 0.0 --mg-smoother 0 ca-gcr --mg-smoother 1 ca-gcr \
--mg-nu-pre 0 0 --mg-nu-post 0 4 --mg-nu-pre 1 2 --mg-nu-post 1 2 \
--mg-coarse-solver 2 ca-gcr --mg-coarse-solver-ca-basis-size 2 10 \
--mg-coarse-solver-maxiter 1 8 --mg-coarse-solver-maxiter 2 8 \
--mg-coarse-solver-tol 1 0.25 --mg-coarse-solver-tol 2 0.1 \
--mg-nvec 2 1024 --mg-eig 2 true --mg-eig-type 2 trlm --mg-eig-nEv 2 1024 --mg-eig-nKr 2 1536 \
--mg-eig-tol 2 1e-4 --mg-eig-poly-deg 2 50 --mg-eig-amin 2 8e-1 \
--mg-eig-amax 2 8.0 --mg-eig-max-restarts 2 25  --mg-eig-use-dagger 2 false  --mg-eig-use-normop 2 true"

ARGS="${ARGS} ${MG_ARGS_COMMON} ${MG_ARGS}"

logfile=${scratch_dir}/logs/multigrid_invert_test-64c128_32n-${quda_label}-${quda_commit}-${p2p}-gdr${gdr}-async${async}-mempool${mempool}-${SLURM_JOB_ID}_${meta}.out

GOMP_CPU_AFFINITY=0-23:2 \
QUDA_RESOURCE_PATH=${QUDA_RESOURCE_PATH} OMP_NUM_THREADS=12 \
QUDA_ENABLE_GDR=${gdr} QUDA_ENABLE_P2P=${p2p} QUDA_ENABLE_TUNING=1 \
QUDA_ENABLE_DEVICE_MEMORY_POOL=${mempool} MPICH_RDMA_ENABLED_CUDA=1 \
MPICH_NEMESIS_ASYNC_PROGRESS=${async} \
srun ${exe} ${ARGS} 2>&1 | tee ${logfile}
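
Since the script above takes the three reconstruction values as shell variables, the six combinations in the table can be swept by regenerating the flags for each run. Below is a minimal Python sketch of that bookkeeping. The helper names (`recon_flags`, `meta_label`, `global_lattice`) are hypothetical and not part of QUDA, and the sketch assumes that `--dim` is the local per-process lattice and `--gridsize` the process grid, which is consistent with the 64^3 x 128 volume and 32 nodes quoted in this thread.

```python
from math import prod

# Combinations tested on Piz Daint in the table above:
# (recon, recon-sloppy, recon-precondition)
COMBOS = [(18, 18, 18), (18, 18, 12), (18, 18, 8),
          (18, 8, 18), (12, 8, 12), (12, 12, 8)]

ALLOWED = (8, 12, 18)  # only the values exercised in this thread


def recon_flags(recon, sloppy, precondition):
    """Return the three reconstruction flags as an argument list."""
    for value in (recon, sloppy, precondition):
        if value not in ALLOWED:
            raise ValueError(f"unexpected reconstruct value: {value}")
    return ["--recon", str(recon),
            "--recon-sloppy", str(sloppy),
            "--recon-precondition", str(precondition)]


def meta_label(recon, sloppy, precondition):
    """Mirror the `meta` string that the job script puts in log file names."""
    return (f"recon{recon}_recon-sloppy{sloppy}"
            f"_recon-precondition{precondition}")


def global_lattice(local_dim, gridsize):
    """Global lattice extents, assuming --dim is the per-process volume."""
    return [d * g for d, g in zip(local_dim, gridsize)]


if __name__ == "__main__":
    # Local volume 64 32 32 16 on a 1 2 2 8 process grid
    print(global_lattice([64, 32, 32, 16], [1, 2, 2, 8]), prod([1, 2, 2, 8]))
    for combo in COMBOS:
        print(meta_label(*combo), " ".join(recon_flags(*combo)))
```

Keeping the label and the flags in one place means a log file name cannot drift out of sync with the flags that were actually passed.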

CMake call for test build

Note the QUDA_DYNAMIC_CLOVER setting.

module load daint-gpu
module swap PrgEnv-cray PrgEnv-gnu
module load CMake
module load Boost
module load cray-libsci
module load cudatoolkit
module load cray-hdf5
module load cray-mpich

CXX=CC \
CC=cc \
cmake \
-DCMAKE_INSTALL_PREFIX="${PROJECT}/libs/2020_01_16/$(basename $(pwd))" \
-DMPI_CXX_COMPILER=CC \
-DMPI_C_COMPILER=cc \
-DQUDA_BUILD_SHAREDLIB=OFF \
-DQUDA_MAX_MULTI_BLAS_N=9 \
-DQUDA_BUILD_ALL_TESTS=ON \
-DQUDA_GPU_ARCH=sm_60 \
-DQUDA_INTERFACE_QDP=ON \
-DQUDA_INTERFACE_MILC=OFF \
-DQUDA_MPI=OFF \
-DQUDA_QMP=ON \
-DQUDA_QIO=ON \
-DQUDA_DOWNLOAD_USQCD=ON \
-DQUDA_DIRAC_WILSON=ON \
-DQUDA_DIRAC_TWISTED_MASS=ON \
-DQUDA_DIRAC_TWISTED_CLOVER=ON \
-DQUDA_DIRAC_NDEG_TWISTED_MASS=ON \
-DQUDA_DIRAC_CLOVER=ON \
-DQUDA_DYNAMIC_CLOVER=ON \
-DQUDA_DIRAC_DOMAIN_WALL=OFF \
-DQUDA_MULTIGRID=ON \
-DQUDA_USE_EIGEN=ON \
-DQUDA_DOWNLOAD_EIGEN=ON \
-DQUDA_BLOCKSOLVER=ON \
-DQUDA_OPENMP=ON \
-DQUDA_TEX=OFF \
-DQUDA_GAUGE_ALG=ON \
-DQUDA_FORCE_GAUGE=ON \
-DQUDA_GAUGE_TOOLS=ON \
-DQUDA_DIRAC_STAGGERED=OFF ${HOME}/code/quda_develop

To compile the tests, I have to run cmake twice, as noted in #957.

P.S.

This was a nice test because I was always worried that in the tmLQCD interface we were losing quite a bit by always working in recon=(18,18,18) mode. Yet the differences that I observe between the best and worst case, in terms of time to solution (TTS), are below the level of natural performance fluctuations on PizDaint, although on average it seems that (12,12,8) performs best (at the level of maybe 5%?).

weinbe2 commented 4 years ago

@cpviolator just catching up a little and making absolutely sure: you're seeing divergence, not stalling, correct? I see stalling constantly for HISQ, and it's never an issue (in fact, in cases where I try to set the tolerance so it doesn't stall, the near-nulls end up being garbage).

Divergence is of course a different issue.
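
To make the stalling/divergence distinction checkable from a log, here is a rough Python sketch that pulls `|r|/|b|` out of CG progress lines in the format printed in this thread and labels the history. The function names and the thresholds (`growth`, `window`, `flat`) are illustrative assumptions, not QUDA defaults or part of its API.

```python
import re

# Matches CG progress lines in the format shown in this thread, e.g.
# "MG level 0 (GPU): CG: 3 iterations, <r,r> = 3.818520e+09, |r|/|b| = 4.658647e+00"
# Whitespace is matched loosely so that wrapped log text still parses.
CG_LINE = re.compile(
    r"CG:\s*(\d+)\s+iterations,\s*<r,r>\s*=\s*([0-9.eE+-]+),"
    r"\s*\|r\|/\|b\|\s*=\s*([0-9.eE+-]+)"
)


def relative_residuals(log_text):
    """Extract |r|/|b| for every CG iteration found in the log text."""
    return [float(m.group(3)) for m in CG_LINE.finditer(log_text)]


def classify(residuals, growth=10.0, window=4, flat=1e-3):
    """Label a residual history as 'diverging', 'stalled' or 'converging'.

    All thresholds are assumptions chosen for illustration.
    """
    if len(residuals) < window:
        return "unknown"
    # Divergence: the latest residual is far above the starting one.
    if residuals[-1] > growth * residuals[0]:
        return "diverging"
    # Stalling: the last few residuals barely change.
    tail = residuals[-window:]
    largest = max(tail)
    if largest == 0.0:
        return "converging"
    spread = (largest - min(tail)) / largest
    return "stalled" if spread < flat else "converging"
```

The divergence check runs first on purpose: a residual that blows up and then sits at a constant huge value would otherwise look like a stall.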

cpviolator commented 4 years ago

It's divergence. Here's some output.

Computed plaquette is 3.370904e-01 (spatial = 3.392180e-01, temporal = 3.349627e-01)
Creating new clover field
MG level 0 (GPU): Using curandStateMRG32k3a
MG level 0 (GPU): Creating a CG solver
MG level 0 (GPU): Running vectors setup on level 0 iter 1 of 1
MG level 0 (GPU): Initial guess = 2.68426e+08
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG: 0 iterations, <r,r> = 1.759444e+08, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG: 1 iterations, <r,r> = 9.390452e+07, |r|/|b| = 7.305594e-01
MG level 0 (GPU): CG: 2 iterations, <r,r> = 4.755315e+08, |r|/|b| = 1.644000e+00
MG level 0 (GPU): CG: 3 iterations, <r,r> = 3.818520e+09, |r|/|b| = 4.658647e+00
MG level 0 (GPU): CG: 4 iterations, <r,r> = 2.177999e+10, |r|/|b| = 1.112605e+01
MG level 0 (GPU): CG: 5 iterations, <r,r> = 1.197597e+11, |r|/|b| = 2.608961e+01
MG level 0 (GPU): CG: 6 iterations, <r,r> = 3.240929e+12, |r|/|b| = 1.357210e+02
MG level 0 (GPU): CG: 7 iterations, <r,r> = 6.545416e+12, |r|/|b| = 1.928772e+02
MG level 0 (GPU): CG: 8 iterations, <r,r> = 2.463026e+16, |r|/|b| = 1.183169e+04
MG level 0 (GPU): CG: 9 iterations, <r,r> = 5.292312e+16, |r|/|b| = 1.734343e+04
MG level 0 (GPU): CG: 10 iterations, <r,r> = 3.360501e+19, |r|/|b| = 4.370331e+05
MG level 0 (GPU): CG: 11 iterations, <r,r> = 4.498235e+20, |r|/|b| = 1.598944e+06
MG level 0 (GPU): CG: 12 iterations, <r,r> = 6.599720e+22, |r|/|b| = 1.936757e+07
MG level 0 (GPU): CG: 13 iterations, <r,r> = 2.513038e+25, |r|/|b| = 3.779304e+08
MG level 0 (GPU): CG: 14 iterations, <r,r> = 5.998391e+23, |r|/|b| = 5.838882e+07
MG level 0 (GPU): CG: 15 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09
[iterations 16 through 308 omitted: every line repeats <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09]
MG level 0 (GPU): CG: 309 
iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 310 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 311 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 312 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 313 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 314 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 315 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 316 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 317 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 318 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 319 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 320 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 321 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 322 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 323 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 324 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 325 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 326 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 327 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 328 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 329 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 330 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 331 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 332 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 333 
iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 334 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 335 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 336 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 337 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 338 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 339 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 340 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 341 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 342 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 343 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 344 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 345 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 346 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 347 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 348 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 349 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 350 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 351 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 352 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 353 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 354 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 355 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 356 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 357 
iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 358 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 359 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 360 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 361 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 362 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 363 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 364 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 365 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 366 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 367 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 368 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 369 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 370 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 371 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 372 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 373 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 374 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 375 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 376 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 377 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 378 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 379 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 380 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 381 
iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 382 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 383 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 384 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 385 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 386 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 387 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 388 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 389 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 390 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 391 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 392 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 393 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 394 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 395 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 396 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 397 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 398 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 399 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 400 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 401 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 402 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 403 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 404 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 405 
iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 406 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 407 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 408 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 409 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 410 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 411 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 412 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 413 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 414 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 415 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 416 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 417 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 418 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 419 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 420 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 421 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 422 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 423 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 424 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 425 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 426 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 427 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 428 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 429 
iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 430 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 431 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 432 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 433 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 434 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 435 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 436 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 437 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 438 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 439 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 440 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 441 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 442 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 443 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 444 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 445 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 446 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 447 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 448 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 449 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 450 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 451 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 452 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 453 
iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 454 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 455 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 456 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 457 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 458 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 459 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 460 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 461 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 462 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 463 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 464 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 465 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 466 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 467 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 468 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 469 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 470 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 471 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 472 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 473 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 474 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 475 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 476 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 477 
iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 478 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 479 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 480 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 481 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 482 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 483 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 484 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 485 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 486 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 487 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 488 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 489 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 490 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 491 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 492 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 493 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 494 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 495 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 496 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 497 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 498 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 499 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): CG: 500 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 MG level 0 (GPU): WARNING: 
Exceeded maximum iterations 500
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 500 iterations, L2 relative residual: iterated = 1.666028e+09, true = 1.666028e+09 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 4.8505e+26

On Wed, Jan 22, 2020 at 1:28 PM Evan Weinberg notifications@github.com wrote:

@cpviolator just catching up a little and making absolutely sure: you're seeing divergence, not stalling, correct? I see stalling constantly for HISQ, and it's never an issue (in fact, in cases where I try to set the tolerance so it doesn't stall, the near-nulls end up being garbage).

Divergence is of course a different issue.


weinbe2 commented 4 years ago

@cpviolator yep, I agree that's divergence, point noted.
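For anyone wanting to check the divergence-versus-stalling distinction against a pasted log, here is a minimal sketch. It assumes the `CG: N iterations, <r,r> = ..., |r|/|b| = ...` line format shown in the log above; the function names, the window length, and the thresholds are hypothetical choices for illustration and are not part of QUDA. The rule of thumb it encodes (a relative residual above 1 counts as divergence, a flat residual below 1 counts as stalling) is an assumption, not an official criterion.

```python
import re

# Matches one CG progress record; \s+ tolerates records wrapped across lines.
RECORD = re.compile(
    r"CG:\s+(\d+)\s+iterations,\s+<r,r>\s+=\s+([0-9.eE+-]+),"
    r"\s+\|r\|/\|b\|\s+=\s+([0-9.eE+-]+)"
)

def residual_history(log_text):
    """Return a list of (iteration, relative residual) pairs found in the log."""
    return [(int(n), float(rel)) for n, _r2, rel in RECORD.findall(log_text)]

def classify(history, window=50, flat_tol=1e-6):
    """Hypothetical heuristic: label a residual history.

    'diverged'   : final |r|/|b| is above 1, i.e. worse than a zero guess
    'stalled'    : residual below 1 but essentially flat over the last window
    'converging' : anything else
    """
    if not history:
        return "no data"
    rels = [rel for _, rel in history]
    if rels[-1] > 1.0:
        return "diverged"
    recent = rels[-window:]
    top = max(recent)
    if len(recent) > 1 and top > 0 and (top - min(recent)) / top < flat_tol:
        return "stalled"
    return "converging"

# Two records copied from the tail of the log above.
sample = (
    "MG level 0 (GPU): CG: 499 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09 "
    "MG level 0 (GPU): CG: 500 iterations, <r,r> = 4.883602e+26, |r|/|b| = 1.666028e+09"
)
history = residual_history(sample)
verdict = classify(history)
print(history, verdict)
```

With this rule the log above is labelled as diverged rather than stalled, because the relative residual sits far above 1 even though it is no longer changing.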

kostrzewa commented 4 years ago

Could this be related to using QUDA_TEX=ON?

weinbe2 commented 4 years ago

> Could this be related to using QUDA_TEX=ON?

No, with 99.99% confidence.

weinbe2 commented 4 years ago

It'd be related to the way the reconstruct is done, not the read itself (which is abstracted at a lower level). The local volume isn't large enough for there to be some sort of weird indexing issue.
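As background on what "reconstruct" means here, the sketch below shows the simpler 12-real case: only the first two rows of an SU(3) link are stored, and the third row is rebuilt as the complex conjugate of their cross product. This is a generic illustration of the idea and not QUDA's code; the 8-real reconstruction discussed in this issue uses a different and more involved parameterization that is not reproduced here. All function names are hypothetical.

```python
import cmath
import math

def reconstruct_third_row(a, b):
    """Rebuild row 2 of an SU(3) matrix from rows 0 and 1: c = conj(a x b)."""
    return [
        (a[1] * b[2] - a[2] * b[1]).conjugate(),
        (a[2] * b[0] - a[0] * b[2]).conjugate(),
        (a[0] * b[1] - a[1] * b[0]).conjugate(),
    ]

def matmul(x, y):
    """Plain 3x3 complex matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def example_su3(theta=0.3, phi=1.1, alpha=0.7, beta=-0.4):
    """Hand-built SU(3) element: two plane rotations and a unit-determinant phase matrix."""
    c1, s1 = math.cos(theta), math.sin(theta)
    c2, s2 = math.cos(phi), math.sin(phi)
    r01 = [[c1, s1, 0], [-s1, c1, 0], [0, 0, 1]]
    r12 = [[1, 0, 0], [0, c2, s2], [0, -s2, c2]]
    phases = [
        [cmath.exp(1j * alpha), 0, 0],
        [0, cmath.exp(1j * beta), 0],
        [0, 0, cmath.exp(-1j * (alpha + beta))],
    ]
    return matmul(matmul(r01, phases), r12)

u = example_su3()
stored = u[0] + u[1]  # 6 complex numbers = 12 reals kept in memory
rebuilt = reconstruct_third_row(stored[:3], stored[3:])
max_err = max(abs(x - y) for x, y in zip(rebuilt, u[2]))
print(max_err)
```

The point of the sketch is that the rebuilt row is computed from the stored entries, so any loss of accuracy in those entries, or in the arithmetic that rebuilds the rest, feeds directly into the operator the solver sees.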

maddyscientist commented 4 years ago

I realize that I never reported my progress on this issue. Confirming that I did reproduce the issue on Piz Daint, but only on 128 GPUs. Running on lower node counts did not give the issue. The horrible queue times on Piz Daint have been a major obstacle to debugging this any further so far.

So this appears to not be a machine issue.

cpviolator commented 4 years ago

Could this be related to an integer overflow?

maddyscientist commented 4 years ago

Unlikely, since larger node counts mean smaller local volumes. I can do a run through ASAN on Piz Daint to see if anything is revealed….
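The local-volume argument can be checked with a little arithmetic. The sketch below uses the `--dim 64 16 16 16 --gridsize 1 4 4 8` values from the original command. The figure of 4 links per site times 18 reals per uncompressed link is a standard count, but which indices QUDA actually forms and what integer width it uses for them are assumptions here, so this is only a rough size estimate.

```python
from math import prod

local_dim = (64, 16, 16, 16)  # --dim: per-GPU lattice extent
grid = (1, 4, 4, 8)           # --gridsize: GPUs along each direction

ranks = prod(grid)
local_sites = prod(local_dim)
global_dim = tuple(l * g for l, g in zip(local_dim, grid))
global_sites = prod(global_dim)

# Uncompressed gauge field: 4 directions per site, 18 reals per SU(3) link.
gauge_reals = local_sites * 4 * 18

INT32_MAX = 2**31 - 1

print("ranks:", ranks)
print("global lattice:", global_dim)
print("local sites:", local_sites)
print("gauge reals per GPU:", gauge_reals, "vs int32 max:", INT32_MAX)

# Fewer ranks means a larger local volume, so an overflow would be
# expected to show up at low rank counts first, not at 128.
for n in (32, 64, 128):
    print(n, "ranks ->", global_sites // n, "local sites")
```

With these parameters the global lattice comes out as 64 x 64 x 64 x 128, matching the L=64, T=128 lattice in the report, and the per-GPU counts stay well below the 32-bit signed limit.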