Closed maddyscientist closed 5 months ago
This passes a visual review as the code stands. On a more detailed review of the generated code, there is some stack spillage at staggered-size Nc. @maddyscientist is going to look into it: if it's easy to fix, it will be fixed here; if not, we'll get this merged regardless and I'll file an issue to revisit it.
Compile:
cmake -DCMAKE_BUILD_TYPE=RELEASE -DQUDA_DIRAC_DEFAULT_OFF=ON -DQUDA_DIRAC_STAGGERED=ON -DQUDA_GPU_ARCH=sm_80 -DQUDA_DOWNLOAD_USQCD=ON -DQUDA_QIO=ON -DQUDA_QMP=ON -DQUDA_MULTIGRID=ON -DQUDA_MULTIGRID_NVEC_LIST="24,64,96" ../quda
Generate a well-behaved 16^4 gauge field:
mpirun -np 1 ./heatbath_test --dim 16 16 16 16 --save-gauge l16t16b7p0 --heatbath-beta 7.0 --heatbath-coldstart true --heatbath-num-steps 10 --heatbath-warmup-steps 1000
Run a 3 <-> 64 <-> 96 test, which flexes recursion:
mpirun -np 1 ./staggered_invert_test \
--prec double --prec-sloppy single --prec-null half --prec-precondition half \
--mass 0.01 --recon 13 --recon-sloppy 9 --recon-precondition 9 \
--dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge l16t16b7p0 \
--dslash-type asqtad --compute-fat-long true --tadpole-coeff 0.905160183 --tol 1e-10 \
--verbosity verbose --solve-type direct --solution-type mat --inv-type gcr \
--inv-multigrid true --mg-levels 4 --mg-coarse-solve-type 0 direct --mg-staggered-coarsen-type kd-optimized \
--mg-block-size 0 1 1 1 1 --mg-nvec 0 3 \
--mg-block-size 1 4 4 4 4 --mg-nvec 1 64 \
--mg-block-size 2 2 2 2 2 --mg-nvec 2 96 \
--mg-setup-tol 1 1e-5 --mg-setup-tol 2 1e-5 --mg-setup-inv 1 cgnr --mg-setup-inv 2 cgnr \
--nsrc 1 --niter 25 \
--mg-setup-use-mma 0 true --mg-setup-use-mma 1 true --mg-setup-use-mma 2 true --mg-setup-use-mma 3 true \
--mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 direct --mg-nu-pre 0 0 --mg-nu-post 0 4 \
--mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 direct --mg-nu-pre 1 0 --mg-nu-post 1 4 \
--mg-smoother 2 ca-gcr --mg-smoother-solve-type 2 direct-pc --mg-nu-pre 2 0 --mg-nu-post 2 4 \
--mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 direct --mg-coarse-solver-tol 1 0.25 --mg-coarse-solver-maxiter 1 16 \
--mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.25 --mg-coarse-solver-maxiter 2 16 \
--mg-coarse-solver 3 ca-gcr --mg-coarse-solve-type 3 direct-pc --mg-coarse-solver-tol 3 0.25 --mg-coarse-solver-maxiter 3 16 \
--mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose --mg-verbosity 3 verbose
To test the recursion in the multi-RHS coarse operator, append these lines (after adding a trailing \ to the last line of the command above) to perform block TRLM with a block size of 48, which should properly split into {32, 16}:
--mg-eig 3 true --mg-eig-type 3 blktrlm --mg-eig-use-dagger 3 false --mg-eig-use-normop 3 true \
--mg-nvec 3 48 --mg-eig-n-ev 3 96 --mg-eig-n-kr 3 192 --mg-eig-tol 3 1e-4 --mg-eig-use-poly-acc 3 false \
--mg-eig-block-size 3 48 --mg-eig-spectrum 3 SR \
--mg-eig-max-restarts 3 1000
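As a rough illustration of why a block size of 48 is expected to split into {32, 16}, here is a minimal sketch of a recursive power-of-two split. The helper `split_rhs` and the cap of 32 right-hand sides per pass are assumptions made for this example; they are not taken from the QUDA source, whose actual limits and instantiation list may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not a QUDA function: recursively peel off the largest
// power-of-two chunk (no larger than max_chunk) from n_rhs right-hand sides.
// Assumption: max_chunk is itself a power of two (32 by default here).
inline void split_rhs_impl(int n_rhs, int max_chunk, std::vector<int> &out)
{
  if (n_rhs <= 0) return;           // nothing left to process
  int chunk = max_chunk;
  while (chunk > n_rhs) chunk /= 2; // largest allowed power of two <= n_rhs
  out.push_back(chunk);
  split_rhs_impl(n_rhs - chunk, chunk, out); // recurse on the remainder
}

inline std::vector<int> split_rhs(int n_rhs, int max_chunk = 32)
{
  std::vector<int> chunks;
  split_rhs_impl(n_rhs, max_chunk, chunks);
  return chunks;
}
```

Under these assumptions, `split_rhs(48)` yields the chunks 32 and 16, matching the split the test above is meant to exercise.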
This PR adds support for multi-RHS to both the prolongator and restrictor. This will be an essential building block for multi-RHS multigrid solvers:
MG::verify to use multi-RHS
MG class to move from heap allocation (pointers) to stack allocation (objects) for the ColorSpinorField objects
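To make the idea of a multi-RHS prolongator and restrictor concrete, the following is a toy sketch, not QUDA's implementation. It assumes a piecewise-constant aggregation in place of the real null-vector-based transfer operator, and the names `Field`, `Transfer`, `restrict_all` and `prolong_all` are hypothetical. The batch of fields is held by value in a `std::vector` rather than through raw pointers, loosely mirroring the move from pointers to objects described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a ColorSpinorField: a flat vector of reals.
using Field = std::vector<double>;

// Toy transfer operator with piecewise-constant aggregation (an assumption;
// the real operator is built from null vectors).
struct Transfer {
  std::vector<std::size_t> agg; // fine site -> coarse aggregate index
  std::size_t n_coarse;         // number of coarse aggregates

  // Restrict all right-hand sides in one call: coarse = P^T fine.
  std::vector<Field> restrict_all(const std::vector<Field> &fine) const
  {
    std::vector<Field> coarse(fine.size(), Field(n_coarse, 0.0));
    for (std::size_t r = 0; r < fine.size(); ++r)
      for (std::size_t i = 0; i < agg.size(); ++i) coarse[r][agg[i]] += fine[r][i];
    return coarse;
  }

  // Prolong all right-hand sides in one call: fine = P coarse.
  std::vector<Field> prolong_all(const std::vector<Field> &coarse) const
  {
    std::vector<Field> fine(coarse.size(), Field(agg.size(), 0.0));
    for (std::size_t r = 0; r < coarse.size(); ++r)
      for (std::size_t i = 0; i < agg.size(); ++i) fine[r][i] = coarse[r][agg[i]];
    return fine;
  }
};
```

Processing the whole batch in a single call is what lets the inner work be shared across right-hand sides; in this toy version the saving is only in call overhead, whereas a GPU implementation would reuse the loaded transfer data for every right-hand side.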