lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Typical staggered `Nc` restrictors spill registers with the CUDA backend #1435

Open weinbe2 opened 5 months ago

weinbe2 commented 5 months ago

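The description is in the title. Example for the `Nc` 64 -> 96 restrictor, where both instantiations sit at REG:255 with a nonzero per-thread stack (STACK:1280):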

eweinberg$ cuobjdump --dump-resource-usage restrictor_64_96.cu.o

Fatbin elf code:
================
arch = sm_80
code version = [1,7]
host = linux
compile_size = 64bit
compressed

Resource usage:
 Common:
  GLOBAL:19
 Function _ZN4quda13BlockKernel2DINS_10RestrictorENS_14BlockKernelArgILj1ENS_11RestrictArgIffLi2ELi64ELi2ELi96ELb0EEEEELb0EEENSt9enable_ifIXclsr6deviceE14use_kernel_argIT0_EEEvE4typeES7_:
  REG:255 STACK:1280 SHARED:1024 LOCAL:0 CONSTANT[2]:8 CONSTANT[0]:3712 TEXTURE:0 SURFACE:0 SAMPLER:0
 Function _ZN4quda13BlockKernel2DINS_10RestrictorENS_14BlockKernelArgILj1ENS_11RestrictArgIfsLi2ELi64ELi2ELi96ELb0EEEEELb0EEENSt9enable_ifIXclsr6deviceE14use_kernel_argIT0_EEEvE4typeES7_:
  REG:255 STACK:1280 SHARED:1024 LOCAL:0 CONSTANT[2]:8 CONSTANT[0]:3728 TEXTURE:0 SURFACE:0 SAMPLER:0
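
For checking many restrictor objects at once, the output above can be scanned mechanically. The following is a minimal Python sketch, not part of QUDA: it assumes the `cuobjdump --dump-resource-usage` text layout shown above (a `Function <name>:` line followed by a line starting with `REG:`), and the function names `kernel_a` and `kernel_b` and the numbers for `kernel_b` are made up for illustration. A nonzero STACK is only a hint of spilling, since local arrays also use the stack; compiling with `-Xptxas -v` reports spill stores and loads explicitly.

```python
import re

# Matches "KEY:value" and "KEY[n]:value" pairs on a resource line,
# e.g. "REG:255" or "CONSTANT[0]:3712".
PAIR = re.compile(r"([A-Z]+(?:\[\d+\])?):(\d+)")


def parse_resource_usage(text):
    """Return {function_name: {resource: int}} from resource-usage text.

    Assumes each "Function <name>:" line is followed by one line that
    starts with "REG:", as in the output quoted in this issue.
    """
    usage = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Function ") and line.endswith(":"):
            current = line[len("Function "):-1]
        elif current is not None and line.startswith("REG:"):
            usage[current] = {k: int(v) for k, v in PAIR.findall(line)}
            current = None
    return usage


def flag_stack_users(usage):
    """Names of kernels that report a nonzero per-thread stack frame."""
    return [name for name, res in usage.items() if res.get("STACK", 0) > 0]


# Hypothetical sample: kernel_a copies the numbers reported above,
# kernel_b is an invented well-behaved kernel for contrast.
SAMPLE = """\
Resource usage:
 Common:
  GLOBAL:19
 Function kernel_a:
  REG:255 STACK:1280 SHARED:1024 LOCAL:0 CONSTANT[2]:8 CONSTANT[0]:3712 TEXTURE:0 SURFACE:0 SAMPLER:0
 Function kernel_b:
  REG:64 STACK:0 SHARED:0 LOCAL:0 CONSTANT[0]:400 TEXTURE:0 SURFACE:0 SAMPLER:0
"""

if __name__ == "__main__":
    for name in flag_stack_users(parse_resource_usage(SAMPLE)):
        print("possible spill:", name)
```

To use it on real output, the text from `cuobjdump --dump-resource-usage restrictor_64_96.cu.o` would be passed to `parse_resource_usage` in place of `SAMPLE`.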

Reference command to configure the build:

cmake -DCMAKE_BUILD_TYPE=RELEASE -DQUDA_DIRAC_DEFAULT_OFF=ON -DQUDA_DIRAC_STAGGERED=ON   -DQUDA_GPU_ARCH=sm_80 -DQUDA_DOWNLOAD_USQCD=ON -DQUDA_QIO=ON -DQUDA_QMP=ON   -DQUDA_MULTIGRID=ON -DQUDA_MULTIGRID_NVEC_LIST="24,64,96" ../quda

A quick copy+paste command that generates a well-behaved configuration and then runs an MG solve with 3 <-> 64 <-> 96 can be found here: https://github.com/lattice/quda/wiki/Staggered-Multigrid-Solver#quick-context-free-example-solve-command