lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

hisq code computes wrong results when "-g" instead of "-O3" used #70

Closed gshi closed 12 years ago

gshi commented 12 years ago

"-g" or "-O3" here refers only to NVCCFLAGS. The options in CFLAGS/CXXFLAGS do not affect the computed results.

Specifically, the problem comes from the staple computation function and can be reproduced easily by returning after the first middle kernel call, in both the hisq CUDA code and the hisq reference code. Switching from "-O3" to "-g" in NVCCFLAGS changes the CUDA-computed mom results while the reference results stay the same, which indicates that something is broken on the hisq CUDA side.

The test was run on M2090/C2050, CUDA 4.2, on the ac cluster at NCSA, with the following command:

./hisq_paths_force_test --sdim 8 --tdim 16 --recon 18 --prec double --verify --tune false
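
The kind of check described above, comparing the device result against the reference result after an early return, can be sketched roughly as follows. This is only an illustration: `maxAbsDiff` and `agrees` are hypothetical helper names and are not part of QUDA, and the results are assumed to have been copied into plain host arrays of doubles.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a QUDA function): largest absolute
// element-wise difference between two result arrays of equal length.
double maxAbsDiff(const std::vector<double> &a, const std::vector<double> &b) {
  assert(a.size() == b.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    worst = std::fmax(worst, std::fabs(a[i] - b[i]));
  return worst;
}

// Hypothetical helper: true when every element agrees to within tol.
bool agrees(const std::vector<double> &a, const std::vector<double> &b, double tol) {
  return maxAbsDiff(a, b) <= tol;
}
```

Running such a comparison once per build (for example one build with "-O3" and one with "-g") against the same reference output would show which side changes between builds.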

update:

Running in single precision gives the same error. Multi-GPU build.

maddyscientist commented 12 years ago

The flag "-g" only affects host code and not device code, so this suggests that "-g" is breaking host code that is defined in a .cu file.

maddyscientist commented 12 years ago

Closer investigation has revealed that "-g" is not triggering the error; rather, it is the lack of "-O3". Also, this problem has been isolated to lib/dslash_quda.cu.

maddyscientist commented 12 years ago

The bug is triggered by "-O0" but not by "-O1", "-O2" or "-O3". This is a host compiler flag, so this would point towards a gcc bug. I have reproduced it with gcc-4.6. Investigating...

maddyscientist commented 12 years ago

I have tracked the error to hisqStaplesForceCuda.

To allow for easier debugging of this, I have introduced a norm2 function for a gauge field. This actually "casts" a gauge field into a spinor field, upon which the L2 norm is computed.
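
The idea of viewing a gauge field as a flat array and taking its norm can be sketched as below. This is not the QUDA implementation: `norm2` and `gaugeNorm2` here are hypothetical host-side functions, and the field is assumed to be stored as 9 complex numbers (18 reals) per link in a contiguous host array.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Squared L2 norm of a flat array of n reals.
double norm2(const double *v, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += v[i] * v[i];
  return sum;
}

// Hypothetical sketch: the gauge field is assumed to hold 9 complex
// entries (one 3x3 matrix) per link. It is reinterpreted as a flat
// array of reals, which std::complex permits for contiguous arrays,
// and the generic norm2 above is reused.
double gaugeNorm2(const std::vector<std::complex<double>> &field) {
  assert(field.size() % 9 == 0);  // whole 3x3 matrices only
  const double *flat = reinterpret_cast<const double *>(field.data());
  return norm2(flat, 2 * field.size());
}
```

A single scalar like this can be printed after each kernel call in both the CUDA and reference paths, which narrows down the first step at which the two diverge.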