Closed weinbe2 closed 1 year ago
~...potential false alarm, was pointing my build to the wrong version of QUDA... testing now~
Confirmed: the issue is indeed in develop.
A particularly minimal cmake command suffices:
cmake ../quda -DQUDA_DIRAC_DEFAULT_OFF=ON -DQUDA_DIRAC_STAGGERED=ON -DQUDA_PRECISION=4 -DQUDA_RECONSTRUCT=4 -DQUDA_GPU_ARCH=sm_80 -DQUDA_FAST_COMPILE_DSLASH=ON -DQUDA_FAST_COMPILE_REDUCE=ON
Last good commit: 103c4ff25; first bad commit: 931680a5003b135d4222cb0c1737e2516a9774a6
Unfortunately, this is when the max deviation check was introduced, so tbd where exactly things went awry...
The L2 deviation from the good commit is sane:
Results: CPU = 1343272.597656, QUDA = 1343272.605499, L2 relative deviation = -2.919503e-09
while it has issues in the bad commit:
Results: reference = 1458908.264166, QUDA = 1342092.890616, L2 relative deviation = 4.087040e-02, max deviation = 8.428711e+04
The sources aren't guaranteed to be the same but should have consistent norms. Based on those outputs, where the QUDA norm is essentially unchanged and the reference norm is the one that moved, it looks like something went wrong with the host verify. Tentative phew.
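For anyone reading the two log lines above, here is a minimal sketch of how such numbers can be computed. It assumes the logged "CPU"/"reference" and "QUDA" values are squared L2 norms and that the relative deviation has the form `1 - sqrt(QUDA / reference)`. That form is inferred from the printed output (it reproduces both -2.919503e-09 and 4.087040e-02), not taken from the QUDA source. The max deviation is assumed to be the largest element-wise absolute difference. All function names here are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Squared L2 norm of a field stored as a flat array of reals.
double norm2(const std::vector<double> &v)
{
  double sum = 0.0;
  for (double x : v) sum += x * x;
  return sum;
}

// Assumed form, inferred from the logged values rather than from QUDA itself:
// deviation = 1 - sqrt(norm2_test / norm2_ref). The sign flips depending on
// which norm is larger, which is why the "good" log shows a negative value.
double l2_relative_deviation(double norm2_ref, double norm2_test)
{
  return 1.0 - std::sqrt(norm2_test / norm2_ref);
}

// Assumed definition: largest element-wise absolute difference between the
// host reference and the device result (compared over the common length).
double max_deviation(const std::vector<double> &ref, const std::vector<double> &test)
{
  double max_dev = 0.0;
  const std::size_t n = std::min(ref.size(), test.size());
  for (std::size_t i = 0; i < n; i++) max_dev = std::max(max_dev, std::abs(ref[i] - test[i]));
  return max_dev;
}
```

Feeding the logged norms into `l2_relative_deviation` reproduces the printed deviations to within the rounding of the log output. A norm-based check like this can only flag that the two fields disagree in overall size; the element-wise maximum is what localizes a large single-site error.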
If I incrementally add bits of the "bad" commit into the last "good" commit, everything seems fine... I'm a bit confused
Looks like I found it: a misplaced curly bracket in the update to the host reference.
Reproducer:
Other tests (Dslash, MatPC) are passing without issue. I'm actively investigating now, but I wanted this down in writing.