paboyle / Grid

Data parallel C++ mathematical object library
GNU General Public License v2.0

Incorrect results on ROCM 5.7 #464

Open paboyle opened 1 month ago

paboyle commented 1 month ago

Benchmark_dwf_fp32 detects incorrect results under ROCM 5.7

Works fine under ROCM 5.3, but ORNL's recent move from a 5.3 default to a 5.7 default breaks Grid.

I suspect a compiler bug and will update Grid to refuse to use non-working ROCM versions.

Signature is weird -- gets wrong answer for the t-direction contributions to 5D Wilson fermion Dw, but only in single precision and only when the gauge link in the time direction is not constant over the whole lattice.

It is as if it is calculating an address incorrectly. Works in double precision and under 5.3.
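A minimal sketch of what such a version guard could look like, not Grid's actual implementation. The helper name `rocm_reported_working` is hypothetical, and the allow-list contains only the versions reported as working in this thread (5.3 and 5.6). The sketch also assumes that the `HIP_VERSION_MAJOR` and `HIP_VERSION_MINOR` macros from `<hip/hip_version.h>` track the installed ROCm release, which should be confirmed on the target system.

```cpp
// Hypothetical helper, not part of Grid: returns true only for ROCm
// versions reported as working in this thread (5.3 and 5.6).
// Everything else, including 5.7, 6.1 and 6.2, is treated as not working.
constexpr bool rocm_reported_working(int ver_major, int ver_minor) {
  return ver_major == 5 && (ver_minor == 3 || ver_minor == 6);
}

// Compile-time guard, active only when building for AMD GPUs with HIP.
// Assumption: the HIP version macros match the ROCm release number.
#if defined(__HIP_PLATFORM_AMD__)
#include <hip/hip_version.h>
static_assert(rocm_reported_working(HIP_VERSION_MAJOR, HIP_VERSION_MINOR),
              "This ROCm version is not on the list of versions reported "
              "to give correct single precision results");
#endif
```

An allow-list rather than a "newer than X" test is used here because the reports are not monotonic in the version number: 5.6 works while 5.7 and later 6.x releases fail.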

lehner commented 2 weeks ago

On Frontier, Benchmark_wilson_fp32 fails even for 6.1.3. Is there a working version after ROCM 5.3?

lehner commented 2 weeks ago

On 6.2.0 on Frontier it also fails. So far, 5.3.0 is the only version on Frontier for which I have found Grid to work.

lehner commented 2 weeks ago

5.6.0 works on Frontier