Closed by hjabird 5 years ago
Possibly an issue with using the flag -cl-fast-relaxed-math when building the OpenCL kernels. This flag sets both -cl-finite-math-only and -cl-unsafe-math-optimizations. Any NaN/Inf violates the finite-math assumption, although in practice this is not an issue on AMD/Intel GPUs.
On AMD GCN5 / Windows (Vega 56), using -cl-unsafe-math-optimizations on its own does not reduce performance compared to -cl-fast-relaxed-math.
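For anyone hitting the same thing in their own OpenCL host code, here is a rough sketch of the flag swap discussed above. It is not taken from the fixing commit: kernel_build_options and its assume_finite parameter are hypothetical names used only for illustration, and the sketch assumes the fix amounts to dropping -cl-finite-math-only while keeping -cl-unsafe-math-optimizations.

```c
/* Hypothetical helper (not from the library source): choose the option
 * string handed to the OpenCL compiler.
 *
 * assume_finite != 0 : -cl-fast-relaxed-math, which implies both
 *                      -cl-finite-math-only and
 *                      -cl-unsafe-math-optimizations.
 * assume_finite == 0 : only -cl-unsafe-math-optimizations, so the
 *                      compiler may no longer assume that NaN/Inf
 *                      never occur. */
static const char *kernel_build_options(int assume_finite)
{
    return assume_finite ? "-cl-fast-relaxed-math"
                         : "-cl-unsafe-math-optimizations";
}

/* The returned string would be passed as the `options` (fourth)
 * argument of clBuildProgram, for example:
 *
 *   clBuildProgram(program, 1, &device, kernel_build_options(0),
 *                  NULL, NULL);
 *
 * `program` and `device` are assumed to have been created elsewhere. */
```

Keeping the choice in one place makes it easy to compare the two settings on a given device, which is how the performance note above could be re-checked on other hardware.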
Fixed in commit 5b204179d4639d0c996edf6919f20ffa720a8268.
As per title.
Reproduce as:
    ppos = rand(1000, 3)
    pvort = rand(1000, 3)
    particle_induced_velocity(ppos, pvort, ppos, winckelmans_regularisation(), 0.1)
Code works fine with AMD & Intel GPUs/iGPUs.