Closed: kspaff closed this issue 10 years ago
With the new LLVM compiler backend, CUDA FFT performance dropped by 50% on Keeneland. OpenCL performance stayed the same.
I suspect this might be due to loops being unrolled differently (the unroll option that used to go to the old compiler is now ignored).
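One way to test this hypothesis might be to make the unrolling explicit in the source, so that it no longer depends on any compiler option. The following is only a minimal C++17 sketch under that assumption, not the benchmark's actual FFT kernel: `butterfly`, `stage_unrolled`, and `stage_looped` are hypothetical names, and twiddle factors are omitted. In a CUDA kernel, placing the documented `#pragma unroll` directive before the loop would be the more usual way to request the same thing.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <utility>

using cfloat = std::complex<float>;

// One radix-2 butterfly: (a, b) -> (a + b, a - b).
// Twiddle factors are left out to keep the sketch short.
inline void butterfly(cfloat& a, cfloat& b) {
    const cfloat t = a;
    a = t + b;
    b = t - b;
}

// Expands into N/2 butterfly calls at compile time through a C++17 fold
// expression, so the unrolling does not rely on a compiler flag.
template <std::size_t N, std::size_t... I>
inline void stage_unrolled_impl(std::array<cfloat, N>& x,
                                std::index_sequence<I...>) {
    (butterfly(x[I], x[I + N / 2]), ...);
}

template <std::size_t N>
inline void stage_unrolled(std::array<cfloat, N>& x) {
    static_assert(N % 2 == 0, "N must be even");
    stage_unrolled_impl(x, std::make_index_sequence<N / 2>{});
}

// Reference version with an ordinary loop; whether it gets unrolled is
// left entirely to the compiler.
template <std::size_t N>
inline void stage_looped(std::array<cfloat, N>& x) {
    for (std::size_t i = 0; i < N / 2; ++i) {
        butterfly(x[i], x[i + N / 2]);
    }
}
```

Timing `stage_unrolled` against `stage_looped` with both the old and the new backend could indicate whether unrolling accounts for the drop. Both functions compute the same result, so the comparison isolates code generation rather than arithmetic.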
Fixed some time ago; FFT can now be built to use cuFFT.