All R2C tests with ndim=2 fail [numpy], because the fast axis is not always the last one

TsXor commented 1 year ago

pyopencl  R2C         (30+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=1.09e-07 ninf=1.18e-07 < 2.74e-06 (0.043) 0 iFFT: n2=1.06e-07 ninf=1.28e-07 < 2.74e-06 (0.047) 0   OK
pyopencl  R2C         (30+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=1.09e-07 ninf=1.18e-07 < 2.74e-06 (0.043) 0 iFFT: n2=9.92e-08 ninf=1.36e-07 < 2.74e-06 (0.049) 0   OK
pyopencl  R2C          (30,) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=1.09e-07 ninf=1.18e-07 < 2.74e-06 (0.043) 1 iFFT: n2=1.06e-07 ninf=1.28e-07 < 2.74e-06 (0.047) 1   OK
pyopencl  R2C          (30,) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=1.09e-07 ninf=1.18e-07 < 2.74e-06 (0.043) 1 iFFT: n2=9.92e-08 ninf=1.36e-07 < 2.74e-06 (0.049) 1   OK
pyopencl  R2C         (30+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=8.12e-08 ninf=8.93e-08 < 2.74e-06 (0.033) 0 iFFT: n2=7.62e-08 ninf=7.70e-08 < 2.74e-06 (0.028) 0   OK
pyopencl  R2C         (30+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=8.12e-08 ninf=8.93e-08 < 2.74e-06 (0.033) 0 iFFT: n2=1.31e-07 ninf=1.35e-07 < 2.74e-06 (0.049) 0   OK
pyopencl  R2C          (30,) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=8.12e-08 ninf=8.93e-08 < 2.74e-06 (0.033) 1 iFFT: n2=7.62e-08 ninf=7.70e-08 < 2.74e-06 (0.028) 1   OK
pyopencl  R2C          (30,) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=8.12e-08 ninf=8.93e-08 < 2.74e-06 (0.033) 1 iFFT: n2=1.31e-07 ninf=1.35e-07 < 2.74e-06 (0.049) 1   OK
pyopencl  R2C         (30+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=2.93e-16 ninf=3.16e-16 < 5.74e-15 (0.055) 0 iFFT: n2=3.70e-16 ninf=4.57e-16 < 5.74e-15 (0.080) 0   OK
pyopencl  R2C         (30+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=2.93e-16 ninf=3.16e-16 < 5.74e-15 (0.055) 0 iFFT: n2=2.13e-16 ninf=3.43e-16 < 5.74e-15 (0.060) 0   OK
pyopencl  R2C          (30,) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=2.62e-16 ninf=3.16e-16 < 5.74e-15 (0.055) 1 iFFT: n2=2.61e-16 ninf=3.43e-16 < 5.74e-15 (0.060) 1   OK
pyopencl  R2C          (30,) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=2.62e-16 ninf=3.16e-16 < 5.74e-15 (0.055) 1 iFFT: n2=2.63e-16 ninf=3.43e-16 < 5.74e-15 (0.060) 1   OK
pyopencl  R2C       (2,30+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=1.37e-07 ninf=1.27e-07 < 2.74e-06 (0.047) 0 iFFT: n2=1.38e-07 ninf=2.24e-07 < 2.74e-06 (0.082) 0   OK
pyopencl  R2C       (2,30+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=1.37e-07 ninf=1.27e-07 < 2.74e-06 (0.047) 0 iFFT: n2=1.22e-07 ninf=1.96e-07 < 2.74e-06 (0.071) 0   OK
pyopencl  R2C         (2,30) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=1.37e-07 ninf=1.27e-07 < 2.74e-06 (0.047) 1 iFFT: n2=1.38e-07 ninf=2.24e-07 < 2.74e-06 (0.082) 1   OK
pyopencl  R2C         (2,30) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=1.37e-07 ninf=1.27e-07 < 2.74e-06 (0.047) 1 iFFT: n2=1.22e-07 ninf=1.96e-07 < 2.74e-06 (0.071) 1   OK
pyopencl  R2C       (2,30+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=8.20e-08 ninf=6.13e-08 < 2.74e-06 (0.022) 0 iFFT: n2=1.01e-07 ninf=1.30e-07 < 2.74e-06 (0.047) 0   OK
pyopencl  R2C       (2,30+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=8.20e-08 ninf=6.13e-08 < 2.74e-06 (0.022) 0 iFFT: n2=1.32e-07 ninf=2.33e-07 < 2.74e-06 (0.085) 0   OK
pyopencl  R2C         (2,30) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=8.20e-08 ninf=6.13e-08 < 2.74e-06 (0.022) 1 iFFT: n2=1.01e-07 ninf=1.30e-07 < 2.74e-06 (0.047) 1   OK
pyopencl  R2C         (2,30) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=8.20e-08 ninf=6.13e-08 < 2.74e-06 (0.022) 1 iFFT: n2=1.32e-07 ninf=2.33e-07 < 2.74e-06 (0.085) 1   OK
pyopencl  R2C       (2,30+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=2.25e-16 ninf=3.30e-16 < 5.74e-15 (0.057) 0 iFFT: n2=5.44e-16 ninf=7.78e-16 < 5.74e-15 (0.136) 0   OK
pyopencl  R2C       (2,30+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=2.25e-16 ninf=3.30e-16 < 5.74e-15 (0.057) 0 iFFT: n2=2.97e-16 ninf=5.56e-16 < 5.74e-15 (0.097) 0   OK
pyopencl  R2C         (2,30) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=2.30e-16 ninf=3.76e-16 < 5.74e-15 (0.066) 1 iFFT: n2=2.62e-16 ninf=4.45e-16 < 5.74e-15 (0.077) 1   OK
pyopencl  R2C         (2,30) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=2.30e-16 ninf=3.76e-16 < 5.74e-15 (0.066) 1 iFFT: n2=2.76e-16 ninf=3.47e-16 < 5.74e-15 (0.061) 1   OK
pyopencl  R2C        (30,30) axes=      None ndim=   2    float32 lut=None inplace=0  norm=   0   FFT: n2=1.79e-07 ninf=2.24e-07 < 3.48e-06 (0.064) 1 iFFT: n2=1.43e+00 ninf=2.52e+00 < 3.48e-06 (723527.389) 0 FAIL
pyopencl  R2C        (30,30) axes=      None ndim=   2    float32 lut=None inplace=0  norm=   1   FFT: n2=1.79e-07 ninf=2.24e-07 < 3.48e-06 (0.064) 1 iFFT: n2=1.43e+00 ninf=2.52e+00 < 3.48e-06 (723527.355) 0 FAIL
pyopencl  R2C        (30,30) axes=      None ndim=   2    float32 lut=True inplace=0  norm=   0   FFT: n2=1.43e-07 ninf=1.81e-07 < 3.48e-06 (0.052) 1 iFFT: n2=1.43e+00 ninf=2.52e+00 < 3.48e-06 (723527.423) 0 FAIL
pyopencl  R2C        (30,30) axes=      None ndim=   2    float32 lut=True inplace=0  norm=   1   FFT: n2=1.43e-07 ninf=1.81e-07 < 3.48e-06 (0.052) 1 iFFT: n2=1.43e+00 ninf=2.52e+00 < 3.48e-06 (723527.423) 0 FAIL
pyopencl  R2C        (30,30) axes=      None ndim=   2    float64 lut=None inplace=0  norm=   0   FFT: n2=3.85e-16 ninf=4.70e-16 < 6.48e-15 (0.073) 1 iFFT: n2=1.43e+00 ninf=2.38e+00 < 6.48e-15 (367908337750424.188) 0 FAIL
pyopencl  R2C        (30,30) axes=      None ndim=   2    float64 lut=None inplace=0  norm=   1   FFT: n2=3.85e-16 ninf=4.70e-16 < 6.48e-15 (0.073) 1 iFFT: n2=1.43e+00 ninf=2.38e+00 < 6.48e-15 (367908337750424.188) 0 FAIL
pyopencl  R2C     (2,2,30+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=1.45e-07 ninf=2.15e-07 < 2.74e-06 (0.078) 0 iFFT: n2=1.31e-07 ninf=2.21e-07 < 2.74e-06 (0.081) 0   OK
pyopencl  R2C     (2,2,30+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=1.45e-07 ninf=2.15e-07 < 2.74e-06 (0.078) 0 iFFT: n2=1.43e-07 ninf=2.25e-07 < 2.74e-06 (0.082) 0   OK
pyopencl  R2C       (2,2,30) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=1.45e-07 ninf=2.15e-07 < 2.74e-06 (0.078) 1 iFFT: n2=1.31e-07 ninf=2.21e-07 < 2.74e-06 (0.081) 1   OK
pyopencl  R2C       (2,2,30) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=1.45e-07 ninf=2.15e-07 < 2.74e-06 (0.078) 1 iFFT: n2=1.43e-07 ninf=2.25e-07 < 2.74e-06 (0.082) 1   OK
pyopencl  R2C     (2,2,30+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=1.01e-07 ninf=1.28e-07 < 2.74e-06 (0.047) 0 iFFT: n2=9.53e-08 ninf=1.23e-07 < 2.74e-06 (0.045) 0   OK
pyopencl  R2C     (2,2,30+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=1.01e-07 ninf=1.28e-07 < 2.74e-06 (0.047) 0 iFFT: n2=1.37e-07 ninf=2.25e-07 < 2.74e-06 (0.082) 0   OK
pyopencl  R2C       (2,2,30) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=1.01e-07 ninf=1.28e-07 < 2.74e-06 (0.047) 1 iFFT: n2=9.53e-08 ninf=1.23e-07 < 2.74e-06 (0.045) 1   OK
pyopencl  R2C       (2,2,30) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=1.01e-07 ninf=1.28e-07 < 2.74e-06 (0.047) 1 iFFT: n2=1.37e-07 ninf=2.25e-07 < 2.74e-06 (0.082) 1   OK
pyopencl  R2C     (2,2,30+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=2.88e-16 ninf=3.14e-16 < 5.74e-15 (0.055) 0 iFFT: n2=4.60e-16 ninf=8.94e-16 < 5.74e-15 (0.156) 0   OK
pyopencl  R2C     (2,2,30+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=2.88e-16 ninf=3.14e-16 < 5.74e-15 (0.055) 0 iFFT: n2=2.65e-16 ninf=5.59e-16 < 5.74e-15 (0.097) 0   OK
pyopencl  R2C       (2,2,30) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=2.82e-16 ninf=3.14e-16 < 5.74e-15 (0.055) 1 iFFT: n2=2.87e-16 ninf=4.75e-16 < 5.74e-15 (0.083) 1   OK
pyopencl  R2C       (2,2,30) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=2.82e-16 ninf=3.14e-16 < 5.74e-15 (0.055) 1 iFFT: n2=2.72e-16 ninf=3.91e-16 < 5.74e-15 (0.068) 1   OK
pyopencl  R2C   (2,2,2,30+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=1.53e-07 ninf=2.01e-07 < 2.74e-06 (0.073) 0 iFFT: n2=1.43e-07 ninf=3.13e-07 < 2.74e-06 (0.114) 0   OK
pyopencl  R2C   (2,2,2,30+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=1.53e-07 ninf=2.01e-07 < 2.74e-06 (0.073) 0 iFFT: n2=1.48e-07 ninf=2.80e-07 < 2.74e-06 (0.102) 0   OK
pyopencl  R2C     (2,2,2,30) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=1.53e-07 ninf=2.01e-07 < 2.74e-06 (0.073) 1 iFFT: n2=1.43e-07 ninf=3.13e-07 < 2.74e-06 (0.114) 1   OK
pyopencl  R2C     (2,2,2,30) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=1.53e-07 ninf=2.01e-07 < 2.74e-06 (0.073) 1 iFFT: n2=1.48e-07 ninf=2.80e-07 < 2.74e-06 (0.102) 1   OK
pyopencl  R2C   (2,2,2,30+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=1.01e-07 ninf=1.18e-07 < 2.74e-06 (0.043) 0 iFFT: n2=1.08e-07 ninf=2.20e-07 < 2.74e-06 (0.080) 0   OK
pyopencl  R2C   (2,2,2,30+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=1.01e-07 ninf=1.18e-07 < 2.74e-06 (0.043) 0 iFFT: n2=1.37e-07 ninf=1.94e-07 < 2.74e-06 (0.071) 0   OK
pyopencl  R2C     (2,2,2,30) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=1.01e-07 ninf=1.18e-07 < 2.74e-06 (0.043) 1 iFFT: n2=1.08e-07 ninf=2.20e-07 < 2.74e-06 (0.080) 1   OK
pyopencl  R2C     (2,2,2,30) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=1.01e-07 ninf=1.18e-07 < 2.74e-06 (0.043) 1 iFFT: n2=1.37e-07 ninf=1.94e-07 < 2.74e-06 (0.071) 1   OK
pyopencl  R2C   (2,2,2,30+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=2.89e-16 ninf=3.35e-16 < 5.74e-15 (0.058) 0 iFFT: n2=4.58e-16 ninf=7.84e-16 < 5.74e-15 (0.137) 0   OK
pyopencl  R2C   (2,2,2,30+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=2.89e-16 ninf=3.35e-16 < 5.74e-15 (0.058) 0 iFFT: n2=3.03e-16 ninf=5.60e-16 < 5.74e-15 (0.098) 0   OK
pyopencl  R2C     (2,2,2,30) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=2.92e-16 ninf=3.35e-16 < 5.74e-15 (0.058) 1 iFFT: n2=3.06e-16 ninf=6.72e-16 < 5.74e-15 (0.117) 1   OK
pyopencl  R2C     (2,2,2,30) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=2.92e-16 ninf=3.35e-16 < 5.74e-15 (0.058) 1 iFFT: n2=3.27e-16 ninf=5.60e-16 < 5.74e-15 (0.098) 1   OK
pyopencl  R2C         (34+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=1.92e-07 ninf=1.58e-07 < 2.77e-06 (0.057) 0 iFFT: n2=1.88e-07 ninf=2.73e-07 < 2.77e-06 (0.099) 0   OK
pyopencl  R2C         (34+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=1.92e-07 ninf=1.58e-07 < 2.77e-06 (0.057) 0 iFFT: n2=1.72e-07 ninf=2.00e-07 < 2.77e-06 (0.072) 0   OK
pyopencl  R2C          (34,) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=1.92e-07 ninf=1.58e-07 < 5.53e-06 (0.028) 1 iFFT: n2=1.88e-07 ninf=2.73e-07 < 5.53e-06 (0.049) 1   OK
pyopencl  R2C          (34,) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=1.92e-07 ninf=1.58e-07 < 5.53e-06 (0.028) 1 iFFT: n2=1.72e-07 ninf=2.00e-07 < 5.53e-06 (0.036) 1   OK
pyopencl  R2C         (34+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=1.86e-07 ninf=1.78e-07 < 2.77e-06 (0.064) 0 iFFT: n2=1.64e-07 ninf=2.63e-07 < 2.77e-06 (0.095) 0   OK
pyopencl  R2C         (34+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=1.86e-07 ninf=1.78e-07 < 2.77e-06 (0.064) 0 iFFT: n2=1.71e-07 ninf=2.59e-07 < 2.77e-06 (0.094) 0   OK
pyopencl  R2C          (34,) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=1.86e-07 ninf=1.78e-07 < 5.53e-06 (0.032) 1 iFFT: n2=1.64e-07 ninf=2.63e-07 < 5.53e-06 (0.047) 1   OK
pyopencl  R2C          (34,) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=1.86e-07 ninf=1.78e-07 < 5.53e-06 (0.032) 1 iFFT: n2=1.71e-07 ninf=2.59e-07 < 5.53e-06 (0.047) 1   OK
pyopencl  R2C         (34+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=6.19e-16 ninf=7.44e-16 < 5.77e-15 (0.129) 0 iFFT: n2=3.26e-16 ninf=3.41e-16 < 5.77e-15 (0.059) 0   OK
pyopencl  R2C         (34+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=6.19e-16 ninf=7.44e-16 < 5.77e-15 (0.129) 0 iFFT: n2=3.07e-16 ninf=4.55e-16 < 5.77e-15 (0.079) 0   OK
pyopencl  R2C          (34,) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=6.19e-16 ninf=7.44e-16 < 1.15e-14 (0.064) 1 iFFT: n2=3.26e-16 ninf=3.41e-16 < 1.15e-14 (0.030) 1   OK
pyopencl  R2C          (34,) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=6.19e-16 ninf=7.44e-16 < 1.15e-14 (0.064) 1 iFFT: n2=3.07e-16 ninf=4.55e-16 < 1.15e-14 (0.039) 1   OK
pyopencl  R2C       (2,34+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=1.66e-07 ninf=1.91e-07 < 2.77e-06 (0.069) 0 iFFT: n2=2.20e-07 ninf=2.82e-07 < 2.77e-06 (0.102) 0   OK
pyopencl  R2C       (2,34+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=1.66e-07 ninf=1.91e-07 < 2.77e-06 (0.069) 0 iFFT: n2=2.38e-07 ninf=3.21e-07 < 2.77e-06 (0.116) 0   OK
pyopencl  R2C         (2,34) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=1.66e-07 ninf=1.91e-07 < 5.53e-06 (0.035) 1 iFFT: n2=2.20e-07 ninf=2.82e-07 < 5.53e-06 (0.051) 1   OK
pyopencl  R2C         (2,34) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=1.66e-07 ninf=1.91e-07 < 5.53e-06 (0.035) 1 iFFT: n2=2.38e-07 ninf=3.21e-07 < 5.53e-06 (0.058) 1   OK
pyopencl  R2C       (2,34+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=1.94e-07 ninf=2.01e-07 < 2.77e-06 (0.073) 0 iFFT: n2=2.05e-07 ninf=2.96e-07 < 2.77e-06 (0.107) 0   OK
pyopencl  R2C       (2,34+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=1.94e-07 ninf=2.01e-07 < 2.77e-06 (0.073) 0 iFFT: n2=2.16e-07 ninf=3.21e-07 < 2.77e-06 (0.116) 0   OK
pyopencl  R2C         (2,34) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=1.94e-07 ninf=2.01e-07 < 5.53e-06 (0.036) 1 iFFT: n2=2.05e-07 ninf=2.96e-07 < 5.53e-06 (0.054) 1   OK
pyopencl  R2C         (2,34) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=1.94e-07 ninf=2.01e-07 < 5.53e-06 (0.036) 1 iFFT: n2=2.16e-07 ninf=3.21e-07 < 5.53e-06 (0.058) 1   OK
pyopencl  R2C       (2,34+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=4.90e-16 ninf=5.49e-16 < 5.77e-15 (0.095) 0 iFFT: n2=5.86e-16 ninf=8.89e-16 < 5.77e-15 (0.154) 0   OK
pyopencl  R2C       (2,34+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=4.90e-16 ninf=5.49e-16 < 5.77e-15 (0.095) 0 iFFT: n2=5.30e-16 ninf=8.89e-16 < 5.77e-15 (0.154) 0   OK
pyopencl  R2C         (2,34) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=4.90e-16 ninf=5.49e-16 < 1.15e-14 (0.048) 1 iFFT: n2=5.86e-16 ninf=8.89e-16 < 1.15e-14 (0.077) 1   OK
pyopencl  R2C         (2,34) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=4.90e-16 ninf=5.49e-16 < 1.15e-14 (0.048) 1 iFFT: n2=5.30e-16 ninf=8.89e-16 < 1.15e-14 (0.077) 1   OK
pyopencl  R2C        (34,34) axes=      None ndim=   2    float32 lut=None inplace=0  norm=   0   FFT: n2=3.16e-07 ninf=3.05e-07 < 7.06e-06 (0.043) 1 iFFT: n2=1.39e+00 ninf=2.70e+00 < 7.06e-06 (382709.503) 0 FAIL
pyopencl  R2C        (34,34) axes=      None ndim=   2    float32 lut=None inplace=0  norm=   1   FFT: n2=3.16e-07 ninf=3.05e-07 < 7.06e-06 (0.043) 1 iFFT: n2=1.39e+00 ninf=2.70e+00 < 7.06e-06 (382709.486) 0 FAIL
pyopencl  R2C        (34,34) axes=      None ndim=   2    float32 lut=True inplace=0  norm=   0   FFT: n2=3.04e-07 ninf=3.78e-07 < 7.06e-06 (0.053) 1 iFFT: n2=1.39e+00 ninf=2.70e+00 < 7.06e-06 (382709.503) 0 FAIL
pyopencl  R2C        (34,34) axes=      None ndim=   2    float32 lut=True inplace=0  norm=   1   FFT: n2=3.04e-07 ninf=3.78e-07 < 7.06e-06 (0.053) 1 iFFT: n2=1.39e+00 ninf=2.70e+00 < 7.06e-06 (382709.469) 0 FAIL
pyopencl  R2C        (34,34) axes=      None ndim=   2    float64 lut=None inplace=0  norm=   0   FFT: n2=7.93e-16 ninf=1.04e-15 < 1.31e-14 (0.080) 1 iFFT: n2=1.43e+00 ninf=2.78e+00 < 1.31e-14 (212602327789630.625) 0 FAIL
pyopencl  R2C        (34,34) axes=      None ndim=   2    float64 lut=None inplace=0  norm=   1   FFT: n2=7.93e-16 ninf=1.04e-15 < 1.31e-14 (0.080) 1 iFFT: n2=1.43e+00 ninf=2.78e+00 < 1.31e-14 (212602327789630.625) 0 FAIL
pyopencl  R2C     (2,2,34+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=2.07e-07 ninf=2.35e-07 < 2.77e-06 (0.085) 0 iFFT: n2=2.13e-07 ninf=2.84e-07 < 2.77e-06 (0.103) 0   OK
pyopencl  R2C     (2,2,34+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=2.07e-07 ninf=2.35e-07 < 2.77e-06 (0.085) 0 iFFT: n2=2.30e-07 ninf=3.37e-07 < 2.77e-06 (0.122) 0   OK
pyopencl  R2C       (2,2,34) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=2.07e-07 ninf=2.35e-07 < 5.53e-06 (0.042) 1 iFFT: n2=2.13e-07 ninf=2.84e-07 < 5.53e-06 (0.051) 1   OK
pyopencl  R2C       (2,2,34) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=2.07e-07 ninf=2.35e-07 < 5.53e-06 (0.042) 1 iFFT: n2=2.30e-07 ninf=3.37e-07 < 5.53e-06 (0.061) 1   OK
pyopencl  R2C     (2,2,34+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=2.07e-07 ninf=2.72e-07 < 2.77e-06 (0.098) 0 iFFT: n2=2.05e-07 ninf=3.36e-07 < 2.77e-06 (0.121) 0   OK
pyopencl  R2C     (2,2,34+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=2.07e-07 ninf=2.72e-07 < 2.77e-06 (0.098) 0 iFFT: n2=2.00e-07 ninf=3.03e-07 < 2.77e-06 (0.110) 0   OK
pyopencl  R2C       (2,2,34) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=2.07e-07 ninf=2.72e-07 < 5.53e-06 (0.049) 1 iFFT: n2=2.05e-07 ninf=3.36e-07 < 5.53e-06 (0.061) 1   OK
pyopencl  R2C       (2,2,34) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=2.07e-07 ninf=2.72e-07 < 5.53e-06 (0.049) 1 iFFT: n2=2.00e-07 ninf=3.03e-07 < 5.53e-06 (0.055) 1   OK
pyopencl  R2C     (2,2,34+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=5.30e-16 ninf=7.55e-16 < 5.77e-15 (0.131) 0 iFFT: n2=5.41e-16 ninf=1.23e-15 < 5.77e-15 (0.213) 0   OK
pyopencl  R2C     (2,2,34+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=5.30e-16 ninf=7.55e-16 < 5.77e-15 (0.131) 0 iFFT: n2=5.55e-16 ninf=1.00e-15 < 5.77e-15 (0.174) 0   OK
pyopencl  R2C       (2,2,34) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=5.30e-16 ninf=7.55e-16 < 1.15e-14 (0.065) 1 iFFT: n2=5.41e-16 ninf=1.23e-15 < 1.15e-14 (0.106) 1   OK
pyopencl  R2C       (2,2,34) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=5.30e-16 ninf=7.55e-16 < 1.15e-14 (0.065) 1 iFFT: n2=5.55e-16 ninf=1.00e-15 < 1.15e-14 (0.087) 1   OK
pyopencl  R2C   (2,2,2,34+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=2.13e-07 ninf=2.43e-07 < 2.77e-06 (0.088) 0 iFFT: n2=1.96e-07 ninf=3.26e-07 < 2.77e-06 (0.118) 0   OK
pyopencl  R2C   (2,2,2,34+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=2.13e-07 ninf=2.43e-07 < 2.77e-06 (0.088) 0 iFFT: n2=2.07e-07 ninf=3.64e-07 < 2.77e-06 (0.132) 0   OK
pyopencl  R2C     (2,2,2,34) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=2.13e-07 ninf=2.43e-07 < 5.53e-06 (0.044) 1 iFFT: n2=1.96e-07 ninf=3.26e-07 < 5.53e-06 (0.059) 1   OK
pyopencl  R2C     (2,2,2,34) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=2.13e-07 ninf=2.43e-07 < 5.53e-06 (0.044) 1 iFFT: n2=2.07e-07 ninf=3.64e-07 < 5.53e-06 (0.066) 1   OK
pyopencl  R2C   (2,2,2,34+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=2.07e-07 ninf=2.33e-07 < 2.77e-06 (0.084) 0 iFFT: n2=1.75e-07 ninf=3.26e-07 < 2.77e-06 (0.118) 0   OK
pyopencl  R2C   (2,2,2,34+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=2.07e-07 ninf=2.33e-07 < 2.77e-06 (0.084) 0 iFFT: n2=1.95e-07 ninf=3.33e-07 < 2.77e-06 (0.120) 0   OK
pyopencl  R2C     (2,2,2,34) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=2.07e-07 ninf=2.33e-07 < 5.53e-06 (0.042) 1 iFFT: n2=1.75e-07 ninf=3.26e-07 < 5.53e-06 (0.059) 1   OK
pyopencl  R2C     (2,2,2,34) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=2.07e-07 ninf=2.33e-07 < 5.53e-06 (0.042) 1 iFFT: n2=1.95e-07 ninf=3.33e-07 < 5.53e-06 (0.060) 1   OK
pyopencl  R2C   (2,2,2,34+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=5.78e-16 ninf=1.12e-15 < 5.77e-15 (0.194) 0 iFFT: n2=5.78e-16 ninf=9.91e-16 < 5.77e-15 (0.172) 0   OK
pyopencl  R2C   (2,2,2,34+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=5.78e-16 ninf=1.12e-15 < 5.77e-15 (0.194) 0 iFFT: n2=5.63e-16 ninf=1.07e-15 < 5.77e-15 (0.186) 0   OK
pyopencl  R2C     (2,2,2,34) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=5.78e-16 ninf=1.12e-15 < 1.15e-14 (0.097) 1 iFFT: n2=5.78e-16 ninf=9.91e-16 < 1.15e-14 (0.086) 1   OK
pyopencl  R2C     (2,2,2,34) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=5.78e-16 ninf=1.12e-15 < 1.15e-14 (0.097) 1 iFFT: n2=5.63e-16 ninf=1.07e-15 < 1.15e-14 (0.093) 1   OK
pyopencl  R2C        (808+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=3.52e-07 ninf=3.76e-07 < 3.45e-06 (0.109) 0 iFFT: n2=3.05e-07 ninf=4.77e-07 < 3.45e-06 (0.138) 0   OK
pyopencl  R2C        (808+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=3.52e-07 ninf=3.76e-07 < 3.45e-06 (0.109) 0 iFFT: n2=2.99e-07 ninf=5.14e-07 < 3.45e-06 (0.149) 0   OK
pyopencl  R2C         (808,) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=3.52e-07 ninf=3.76e-07 < 6.91e-06 (0.054) 1 iFFT: n2=3.05e-07 ninf=4.77e-07 < 6.91e-06 (0.069) 1   OK
pyopencl  R2C         (808,) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=3.52e-07 ninf=3.76e-07 < 6.91e-06 (0.054) 1 iFFT: n2=2.99e-07 ninf=5.14e-07 < 6.91e-06 (0.074) 1   OK
pyopencl  R2C        (808+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=2.46e-07 ninf=2.57e-07 < 3.45e-06 (0.074) 0 iFFT: n2=2.01e-07 ninf=4.55e-07 < 3.45e-06 (0.132) 0   OK
pyopencl  R2C        (808+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=2.46e-07 ninf=2.57e-07 < 3.45e-06 (0.074) 0 iFFT: n2=1.93e-07 ninf=4.09e-07 < 3.45e-06 (0.119) 0   OK
pyopencl  R2C         (808,) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=2.46e-07 ninf=2.57e-07 < 6.91e-06 (0.037) 1 iFFT: n2=2.01e-07 ninf=4.55e-07 < 6.91e-06 (0.066) 1   OK
pyopencl  R2C         (808,) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=2.46e-07 ninf=2.57e-07 < 6.91e-06 (0.037) 1 iFFT: n2=1.93e-07 ninf=4.09e-07 < 6.91e-06 (0.059) 1   OK
pyopencl  R2C        (808+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=7.58e-16 ninf=1.00e-15 < 6.45e-15 (0.155) 0 iFFT: n2=4.26e-16 ninf=8.89e-16 < 6.45e-15 (0.138) 0   OK
pyopencl  R2C        (808+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=7.58e-16 ninf=1.00e-15 < 6.45e-15 (0.155) 0 iFFT: n2=5.06e-16 ninf=1.11e-15 < 6.45e-15 (0.172) 0   OK
pyopencl  R2C         (808,) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=7.58e-16 ninf=1.00e-15 < 1.29e-14 (0.078) 1 iFFT: n2=4.26e-16 ninf=8.89e-16 < 1.29e-14 (0.069) 1   OK
pyopencl  R2C         (808,) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=7.58e-16 ninf=1.00e-15 < 1.29e-14 (0.078) 1 iFFT: n2=4.44e-16 ninf=1.00e-15 < 1.29e-14 (0.077) 1   OK
pyopencl  R2C      (2,808+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   0   FFT: n2=3.73e-07 ninf=4.34e-07 < 3.45e-06 (0.126) 0 iFFT: n2=3.60e-07 ninf=7.22e-07 < 3.45e-06 (0.209) 0   OK
pyopencl  R2C      (2,808+2) axes=      None ndim=   1    float32 lut=None inplace=1  norm=   1   FFT: n2=3.73e-07 ninf=4.34e-07 < 3.45e-06 (0.126) 0 iFFT: n2=3.50e-07 ninf=6.94e-07 < 3.45e-06 (0.201) 0   OK
pyopencl  R2C        (2,808) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   0   FFT: n2=3.73e-07 ninf=4.34e-07 < 6.91e-06 (0.063) 1 iFFT: n2=3.60e-07 ninf=7.22e-07 < 6.91e-06 (0.104) 1   OK
pyopencl  R2C        (2,808) axes=      None ndim=   1    float32 lut=None inplace=0  norm=   1   FFT: n2=3.73e-07 ninf=4.34e-07 < 6.91e-06 (0.063) 1 iFFT: n2=3.50e-07 ninf=6.94e-07 < 6.91e-06 (0.100) 1   OK
pyopencl  R2C      (2,808+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   0   FFT: n2=2.60e-07 ninf=2.70e-07 < 3.45e-06 (0.078) 0 iFFT: n2=2.64e-07 ninf=4.77e-07 < 3.45e-06 (0.138) 0   OK
pyopencl  R2C      (2,808+2) axes=      None ndim=   1    float32 lut=True inplace=1  norm=   1   FFT: n2=2.60e-07 ninf=2.70e-07 < 3.45e-06 (0.078) 0 iFFT: n2=2.53e-07 ninf=4.86e-07 < 3.45e-06 (0.141) 0   OK
pyopencl  R2C        (2,808) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   0   FFT: n2=2.60e-07 ninf=2.70e-07 < 6.91e-06 (0.039) 1 iFFT: n2=2.64e-07 ninf=4.77e-07 < 6.91e-06 (0.069) 1   OK
pyopencl  R2C        (2,808) axes=      None ndim=   1    float32 lut=True inplace=0  norm=   1   FFT: n2=2.60e-07 ninf=2.70e-07 < 6.91e-06 (0.039) 1 iFFT: n2=2.53e-07 ninf=4.86e-07 < 6.91e-06 (0.070) 1   OK
pyopencl  R2C      (2,808+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   0   FFT: n2=7.55e-16 ninf=9.57e-16 < 6.45e-15 (0.148) 0 iFFT: n2=4.53e-16 ninf=9.99e-16 < 6.45e-15 (0.155) 0   OK
pyopencl  R2C      (2,808+2) axes=      None ndim=   1    float64 lut=None inplace=1  norm=   1   FFT: n2=7.55e-16 ninf=9.57e-16 < 6.45e-15 (0.148) 0 iFFT: n2=5.30e-16 ninf=9.99e-16 < 6.45e-15 (0.155) 0   OK
pyopencl  R2C        (2,808) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   0   FFT: n2=7.55e-16 ninf=9.57e-16 < 1.29e-14 (0.074) 1 iFFT: n2=4.53e-16 ninf=9.99e-16 < 1.29e-14 (0.077) 1   OK
pyopencl  R2C        (2,808) axes=      None ndim=   1    float64 lut=None inplace=0  norm=   1   FFT: n2=7.55e-16 ninf=9.57e-16 < 1.29e-14 (0.074) 1 iFFT: n2=4.71e-16 ninf=8.88e-16 < 1.29e-14 (0.069) 1   OK
pyopencl  R2C      (808,808) axes=      None ndim=   2    float32 lut=None inplace=0  norm=   0   FFT: n2=5.83e-07 ninf=6.50e-07 < 9.81e-06 (0.066) 1 iFFT: n2=1.41e+00 ninf=3.33e+00 < 9.81e-06 (339051.468) 0 FAIL
pyopencl  R2C      (808,808) axes=      None ndim=   2    float32 lut=None inplace=0  norm=   1   FFT: n2=5.83e-07 ninf=6.50e-07 < 9.81e-06 (0.066) 1 iFFT: n2=1.41e+00 ninf=3.33e+00 < 9.81e-06 (339051.443) 0 FAIL
pyopencl  R2C      (808,808) axes=      None ndim=   2    float32 lut=True inplace=0  norm=   0   FFT: n2=3.71e-07 ninf=3.84e-07 < 9.81e-06 (0.039) 1 iFFT: n2=1.41e+00 ninf=3.33e+00 < 9.81e-06 (339051.516) 0 FAIL
pyopencl  R2C      (808,808) axes=      None ndim=   2    float32 lut=True inplace=0  norm=   1   FFT: n2=3.71e-07 ninf=3.84e-07 < 9.81e-06 (0.039) 1 iFFT: n2=1.41e+00 ninf=3.33e+00 < 9.81e-06 (339051.492) 0 FAIL
pyopencl  R2C      (808,808) axes=      None ndim=   2    float64 lut=None inplace=0  norm=   0   FFT: n2=1.16e-15 ninf=1.62e-15 < 1.58e-14 (0.103) 1 iFFT: n2=1.41e+00 ninf=3.14e+00 < 1.58e-14 (198609444611850.719) 0 FAIL
pyopencl  R2C      (808,808) axes=      None ndim=   2    float64 lut=None inplace=0  norm=   1   FFT: n2=1.16e-15 ninf=1.62e-15 < 1.58e-14 (0.103) 1 iFFT: n2=1.41e+00 ninf=3.14e+00 < 1.58e-14 (198609444611850.719) 0 FAIL

My GPU supports cl_khr_fp64, but does not support cl_khr_int64, so I made a small modification in vkFFT.h:

    if ((!strcmp(floatType, "double")) || (sc->useUint64)) {
        sc->tempLen = sprintf(sc->tempStr, "\
#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n\
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable\n\
#pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable\n\n");
        res = VkAppendLine(sc);
        if (res != VKFFT_SUCCESS) return res;
    }
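To see which of the extensions named in the pragmas above a given device actually advertises, something like the following sketch could be used. The helper names `missing_extensions` and `report_devices` are hypothetical and not part of pyvkfft or VkFFT. The pyopencl import is kept inside `report_devices` so the string check can be used without an OpenCL runtime.

```python
def missing_extensions(device_extensions, required):
    """Return the entries of `required` that are absent from the
    space-separated extension string reported by an OpenCL device."""
    available = set(device_extensions.split())
    return [ext for ext in required if ext not in available]


def report_devices(required):
    """Print, for every OpenCL device, which required extensions are missing.

    Hypothetical helper: needs pyopencl and a working OpenCL runtime.
    """
    import pyopencl as cl  # imported lazily so the check above works without it

    for platform in cl.get_platforms():
        for device in platform.get_devices():
            missing = missing_extensions(device.extensions, required)
            print(f"{device.name}: missing {missing if missing else 'nothing'}")


# Extension names taken from the pragmas in the snippet above.
REQUIRED = [
    "cl_khr_fp64",
    "cl_khr_int64_base_atomics",
    "cl_khr_int64_extended_atomics",
]

if __name__ == "__main__":
    report_devices(REQUIRED)
```

Running this on each machine would show whether the devices differ in the extensions they report, which may help when filing the report against VkFFT.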

vincefn commented 1 year ago

Could you specify exactly what version of pyvkfft and VkFFT (git hash) you are using? Also what is the exact GPU reference?

This is likely a VkFFT issue, so it should be reported against that project so that @DTolm can have a look.

TsXor commented 1 year ago

pyvkfft version: 2022.1.1. I tested my modified version on a GTX 1050 Ti and an R7 240: the GTX 1050 Ti works fine, but the R7 240 fails all R2C tests with ndim=2. The wheel was built on the GTX 1050 Ti machine only.

TsXor commented 1 year ago

I found that in the failed R2C tests, for some unknown reason, the shape changed:

======================================================================
ERROR: test_r2c (pyvkfft.test.test_fft.TestFFT) (backend='pyopencl', n=30, dims=2, ndim=2, axes=None, dtype=dtype('float32'), norm=0, use_lut=None, inplace=True, r2c=True, dct=False)
Run R2C tests
----------------------------------------------------------------------
Traceback (most recent call last):
  File "E:\Python\Python39\lib\site-packages\pyvkfft\test\test_fft.py", line 204, in run_fft
    res = test_accuracy(backend, sh, ndim, axes, dtype, inplace,
  File "E:\Python\Python39\lib\site-packages\pyvkfft\accuracy.py", line 431, in test_accuracy
    else:
  File "E:\Python\Python39\lib\site-packages\pyvkfft\accuracy.py", line 147, in l2
    return np.sqrt((abs(a - b) ** 2).sum() / (abs(a) ** 2).sum())
ValueError: operands could not be broadcast together with shapes (30,30) (60,14)
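The ValueError above comes from comparing two arrays whose shapes no longer match. As a debugging aid, a sketch like the one below keeps the same relative L2 formula shown in the traceback but reports the shapes explicitly, and adds a hypothetical `fast_axis` helper to show which axis has the smallest stride. Neither function is part of pyvkfft. This only makes the mismatch easier to inspect and does not identify its cause.

```python
import numpy as np


def fast_axis(arr):
    """Return the index of the axis with the smallest absolute stride,
    i.e. the axis along which elements are closest together in memory."""
    return int(np.argmin(np.abs(arr.strides)))


def l2_checked(a, b):
    """Relative L2 distance (same formula as in the traceback), with an
    explicit shape check instead of a broadcasting error."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(
            f"shape mismatch: reference {a.shape} vs result {b.shape}"
        )
    return np.sqrt((abs(a - b) ** 2).sum() / (abs(a) ** 2).sum())


if __name__ == "__main__":
    ref = np.ones((30, 30), dtype=np.float32)
    print("fast axis (C order):", fast_axis(ref))
    print("fast axis (Fortran order):", fast_axis(np.asfortranarray(ref)))
    print("l2 of identical arrays:", l2_checked(ref, ref))
```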

vincefn commented 1 year ago

The change of shape is normal for inplace R2C transforms, but only along the last axis. The other axis should not change.

I just checked the pip-installed current version of pyvkfft and there are no issues with 2D R2C in single precision.

You need to give more details about your GPU and the exact software versions you are using.

vincefn commented 1 year ago

To be more specific, if you start from a real array of shape (55, 30):

If you perform an out-of-place transform, you get a (half-Hermitian) complex array of shape (55, 16).
If you perform an in-place transform, the last two columns of the original array are ignored and you get the half-Hermitian transform of the (55, 28) array (without these two columns), so the result is a half-Hermitian complex array of shape (55, 15).
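The shape bookkeeping described above can be checked with plain NumPy. This is only a sketch of the expected shapes using `numpy.fft.rfft2` as a stand-in, not the pyvkfft code path. The in-place case is emulated by dropping the two padding columns before transforming.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((55, 30)).astype(np.float32)

# Out-of-place: all 30 columns are data, so the last axis becomes 30 // 2 + 1.
out_of_place = np.fft.rfft2(a)

# In-place (emulated): the last two columns are treated as padding, so only
# 28 columns are transformed and the last axis becomes 28 // 2 + 1.
in_place_equivalent = np.fft.rfft2(a[:, :-2])

# The same float32 buffer reinterpreted as complex64 has exactly that shape,
# which is why the padded real array can hold the complex result.
buffer_as_complex = a.view(np.complex64)

print(out_of_place.shape)         # (55, 16)
print(in_place_equivalent.shape)  # (55, 15)
print(buffer_as_complex.shape)    # (55, 15)
```

In both cases the first axis keeps its length of 55 and only the last axis changes.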

TsXor commented 1 year ago

GTX 1050 Ti and R7 240 on Windows 10, pyvkfft version: 2022.1.1, compiler: MSVC. vkFFT.h is modified to avoid an error (both machines use the modified version), and both use the pyopencl backend. The R7 240 fails while the GTX 1050 Ti does not. I built the binary wheel on the GTX 1050 Ti machine and installed it on both machines.

I also tried it on the integrated UHD630 (STILL ALL THE SAME PRESET), and it works fine.

TsXor commented 1 year ago

Also, I found that in the latest version of https://raw.githubusercontent.com/DTolm/VkFFT/master/vkFFT/vkFFT.h, the line

#pragma OPENCL EXTENSION cl_khr_int64 : enable\n\

which caused errors has been deleted.

vincefn commented 1 year ago

Thanks for the details. Interesting that the same wheel fails on the R7 240 and works on the GTX 1050 Ti. I could understand a simple failure, as the compute capabilities can be different, but I am puzzled by the error you are reporting on the size of the arrays. But maybe that's just a secondary error after another error.

@Dtolm do you have any idea what could go wrong ? Should the R7 240 work or are there known limitations ?

@TsXor you may want to try compiling the wheel on the R7 machine - just in case there is a subtle difference in the on-the-fly opencl kernel compile environment.

DTolm commented 1 year ago

Hello,

I have no idea what can go wrong with R7. It is quite an old GPU so something can be related to drivers. @TsXor can you post the latest VkFFT version results (there both LUT and nonLUT precision for 808x808 system should be the same)? Also, can you try compiling the VkFFT main repo with Vulkan backend and send the output of ./Vulkan_FFT -vkfft 15 ?

TsXor commented 1 year ago

I found something else: I tried the fft of reikna on both machines. The GTX 1050 Ti is normal, but the R7 240 throws warnings while compiling the CL code: reikna_warning.log. After searching around, it seems to be a problem with old AMD GPUs: https://github.com/xmrig/xmrig/issues/1554#issuecomment-586700387

DTolm commented 1 year ago

I am pretty sure these warnings are related to pyopencl, not VkFFT.

vincefn commented 1 year ago

The warnings are just generic opencl compilation warnings, they could be harmless.

So it's indeed likely the issue is due to old GPU or driver (>8 years old), since we know it works on more recent GPUs. A bit strange that it works on most transforms but not 2D R2C though.

At least good that the default automated tests caught that !

TsXor commented 1 year ago

r7_vk15.log GTX_vk15.log r7_R2C.txt

TsXor commented 1 year ago

In fact this issue originates from my L0-Smoothing project. I tried it on the R7 and found that the output was not as expected (a huge difference from the CPU-calculated result), so I ran the tests to find out what was wrong.

And the driver version is 21.09.03.08-210511a-368456C-RadeonSoftware, which I think is relatively new.

DTolm commented 1 year ago

In Vulkan, 2D R2C + C2R work on R7 without any issues, which is good news. Can you also try OpenCL backend, by setting VKFFT_BACKEND to 3 in CMakeLists?

TsXor commented 1 year ago

In fact I am not compiling it because I don't know how to... I just used the release... And thank god that there's OpenCL version in release.

DTolm commented 1 year ago

There is an OpenCL release: VkFFT_OpenCL_release_v1.2.30.zip

TsXor commented 1 year ago

... I am somehow distracted these days, and just now I found that the last log I posted was on OpenCL version...

and here is the log for vulkan version: R7_Vulkan_15.log

DTolm commented 1 year ago

I am confused, haven't you posted Vulkan benchmark results before? The last benchmark you sent also worked in 2D except that it crashed on a big system that takes almost 2 GB of memory.

TsXor commented 1 year ago

I found the last result I got was using OpenCL version🤣 I mean, r7_vk15.log is OpenCL log. ps: R7 240 has only 2GB vmem

DTolm commented 1 year ago

That's what I meant about the possible reason why it failed. But still, it passes tests that fail in pyvkfft OpenCL. Maybe it is related to some synchronization specifics in OpenCL that I am not familiar with.

TsXor commented 1 year ago

Something off-topic: I finally found the problem in my program: I passed a pyopencl boolean array to kernel function and wrote like this in the kernel function:

    __global bool * arr_mask,

This is how my bug became hardware-dependent: the size of an OpenCL bool differs between devices, while a pyopencl boolean array is stored as np.int8. So I converted the array to np.int on the Python side and declared the argument as int in the kernel function, and that solved my bug.

But that is my own program, which has nothing to do with the test suite, so this is off-topic.
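
A minimal sketch of the host-side conversion described above. It assumes an explicit 32-bit integer type (np.int32) to match an OpenCL int; the original fix used np.int, and the mask contents and the kernel argument string are purely illustrative:

```python
import numpy as np

# A boolean mask as it would be built on the host: one byte per element.
mask = np.array([True, False, True, True])
print(mask.dtype, mask.itemsize)

# Convert to a fixed-width integer type before uploading to the device,
# so the host layout matches the kernel declaration on every device.
# np.int32 is an assumption here: it corresponds to a 32-bit OpenCL int.
mask_i32 = mask.astype(np.int32)
print(mask_i32.dtype, mask_i32.itemsize)

# Matching (illustrative) kernel argument, replacing "__global bool *":
KERNEL_ARG = "__global const int *arr_mask"
```

Declaring a fixed-width type on both sides avoids relying on the device's size for bool, which, as far as I understand, is left to the implementation.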

vincefn commented 1 year ago

I finally managed to reproduce the issue: ValueError: operands could not be broadcast together with shapes (30,30) (60,14)

As I already mentioned, the tests run fine on macOS (M1 with opencl) and linux (all backends), but the opencl 2D R2C tests do fail on windows (on my GTX 1080, so it's not a GPU age issue). (not sure about cuda, my windows pycuda install is not working at the moment)

This is rather strange, as there is almost no platform-dependent code. And it's not a pyopencl version issue.

vincefn commented 1 year ago

Hmm, correction: this seems a little more mundane - the error occurs when scipy is not installed (regardless of the platform), so that numpy r2c transforms are used instead of scipy's.

vincefn commented 1 year ago

And the reason is indeed the different behaviour between numpy and scipy rfftn:

In [5]: from numpy.fft import rfftn, irfftn

In [6]: a=np.zeros((30,10))

In [7]: b=rfftn(a)

In [8]: a.shape,a.strides
Out[8]: ((30, 10), (80, 8))

In [9]: b.shape,b.strides
Out[9]: ((30, 6), (16, 480))

In [10]: from scipy.fft import rfftn, irfftn

In [11]: b=rfftn(a)

In [12]: b.shape,b.strides
Out[12]: ((30, 6), (96, 16))

So after a (2D) scipy rfftn, the fast axis remains the last dimension, but for numpy, the fast axis changes... And using simply pyopencl.array.Array.view() will change the number of elements along the fast axis.

This leads to:

we need to check that the view() after an r2c transform changes the last dimension only
more importantly, the last axis must be the fast axis on input, or the transform won't be computed as we expect ?

Are there cases where we would want the last axis not to be the fast axis ? Right now pyvkfft assumes that the last axis is the fast axis.
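
One possible guard for the two points above, sketched with numpy only. The helper names fast_axis and ensure_last_axis_fast are hypothetical and not part of pyvkfft. The memory layout returned by numpy's rfftn may depend on the numpy version, so the sketch normalises it rather than assuming it:

```python
import numpy as np

def fast_axis(arr):
    """Index of the axis with the smallest stride (the 'fast' axis)."""
    return int(np.argmin(np.abs(arr.strides)))

def ensure_last_axis_fast(arr):
    """Return arr if it is C-contiguous, else a C-contiguous copy."""
    return arr if arr.flags.c_contiguous else np.ascontiguousarray(arr)

a = np.zeros((30, 10))
b = np.fft.rfftn(a)            # layout may or may not be C-contiguous
b_c = ensure_last_axis_fast(b)
print(b.strides, "->", b_c.strides)

# A Fortran-ordered array reproduces the problematic layout explicitly.
f = np.asfortranarray(np.zeros((30, 6), dtype=np.complex128))
f_c = ensure_last_axis_fast(f)
print(fast_axis(f), "->", fast_axis(f_c))
```

Copying to a C-contiguous array costs one extra pass over the data, so whether to copy silently or to raise an error on such input is a design choice left open here.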

vincefn / pyvkfft

All R2C test that have ndim=2 fails [numpy], because the fast axis is not always the last one #19