vincefn / pyvkfft

Python interface to VkFFT
MIT License
51 stars 6 forks source link

pyvkfft.fft.fftn error? #34

Closed mDiracYas closed 1 month ago

mDiracYas commented 4 months ago

Hi! I have found that pyvkfft.fft.fftn exhibits strange behavior. Also, in previous versions of pyvkfft, this code raises an error.

OS:Windows 10 Python: 3.12.2

import numpy as np
import cupy as cp
import pyvkfft.fft

shape = (3,2,2)

size = np.prod(shape)
x1 = cp.arange(size,dtype=cp.complex64).reshape(shape)
x2 = x1[:,None]

y1 = pyvkfft.fft.fftn(x1,axes=(-1,-2))
y2 = pyvkfft.fft.fftn(x2,axes=(-1,-2)).squeeze()

print("shape x1:",x1.shape, "shape x2:", x2.shape)
print("shape y1:",y1.shape, "shape y2:", y2.shape)
print(y1-y2)

case (i) pyvkfft == 2024.1.1 vkfft == 1.3.4

shape x1: (3, 2, 2) shape x2: (3, 1, 2, 2)
shape y1: (3, 2, 2) shape y2: (3, 2, 2)
[[[ -9.+0.j          1.+0.j       ]
  [  2.-3.4641016j   0.+0.j       ]]

 [[ 28.+3.4641016j  -2.+0.j       ]
  [-55.+0.j          3.+0.j       ]]

 [[ 44.-3.4641016j  -2.+0.j       ]
  [  2.+3.4641016j   0.+0.j       ]]]

case (ii) pyvkfft == 2023.1.1 vkfft == 1.2.9

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[2], line 8
      5 x2 = x1[:,None]
      7 y1 = pyvkfft.fft.fftn(x1,axes=(-1,-2))
----> 8 y2 = pyvkfft.fft.fftn(x2,axes=(-1,-2)).squeeze()
     10 print("shape x1:",x1.shape, "shape x2:", x2.shape)
     11 print("shape y1:",y1.shape, "shape y2:", y2.shape)

File [~\miniconda3\Lib\site-packages\pyvkfft\fft.py:200](http://localhost:8888/lab/tree/cupy_test/vkfft_error/~/miniconda3/Lib/site-packages/pyvkfft/fft.py#line=199), in fftn(src, dest, ndim, norm, axes, cuda_stream, cl_queue, return_scale)
    167 """
    168 Perform a FFT on a GPU array, automatically creating the VkFFTApp
    169 and caching it for future re-use.
   (...)
    197 :return: the destination array if return_scale is False, or (dest, scale)
    198 """
    199 backend, inplace, dest, cl_queue = _prepare_transform(src, dest, cl_queue, False)
--> 200 app = _get_fft_app(backend, src.shape, src.dtype, inplace, ndim, axes, norm, cuda_stream, cl_queue,
    201                    strides=src.strides)
    202 if backend == Backend.PYOPENCL:
    203     app.fft(src, dest, queue=cl_queue)

File [~\miniconda3\Lib\site-packages\pyvkfft\fft.py:137](http://localhost:8888/lab/tree/cupy_test/vkfft_error/~/miniconda3/Lib/site-packages/pyvkfft/fft.py#line=136), in _get_fft_app(backend, shape, dtype, inplace, ndim, axes, norm, cuda_stream, cl_queue, strides)
    134 @lru_cache(maxsize=config.FFT_CACHE_NB)
    135 def _get_fft_app(backend, shape, dtype, inplace, ndim, axes, norm, cuda_stream, cl_queue, strides=None):
    136     if backend in [Backend.PYCUDA, Backend.CUPY]:
--> 137         return VkFFTApp_cuda(shape, dtype, ndim=ndim, inplace=inplace,
    138                              stream=cuda_stream, norm=norm, axes=axes, strides=strides)
    139     elif backend == Backend.PYOPENCL:
    140         return VkFFTApp_cl(shape, dtype, cl_queue, ndim=ndim, inplace=inplace,
    141                            norm=norm, axes=axes, strides=strides)

File [~\miniconda3\Lib\site-packages\pyvkfft\cuda.py:123](http://localhost:8888/lab/tree/cupy_test/vkfft_error/~/miniconda3/Lib/site-packages/pyvkfft/cuda.py#line=122), in VkFFTApp.__init__(self, shape, dtype, ndim, inplace, stream, norm, r2c, dct, axes, strides, **kwargs)
    121     raise RuntimeError("Error creating VkFFTConfiguration. Was the CUDA context properly initialised ?")
    122 res = ctypes.c_int(0)
--> 123 self.app = _vkfft_cuda.init_app(self.config, ctypes.byref(res))
    124 check_vkfft_result(res, shape, dtype, ndim, inplace, norm, r2c, dct, axes, "cuda")
    125 if self.app is None:

OSError: exception: integer divide by zero
vincefn commented 4 months ago

Thanks for the report, I can confirm this behaviour also with OpenCL:

import numpy as np
import pyopencl as cl
import pyopencl.array as cla
import pyvkfft.fft
from pyvkfft.base import calc_transform_axes

ctx = cl.create_some_context()
clq = cl.CommandQueue(ctx)

shape = (3,2,2)

size = np.prod(shape)
x0 = np.arange(size,dtype=np.complex64).reshape(shape)
x1 = cla.to_device(clq, x0)
x2 = cla.to_device(clq, x0[:,None])

y1 = pyvkfft.fft.fftn(x1,axes=(-1,-2)).get()
y2 = pyvkfft.fft.fftn(x2,axes=(-1,-2)).get() #.squeeze()

y0 = np.fft.fftn(x0, axes=(-1,-2))

print("shape x1:",x1.shape, "shape x2:", x2.shape)
print("shape y1:",y1.shape, "shape y2:", y2.shape)
print(abs(y1-y0).sum())
print(abs(y2.squeeze()-y0).sum())

which gives:

shape x1: (3, 2, 2) shape x2: (3, 1, 2, 2)
shape y1: (3, 2, 2) shape y2: (3, 1, 2, 2)
0.0
152.34962482055641

Interestingly the initial re-shuffling of the internal array shape passed to VkFFT (collapsing non-transformed arrays) gives the same results:

from pyvkfft.base import calc_transform_axes

print(calc_transform_axes((3,2,2), axes=(-1,-2)))
print(calc_transform_axes((3,1,2,2), axes=(-1,-2)))
([2, 2, 1], 3, [False, False, True], 2, (-1, -2))
([2, 2, 1], 3, [False, False, True], 2, (-1, -2))

So something else is going wrong...

vincefn commented 4 months ago

Hmm, I think I understand what is happening - for an axis of size 1 the stride is zero:

print(x1.strides)
print(x2.strides)
(72, 24, 8)
(72, 0, 24, 8)

and pyvkfft will then assume the internal transform order must be changed, and this is where the error occurs:

print(calc_transform_axes((3,2,2), axes=(-1,-2), strides=x1.strides))
print(calc_transform_axes((3,1,2,2), axes=(-1,-2), strides=x2.strides))
([2, 2, 1], 3, [False, False, True], 2, (-1, -2))
([2, 1, 3, 1], 2, [False, True, False, True], 3, [-3, -1])

now I need to understand what is going wrong - probably the easiest solution is to ignore any 1-sized (0-strided) axis

Some background info about why an axis of size 1 has a stride=0: see https://github.com/numpy/numpy/issues/22950

vincefn commented 4 months ago

OK, this should be fixed in the current git master.

The main issue was not, in fact, strides equal to zero (that's easily handled). And it is not possible to just ignore (squeeze) axes of length 1, as R2C transforms will give different results due to the handling of the fast axis... Which is surprisingly messy to handle, e.g. when considering arrays where only one axis has a length>1.