pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org
Other
83.88k stars 22.61k forks source link

c2r fft input generation #69627

Open micmelesse opened 2 years ago

micmelesse commented 2 years ago

Issue

The input generated for some unit tests is not in the correct form which leads to testing undefined behavior. The undefined behavior occurs when the following tests invoke complex to real transforms (cufftExecC2R or hipfftExecC2R) with a random input​.

Here is a link to documentation stating that this is undefined behavior for cufft and rocfft​ cufft: https://docs.nvidia.com/cuda/cufft/index.html#fft-types​ rocfft: https://rocfft.readthedocs.io/en/rocm-4.5.0/library.html#transform-and-array-types

There was a workaround implemented for rocfft in the following prs (mainly #61073)​

We suggest that this workaround be extended to cufft. We are looking for insight from owners of the fft module in pytorch.

Workaround details

In the unit tests mentioned above, we check if we are on ROCM and call a function to generate a complex-Hermitian input.

https://github.com/pytorch/pytorch/blob/master/test/test_spectral_ops.py#L242-L244

Then the function checks the type of c2r op being invoked and then generates a complex-Hermitian tensor using the r2c version of that op. For example see the case of fft2

https://github.com/pytorch/pytorch/blob/3e6164449fe285b7c9c9e4f0df63b5f3ed8a3dc8/test/test_spectral_ops.py#L182-L205

Remarks

This was discussed with @malfet during our last engineering sync with me, @jithunnair-amd, @malcolmroberts and @jeffdaily.

cc @mruberry @peterbell10

malfet commented 2 years ago

I don't think we should limit it to just test, but actually c2r conversion should raise an exception if input tensor is not-hermitian

malcolmroberts commented 2 years ago

The test for Hermitian symmetry is sub-dimensional; for 1D transforms, one needs to check just two values, for 2D transforms, one needs to check two 1D arrays for conjugate symmetry, etc. This can be done effectively in python or on the gpu. Performance-wise, this won't be that bad, but there will still be a cost.

Does the real-to-complex FFT interface cover anything higher than 3D transforms?

mruberry commented 2 years ago

Thanks for reporting this issue, @micmelesse. We would accept a PR improving the sample inputs for these operations, and we would be willing to consider a PR that throws an error on invalid inputs but we would need to review the performance impact.