mmuckley / torchkbnufft

A high-level, easy-to-deploy non-uniform Fast Fourier Transform in PyTorch.
https://torchkbnufft.readthedocs.io/
MIT License

Support for NFFTs of real-valued arrays #86

Open roflmaostc opened 1 year ago

roflmaostc commented 1 year ago

Hi,

as far as I can see, there is no support for real-valued input arrays?

However, for those cases an rfft could save roughly a factor of 2 in computation.
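To illustrate where the factor of 2 comes from (plain NumPy here, not the torchkbnufft API): for a real input, `rfft2` stores only the non-redundant half of the last dimension, and the rest follows from Hermitian symmetry.

```python
import numpy as np

x = np.random.rand(8, 8)   # real-valued input
full = np.fft.fft2(x)      # shape (8, 8), complex
half = np.fft.rfft2(x)     # shape (8, 5): last dim becomes n//2 + 1

# the stored half matches the corresponding columns of the full transform;
# the remaining columns are recoverable by Hermitian symmetry
assert np.allclose(full[:, : half.shape[1]], half)
```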

Are there some issues with that?

Best,

Felix

mmuckley commented 1 year ago

Hello @roflmaostc, it wasn't a priority for me since MRI data is always complex. At the moment I can't implement it myself due to time constraints, but I could possibly consider a PR if someone else did it.

roflmaostc commented 1 year ago

I can try to help since I would love that feature! Maybe it could be specified with another parameter, e.g. fftmethod="fft" or fftmethod="rfft"? Doing it implicitly might be dangerous.

What is required?

Would that be here? https://github.com/mmuckley/torchkbnufft/blob/20c45a23b4d2894fd118904131129d51104cd64f/torchkbnufft/_nufft/fft.py#L9

There are probably more steps to do with the interpolation?
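As a minimal sketch of what such a flag could look like (NumPy as a stand-in for torch.fft; `forward_fft` and the `fftmethod` parameter are hypothetical names, not the library's API):

```python
import numpy as np

def forward_fft(image, fftmethod="fft"):
    """Hypothetical dispatch on an fftmethod flag (sketch only)."""
    if fftmethod == "fft":
        return np.fft.fft2(image)
    if fftmethod == "rfft":
        if np.iscomplexobj(image):
            raise ValueError("fftmethod='rfft' requires a real-valued input")
        # stores only the non-redundant half of the last dimension
        return np.fft.rfft2(image)
    raise ValueError(f"unknown fftmethod: {fftmethod!r}")
```

An explicit flag like this keeps the default behavior unchanged and raises early on complex input instead of silently discarding the imaginary part.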

mmuckley commented 1 year ago

I think the interpolator could be a little bit involved. First you'd want to check that it has no imaginary values (I don't remember). Then there might be a question of reflecting the results to the Hermitian-symmetric part of Fourier space so that the k-nearest neighbors algorithm works correctly.

After doing these things I'm actually guessing there would be no speedup, because the interpolator is the bottleneck anyway. Maybe a deeper look into the NUFFT algorithm to consider real inputs would be necessary to get an efficient implementation. I don't think it's as simple as changing the forward FFT op.
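To make the reflection step concrete, here is a hedged NumPy sketch: for real input, F[-u, -v] = conj(F[u, v]), so the full spectrum can be rebuilt from the rfft half, giving interpolation a sample everywhere in k-space. `full_from_rfft2` is an illustrative helper, not library code (and materializing the full grid like this would of course forfeit the memory saving).

```python
import numpy as np

def full_from_rfft2(half, n):
    """Rebuild the full 2-D spectrum from rfft2 output (illustrative sketch).

    For real input, F[-u, -v] = conj(F[u, v]); the columns rfft2 did not
    store are reflected, conjugated copies of the stored ones.
    """
    m, k = half.shape                # k == n // 2 + 1 for even n
    full = np.zeros((m, n), dtype=complex)
    full[:, :k] = half
    for j in range(k, n):            # columns missing from the rfft output
        for u in range(m):
            full[u, j] = np.conj(half[(m - u) % m, n - j])
    return full

x = np.random.rand(8, 8)
assert np.allclose(full_from_rfft2(np.fft.rfft2(x), 8), np.fft.fft2(x))
```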

roflmaostc commented 1 year ago

What is the estimated overhead in your case? Is the interpolator really that slow? Normal cubic interpolation is much faster, so I wonder what's so different in this case. Is it faster in other libraries?

I observed the following (NUFFT along the last two dims, with a leading batch dim). Does that match your experience?

```python
voxels = torch.zeros(100, 1, 768, 768, device="cuda") + 0j
# 300 angles
nufft, adjnufft, ktraj = prepare_nufft(voxels, 300, numpoints=3, gridfactor=1.25, device="cuda")

%%timeit
torch.fft.fft2(voxels + 0j)
torch.cuda.synchronize()
# 9 ms

%%timeit
nufft(voxels + 0j, ktraj)
torch.cuda.synchronize()
# 61 ms
```

In that case, an rfft would really only save ~4 to 5 ms (maybe a little more because of the 1.25× grid oversampling).

mmuckley commented 1 year ago

85% overhead seems a little high but could be possible for some trajectories. It's actually trajectory-dependent. If you only have one radial spoke then it's possible the FFT takes longer.

roflmaostc commented 1 year ago

I actually used 300 radial spokes of length 768 each.

And a batch size of up to maybe 100.

What should be the shape of ktraj in this case? In my case it was (2, 768*300).

To be honest, I found the documentation confusing regarding coils, etc.

Should I rearrange the spokes along another dim for more parallelization?
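For a concrete picture of the layout described above (a hedged sketch, not taken verbatim from the library's docs): ktraj is (ndim, klength) with coordinates in radians, roughly in [-pi, pi], so 300 spokes of length 768 flatten to (2, 230400). NumPy is used here just to build the coordinates; in practice you would pass a torch tensor.

```python
import numpy as np

# hedged sketch of a radial trajectory matching the (2, 768*300) layout:
# coordinates in radians, spanning roughly [-pi, pi]
n_spokes, spoke_len = 300, 768
angles = np.arange(n_spokes) * np.pi / n_spokes   # uniform spoke angles over half a turn
radii = np.linspace(-np.pi, np.pi, spoke_len)     # samples along each spoke
kx = np.outer(np.cos(angles), radii)              # (n_spokes, spoke_len)
ky = np.outer(np.sin(angles), radii)
ktraj = np.stack([kx.reshape(-1), ky.reshape(-1)])  # shape (2, 230400)
```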