Open roflmaostc opened 1 year ago
Hello @roflmaostc, it wasn't a priority for me since MRI data is always complex. Today I'm unable to implement it due to time constraints, but I could possibly consider a PR if someone else did it.
I can try to help since I would love that feature! Maybe specified with another parameter?
Doing that implicitly might be dangerous. fftmethod="fft"
or fftmethoid="rfft"
.
What is requried?
Would that be here? https://github.com/mmuckley/torchkbnufft/blob/20c45a23b4d2894fd118904131129d51104cd64f/torchkbnufft/_nufft/fft.py#L9
There is probably more steps to do with the interpolation?
I think the interpolator could be a little bit involved. First you'd want to check that it has no imaginary values (I don't remember). Then there might be a question of reflecting the results to the Hermitian-symmetric part of Fourier space so that the k-nearest neighbors algorithm works correctly.
After doing these things I'm actually guessing there would be no speedup, because the interpolator is the bottleneck anyway. Maybe a deeper look into the NUFFT algorithm to consider real inputs would be necessary to get an efficient implementation. I don't think it's as simple as changing the forward FFT op.
What is the estimated overhead in your case? Is that interpolator that slow? I mean, normal cubic interpolation is much faster hence I wonder what's so different in this case. Is that better in different libraries?
I observed the following.
Does that match with your experiences? So NUFFT along the last two dims, and a leading batch dim.
voxels = torch.zeros(100,1, 768,768, device="cuda")+ 0j
# 300 angles
nufft, adjnufft, ktraj = prepare_nufft(voxels, 300, numpoints=3, gridfactor=1.25, device="cuda")
%%timeit
torch.fft.fft2(voxels +0j)
torch.cuda.synchronize()
# 9ms
%%timeit
nufft(voxels + 0j, ktraj)
torch.cuda.synchronize()
# 61ms
In that case, a RFFT really would only save ~4 to 5ms. (maybe a little more because of 1.25 padding)
85% overhead seems a little high but could be possible for some trajectories. It's actually trajectory-dependent. If you only have one radial spoke then it's possible the FFT takes longer.
I actually used 300 radial spokes of length 768 each.
And a batch size of 100 maybe, in max.
What should be the shape of ktraj in this case?
On my case it was (2, 768*300)
.
Tbh, in the documentation I was confused about coils, etc.
Should I rearrange the spokes along another dim for more parallelization?
Hi,
as far as I can see, there is no support for real-valued input arrays?
However, with
rfft
we could save a factor of 2 on computation for those cases by usingrfft
.Are there some issues with that?
Best,
Felix