WeijieXie closed this 4 months ago
Thank you, @WeijieXie, for this suggestion. Is there any chance you could give me an idea of the speed-up in this case? Perhaps you may want to try the two versions with various matrix sizes (e.g., 1920 by 1080) over a number of trials (e.g., 100) to get a sense of it.
Thanks for the review, Kaan.
```python
import time
import torch


def cpu_timer(operation, repeat=100):
    # average wall-clock time per call
    # note: perf_counter() returns seconds, so the CPU figures printed below
    # are in seconds despite the 'ms' label
    total_time = 0
    for _ in range(repeat):
        start_time = time.perf_counter()
        operation()
        end_time = time.perf_counter()
        total_time += end_time - start_time
    return total_time / repeat


def gpu_timer(operation, repeat=100):
    # average per-call time measured with CUDA events (milliseconds)
    total_time = 0
    for _ in range(repeat):
        torch.cuda.synchronize()
        start_time = torch.cuda.Event(enable_timing=True)
        end_time = torch.cuda.Event(enable_timing=True)
        start_time.record()
        operation()
        end_time.record()
        torch.cuda.synchronize()
        total_time += start_time.elapsed_time(end_time)
    return total_time / repeat


def original_code(field_padded, H, aperture):
    U1 = torch.fft.fftshift(torch.fft.fft2(torch.fft.fftshift(field_padded)))
    U2 = H * aperture * U1
    return torch.fft.ifftshift(torch.fft.ifft2(torch.fft.ifftshift(U2)))


def modified_code(field_padded, H, aperture):
    U1 = torch.fft.fftshift(torch.fft.fft2(field_padded))
    U2 = H * aperture * U1
    return torch.fft.ifft2(torch.fft.ifftshift(U2))


field_padded = torch.rand(1920, 1080)
H = torch.rand(1920, 1080)
aperture = torch.rand(1920, 1080)

original_time = cpu_timer(lambda: original_code(field_padded, H, aperture))
modified_time = cpu_timer(lambda: modified_code(field_padded, H, aperture))
print(f'Original code on CPU: {original_time:.8f} ms')
print(f'Modified code on CPU: {modified_time:.8f} ms')

field_padded = field_padded.to('cuda')
H = H.to('cuda')
aperture = aperture.to('cuda')

original_time = gpu_timer(lambda: original_code(field_padded, H, aperture))
modified_time = gpu_timer(lambda: modified_code(field_padded, H, aperture))
print(f'Original code on GPU: {original_time:.8f} ms')
print(f'Modified code on GPU: {modified_time:.8f} ms')
```
- On a 9750H CPU and an RTX 1650 GPU, the results are as follows:
```bash
Original code on CPU: 0.03148508 ms
Modified code on CPU: 0.02354449 ms
Original code on GPU: 7.29974783 ms
Modified code on GPU: 4.84190688 ms
```

The modified code seems considerably faster in both the CPU and GPU cases:

```bash
Original code on CPU: 0.05000954 ms
Modified code on CPU: 0.04343863 ms
Original code on GPU: 6.01557523 ms
Modified code on GPU: 2.32627329 ms
```
Thank you, @WeijieXie! I have incorporated your changes in the repository. For now, users have to install odak from the repository to benefit from your changes, but it will be reflected in the `pip` version when I release `odak==0.2.6`.

I have also added your name to our `THANKS.txt` and `CITATION.cff`; please give them a visual check. If you encounter any missing information, please do not hesitate to let me know. I haven't added an ORCID for your name in the `CITATION.cff`; if you have one, please let me know so that I can edit it accordingly.
remove "torch.fft.fftshift" and "torch.fft.ifftshift" which may help accelerate the program
Validated by the following 2 tests
When the number of the sampling points is odd
When the number of the sampling points is even
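Below is a minimal sketch of such an equivalence check, not the exact tests from the pull request: it reuses the `original_code` and `modified_code` definitions from the benchmark above, while the helper name `check_equivalence`, the grid sizes (127 and 128), the `float64` dtype, and the `1e-9` tolerance are arbitrary assumptions for illustration.

```python
import torch


def original_code(field_padded, H, aperture):
    # same definition as in the benchmark above
    U1 = torch.fft.fftshift(torch.fft.fft2(torch.fft.fftshift(field_padded)))
    U2 = H * aperture * U1
    return torch.fft.ifftshift(torch.fft.ifft2(torch.fft.ifftshift(U2)))


def modified_code(field_padded, H, aperture):
    # same definition as in the benchmark above, without the redundant shifts
    U1 = torch.fft.fftshift(torch.fft.fft2(field_padded))
    U2 = H * aperture * U1
    return torch.fft.ifft2(torch.fft.ifftshift(U2))


def check_equivalence(size):
    # random field, transfer function, and aperture of the given size;
    # float64 keeps FFT round-off well below the tolerance used below
    field_padded = torch.rand(size, size, dtype=torch.float64)
    H = torch.rand(size, size, dtype=torch.float64)
    aperture = torch.rand(size, size, dtype=torch.float64)
    difference = original_code(field_padded, H, aperture) - modified_code(field_padded, H, aperture)
    return difference.abs().max().item() < 1e-9


print(check_equivalence(127))  # odd number of sampling points, expected: True
print(check_equivalence(128))  # even number of sampling points, expected: True
```

Loosely speaking, the removed calls form matched pairs: `ifftshift` undoes `fftshift` around the elementwise multiplication, and the `fftshift` of the input field shows up as a circular shift of the output that the trailing `ifftshift` cancels, so the result should be unchanged up to floating-point error.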