mrirecon / bart

BART: Toolbox for Computational Magnetic Resonance Imaging
https://mrirecon.github.io/bart/
BSD 3-Clause "New" or "Revised" License

CG-SENSE slower on GPU than CPU #292

Closed jmontalt closed 2 years ago

jmontalt commented 2 years ago

I'm trying to do a CG-SENSE reconstruction using the brain dataset from the ISMRM Reproducibility Challenge 1.

I am doing the reconstruction with the following command:

pics -i 10 -t <trajectory> -p <weights> <kspace> <sensitivities> <output>

The reconstruction works well and I get the following printouts:

[  1 49152   1  12   1   1   1   1   1   1   1   1   1   1   1   1 ]
[300 300   1  12   1   1   1   1   1   1   1   1   1   1   1   1 ]
Regularization terms: 0, Supporting variables: 0
conjugate gradients
Total Time: 1.018081

Then I've also tried running the reconstruction on the GPU by adding the -g option, which gives me the following:

GPU reconstruction
[  1 49152   1  12   1   1   1   1   1   1   1   1   1   1   1   1 ]
[300 300   1  12   1   1   1   1   1   1   1   1   1   1   1   1 ]
Regularization terms: 0, Supporting variables: 0
conjugate gradients
Total Time: 1.161055

I was expecting to see some speed-up from using the GPU. But, as you can see, the recon times are very similar, and the GPU run is in fact slightly slower.

BART was compiled with the command make CUDA=1 CUDA_BASE=/usr/local/cuda CUDA_LIB=lib64 on Ubuntu.

The commit SHA is 50905f7f134dcc5a9a396ddccb865ac1db43a61d (June 20th, 2022).

The hardware is an Intel Core i7-11800H and an NVIDIA GeForce RTX 3080 (laptop).

I should note that I do observe a speed-up in other reconstructions. For example, a GRASP reconstruction takes 30 s on the CPU but 7 s on the GPU.

Is this indicative of a problem with the PICS GPU recon (either in BART code or in my configuration)? Or would you expect these results?

hcmh commented 2 years ago

In principle, this is to be expected for short runtimes. One way to see it: even if the reconstruction itself took no time at all, there would still be extra work involved in copying the data to and from the GPU, whereas the CPU run needs no such extra steps.

Additionally, some systems are set up in such a way that certain CUDA-specific setup steps have to be performed by every CUDA program. These also add overhead to anything that uses the GPU for computation.
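The argument above can be sketched with a toy cost model. The overhead and speedup numbers below are made-up assumptions for illustration, not BART measurements:

```python
# Toy model: GPU runtime = fixed per-run overhead (data transfers,
# CUDA setup) + compute time reduced by some speedup factor.
# overhead=1.0 s and speedup=10x are assumed, illustrative values.
def gpu_time(cpu_seconds, overhead=1.0, speedup=10.0):
    return overhead + cpu_seconds / speedup

# A ~1 s CPU job gains nothing: the fixed overhead dominates,
# so the GPU run comes out slightly slower (~1.1 s here).
short_job = gpu_time(1.0)

# A 30 s CPU job (cf. the GRASP example above) wins clearly
# on the GPU (~4 s here), since compute dwarfs the overhead.
long_job = gpu_time(30.0)
```

Under this model the GPU only pays off once the compute time is large compared to the fixed overhead, which matches the observation that GRASP speeds up while the ~1 s CG-SENSE recon does not.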

jmontalt commented 2 years ago

I understand, thank you very much for your reply!