gschramm / parallelproj

code for parallel TOF and NONTOF projections
MIT License

Torch example: "requires_grad=True" and CUDA out of memory #67

Closed (lhellfh closed this 4 months ago)

lhellfh commented 4 months ago

Hello!

In the current version of examples/07_torch/01_run_projection_layer.py, the error "ValueError: gradcheck expects at least one input tensor to require gradient, but none of the them have requires_grad=True." occurs when a ListModePETProjector is used instead of a RegularPolygonPETProjector.

Is this behavior to be expected? Thank you in advance for your answer.

My environment (Anaconda):

- Python 3.9
- PyTorch 2.2.1
- PyTorch CUDA 11.8 build
- parallelproj installed via "conda install -c conda-forge parallelproj"

gschramm commented 4 months ago

Hm, I guess that is not related to the projector. Did you make sure that "requires_grad = True" is set when you set up the input tensor for the grad check? As done here: https://github.com/gschramm/parallelproj/blob/d38c56153c7c9859567a1794c9f87b68e2b812cb/examples/07_torch/01_run_projection_layer.py#L257
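
A minimal sketch of the idea, using a plain matrix as a stand-in for the projection layer (hypothetical names, not the example code): gradcheck needs a double-precision input tensor with requires_grad=True, otherwise it raises the ValueError quoted above.

```python
import torch

# Stand-in linear operator so the snippet is self-contained; in the example
# this role is played by the custom projection layer wrapping the projector.
A = torch.randn(6, 3, dtype=torch.float64)

def fwd(x):
    # placeholder forward model y = A x
    return A @ x

img = torch.rand(3, dtype=torch.float64)
img.requires_grad = True  # without this line, gradcheck raises the ValueError

print(torch.autograd.gradcheck(fwd, (img,)))  # True if the gradients match
```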

lhellfh commented 4 months ago

Hello again,

yes, I was able to fix the error by adding "img.requires_grad = True" to the input image. Thanks!

However, I am now running into memory issues with the torch.autograd.gradcheck() calls. They seem to consume a lot of memory. Which parameter of the projector/geometry/images could reduce the memory consumption? So far, I have tried reducing the image and detector size without noticeable differences. I am working with a 12 GB GPU, and the gradcheck constantly fails with torch.cuda.OutOfMemoryError: CUDA out of memory.

Thanks in advance :-)

gschramm commented 4 months ago

Reducing the number of elements in the input image and in the projection (the sinogram size, or the number of LORs / events in listmode) is what helps here.

If you want to test whether your custom PyTorch layers are implemented correctly, I recommend doing that on a minimal image / sinogram. The implementation of these layers should be independent of the actual input/output sizes of the custom linear operator that is used.

Note that the available GPU memory is always a concern when training reconstruction networks with realistic 3D PET data.
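
For intuition, a rough back-of-the-envelope sketch with hypothetical test-problem sizes: gradcheck compares a numerically and an analytically computed dense Jacobian, whose size scales with the number of input elements times the number of output elements, so even modest 3D images and LOR counts exhaust GPU memory quickly.

```python
# Rough size of the dense Jacobian that gradcheck builds
# (hypothetical test-problem sizes, float64 as recommended for gradcheck).
n_voxels = 64 * 64 * 8   # small 3D test image
n_lors = 10_000          # small number of listmode events
bytes_per_element = 8    # float64

jacobian_gib = n_voxels * n_lors * bytes_per_element / 1024**3
print(f"dense Jacobian: ~{jacobian_gib:.1f} GiB")  # ~2.4 GiB even for this tiny setup
```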

lhellfh commented 4 months ago

Thank you for the suggestion. I am now checking the gradients with very few LORs, and it works after adjusting the tolerances atol and rtol a bit. Best regards!
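
For reference, a sketch of how the tolerances can be passed to gradcheck (the values and the stand-in operator here are illustrative, not the ones used above):

```python
import torch

A = torch.randn(4, 2, dtype=torch.float64)  # stand-in for the listmode layer
img = torch.rand(2, dtype=torch.float64, requires_grad=True)

# loosen the default tolerances (atol=1e-5, rtol=1e-3) a bit
ok = torch.autograd.gradcheck(lambda x: A @ x, (img,), atol=1e-4, rtol=1e-2)
print(ok)
```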