gschramm / parallelproj

code for parallel TOF and NONTOF projections
MIT License

Torch example: "requires_grad=True" and CUDA out of memory #67

Closed (lhellfh closed this 4 months ago)

lhellfh commented 4 months ago

Hello!

In the current version of examples/07_torch/01_run_projection_layer.py, the error "ValueError: gradcheck expects at least one input tensor to require gradient, but none of the them have requires_grad=True." occurs when a ListModePETProjector is used instead of a RegularPolygonPETProjector.

Is this behavior to be expected? Thank you in advance for your answer.

My environment (Anaconda):

- Python 3.9
- PyTorch 2.2.1
- PyTorch CUDA 11.8 build
- parallelproj installed via "conda install -c conda-forge parallelproj"

gschramm commented 4 months ago

Hm, I guess that is not related to the projector. Did you make sure that "requires_grad = True" is set when you set up the input tensor for the grad check? As done here: https://github.com/gschramm/parallelproj/blob/d38c56153c7c9859567a1794c9f87b68e2b812cb/examples/07_torch/01_run_projection_layer.py#L257
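
A minimal sketch of the idea, using a plain matrix as a stand-in for the projection layer (hypothetical names, not the example code): gradcheck needs a double-precision input tensor with requires_grad=True, otherwise it raises the ValueError quoted above.

```python
import torch

# Stand-in linear operator so the snippet is self-contained; in the example
# this role is played by the custom projection layer wrapping the projector.
A = torch.randn(6, 3, dtype=torch.float64)

def fwd(x):
    # placeholder forward model y = A x
    return A @ x

img = torch.rand(3, dtype=torch.float64)
img.requires_grad = True  # without this line, gradcheck raises the ValueError

print(torch.autograd.gradcheck(fwd, (img,)))  # True if the gradients match
```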

lhellfh commented 4 months ago

Hello again,

yes, I was able to fix the error by adding "img.requires_grad = True" to the input image. Thanks!

However, I am now running into memory issues with the torch.autograd.gradcheck() calls. They seem to consume a lot of memory. Which parameter of the projector/geometry/images could reduce the memory consumption? So far, I have tried reducing the image and detector size without noticeable differences. I am working with a 12 GB GPU, and the gradcheck constantly fails with torch.cuda.OutOfMemoryError: CUDA out of memory.

Thanks in advance :-)

gschramm commented 4 months ago

Reducing the number of elements in the input image and in the projection (the sinogram size, or the number of LORs / events in listmode) is what helps here.

If you want to test whether your custom PyTorch layers are implemented correctly, I recommend doing that on a minimal image / sinogram. The implementation of these layers should be independent of the actual input/output sizes of the custom linear operator that is used.

Note that the available GPU memory is always a concern when training reconstruction networks with realistic 3D PET data.
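
For intuition, a rough back-of-the-envelope sketch with hypothetical test-problem sizes: gradcheck compares a numerically and an analytically computed dense Jacobian, whose size scales with the number of input elements times the number of output elements, so even modest 3D images and LOR counts exhaust GPU memory quickly.

```python
# Rough size of the dense Jacobian that gradcheck builds
# (hypothetical test-problem sizes, float64 as recommended for gradcheck).
n_voxels = 64 * 64 * 8   # small 3D test image
n_lors = 10_000          # small number of listmode events
bytes_per_element = 8    # float64

jacobian_gib = n_voxels * n_lors * bytes_per_element / 1024**3
print(f"dense Jacobian: ~{jacobian_gib:.1f} GiB")  # ~2.4 GiB even for this tiny setup
```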

lhellfh commented 4 months ago

Thank you for the suggestion. I am now checking the gradients with very few LORs, and it works after adjusting the tolerances atol and rtol a bit. Best regards!
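
For reference, a sketch of how the tolerances can be passed to gradcheck (the values and the stand-in operator here are illustrative, not the ones used above):

```python
import torch

A = torch.randn(4, 2, dtype=torch.float64)  # stand-in for the listmode layer
img = torch.rand(2, dtype=torch.float64, requires_grad=True)

# loosen the default tolerances (atol=1e-5, rtol=1e-3) a bit
ok = torch.autograd.gradcheck(lambda x: A @ x, (img,), atol=1e-4, rtol=1e-2)
print(ok)
```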