hmorimitsu / ptlflow

PyTorch Lightning Optical Flow models, scripts, and pretrained weights.
Apache License 2.0

FlowFormer validation out of memory #69

Closed by LucaLuca1124 2 months ago

LucaLuca1124 commented 2 months ago

Hi, and thanks for your work; it is making life a lot easier for me and my fellow student. Concerning the issue: we want to run the validation for FlowFormer on KITTI-2015 with an 80 GB GPU, but we run into a CUDA OOM error during the prediction on the second image pair. We have alt_cuda_corr installed, and using autocast does not help much either. Do you happen to have any suggestions on how we could get this to work?

hmorimitsu commented 2 months ago

Hi. This is strange: the KITTI image size should not be big enough to cause OOM errors, especially with 80 GB. Does this happen only with FlowFormer?

Are you sure the input images are correct? I think you could try to print the sizes of the inputs just before giving them to the model, just to make sure the input shape is correct.
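A minimal sketch of such a check, assuming the model receives a dict whose tensor entries expose a `.shape` attribute (for example an `images` entry). The helper name `describe_inputs` and the sample shape are hypothetical and only serve as illustration:

```python
from types import SimpleNamespace


def describe_inputs(inputs):
    """Return {key: shape} for every entry that has a .shape attribute."""
    return {
        key: tuple(value.shape)
        for key, value in inputs.items()
        if hasattr(value, "shape")
    }


# Stand-in for a real batch; with PyTorch the values would be torch.Tensor objects.
# The shape below is illustrative, not taken from the dataset.
fake_batch = {
    "images": SimpleNamespace(shape=(1, 2, 3, 376, 1242)),
    "meta": "not a tensor, so it is skipped",
}
print(describe_inputs(fake_batch))
```

Calling something like this right before the forward pass makes it easy to spot an unexpectedly large batch or resolution.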

Another option is to use the infer.py script and manually give the paths to the images you think are causing the OOM problem in the validation and see if it also happens in the inference.

If the error persists, please provide the exact commands you are running and the complete error messages you are getting.

Best.

LucaLuca1124 commented 2 months ago

Thank you for your quick reply! I just found the problem: I had modified your code a bit to get the gradients of the predictions by using @torch.enable_grad() before the validate_one_dataloader function. This somehow leads to extremely high memory usage.
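A likely explanation is that, with gradients enabled, autograd keeps the intermediate activations of every forward pass alive for as long as the predictions that reference them are stored. The sketch below is not the ptlflow code; it uses a small `nn.Linear` as a stand-in model and a hypothetical helper `input_gradient` to show one way of obtaining gradients for a single input while returning only detached tensors, so the graph can be freed after each call:

```python
import torch
from torch import nn


def input_gradient(model, x):
    """Return (detached prediction, gradient of pred.sum() w.r.t. x)."""
    x = x.clone().requires_grad_(True)
    # enable_grad re-enables autograd even if the caller is inside no_grad.
    with torch.enable_grad():
        pred = model(x)
        (grad,) = torch.autograd.grad(pred.sum(), x)
    # Detach so the returned prediction does not keep the graph alive.
    return pred.detach(), grad


model = nn.Linear(4, 2)  # stand-in for a flow model (assumption)
x = torch.randn(8, 4)

with torch.no_grad():  # typical validation context
    plain_pred = model(x)
    pred, grad = input_gradient(model, x)

print(plain_pred.requires_grad)  # no graph is recorded under no_grad
print(pred.requires_grad, grad.shape)
```

Limiting the gradient-enabled region to the few lines that need it, instead of decorating the whole validation function, keeps memory usage for the remaining code at the usual inference level.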

hmorimitsu commented 2 months ago

Glad you found the problem.