researchmm / FTVSR

[ECCV'22] FTVSR: Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution
MIT License

RuntimeError: CUDA out of memory. #22

Open onurbarut opened 1 year ago

onurbarut commented 1 year ago

Hi,

I've run the single-GPU test on two different machines, one with an 8 GB RTX GPU and another with a 24 GB A10. Both gave me an OOM error even with num_input_frames = 2. What am I missing?

The first one (RTX) said:

Tried to allocate 620.00 MiB (GPU 0; 7.80 GiB total capacity; 4.31 GiB already allocated; 412.31 MiB free; 6.33 GiB reserved in total by PyTorch)

while the other machine (A10) said:

Tried to allocate 5.50 GiB (GPU 0; 22.02 GiB total capacity; 10.77 GiB already allocated; 4.78 GiB free; 15.63 GiB reserved in total by PyTorch)

I changed the REDS dataset to SRFolderMultipleGTDataset, and I have a 2-video subset of REDS4_val that looks like:

```
REDS4_short/
|-- val_sharp/
|   |-- 000/
|   |   |-- %08d.png
|   |-- 001/
|   |   |-- %08d.png
|-- val_sharp_bicubic/
|   |-- X4/
|   |   |-- 000/
|   |   |   |-- %08d.png
|   |   |-- 001/
|   |   |   |-- %08d.png
```

And I've used the following command:

```
tools/test.py <path/to/config> <path/to/redsModel> --crf 25 --startIdx 0 --test_frames 50
```

Let me know if you need any further info to help. Thanks!

onurbarut commented 1 year ago

After debugging for a while, I noticed that the SPyNet compute_flow() method is failing due to memory on the HR frames. My bet is that on the RTX 4000 it fails earlier, on the LR frames, which is why the two machines report different allocation sizes.

It looks like SPyNet computes flow over all the frames set by the --test_frames parameter at once, and that's where we hit OOM. With --test_frames 10 it worked fine on the A10 (24 GB). My question is: is this behaviour of SPyNet normal? How would I run a video with 1000+ frames? I believe some batching approach should be taken, because I'm not interested in only the first 10 frames but in all frames of the video. Thanks!
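For what it's worth, here is a minimal sketch of the batching idea I have in mind: split the frame sequence into fixed-size chunks that overlap by one frame, so pairwise flow is still computed for every consecutive pair but peak memory is bounded by the chunk size. This is not the repo's code; `compute_flow` below is a hypothetical stand-in for SPyNet's pairwise flow call.

```python
def chunked_flows(frames, compute_flow, chunk_size=10):
    """Compute flow for every consecutive frame pair in memory-bounded chunks.

    frames       -- sequence of frames (e.g. tensors)
    compute_flow -- hypothetical callable mapping (frame_a, frame_b) -> flow;
                    stands in for SPyNet's pairwise flow computation
    chunk_size   -- max number of frame pairs processed per chunk

    Each chunk overlaps the previous one by a single frame, so every
    consecutive pair is covered exactly once.
    """
    flows = []
    for start in range(0, len(frames) - 1, chunk_size):
        # +1 so the chunk's last frame overlaps the next chunk's first frame
        chunk = frames[start:start + chunk_size + 1]
        for frame_a, frame_b in zip(chunk[:-1], chunk[1:]):
            flows.append(compute_flow(frame_a, frame_b))
    return flows
```

With chunk_size=10 this should keep memory use close to the --test_frames 10 case while still covering all 1000+ frames, assuming intermediate activations are freed between chunks.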