sony / nnabla-examples

Neural Network Libraries https://nnabla.org/ - Examples

OOM for TECO GAN #224

Open stalagmite7 opened 3 years ago

stalagmite7 commented 3 years ago

Seems like using even a height of 360 (while maintaining aspect ratio) for TecoGAN gives runtime OOM errors; what's the largest size I can use to try to upscale to 4K? I imagine that to upscale to 4K I would use 1080p as the input resolution, but that's too big for the GPU to handle. Is there a way to use only the CPU for this?

TakuyaNarihira commented 3 years ago

Thanks for reporting.

It's probably because the clear_buffer option of the forward() method is not specified in the following code block: https://github.com/sony/nnabla-examples/blob/master/GANs/tecogan/generate.py#L83-L85

With .forward(clear_buffer=True), the network aggressively releases intermediate buffers that are no longer needed.

Could you try this quickly?

            # clear_buffer=True releases intermediate buffers as soon as they are no
            # longer needed, which lowers peak GPU memory during the forward pass.
            pre_gen_warp.forward(clear_buffer=True)
            pre_warp.data.copy_from(pre_gen_warp.data)
        outputs.forward(clear_buffer=True)

We'll also verify on our side that it works properly and reduces memory usage.
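
For reference, here is a minimal standalone sketch (a toy graph, not the TecoGAN network) of what clear_buffer does during a forward pass:

    import numpy as np
    import nnabla as nn
    import nnabla.functions as F

    # Tiny example graph; the shape is an arbitrary placeholder.
    x = nn.Variable((1, 3, 256, 256))
    h = F.relu(x)
    y = F.tanh(h)

    x.d = np.random.randn(*x.shape).astype(np.float32)
    # With clear_buffer=True, intermediate buffers such as h's data may be released
    # as soon as they have been consumed; only the final output y is guaranteed to
    # remain readable afterwards, which reduces peak memory.
    y.forward(clear_buffer=True)
    print(y.d.shape)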

stalagmite7 commented 3 years ago

Thanks for the quick response! I'm AFK just now; I'll try it in a few hours and keep you posted!

stalagmite7 commented 3 years ago

Tried this and got an invalid configuration error from CUDA:

Error during forward propagation:
  TransposeCuda <-- ERROR
Traceback (most recent call last):
  File "generate.py", line 105, in <module>
    main()
  File "generate.py", line 84, in main
    pre_gen_warp.forward(clear_buffer=True)
  File "_variable.pyx", line 564, in nnabla._variable.Variable.forward
RuntimeError: target_specific error in forward_impl
/home/gitlab-runner/builds/zxvvzZDJ/0/nnabla/builders/all/nnabla-ext-cuda/src/nbla/cuda/function/./generic/transpose.cu:184
(cudaGetLastError()) failed with "invalid configuration argument" (cudaErrorInvalidConfiguration).

A cursory check suggests it could be a CUDA block-count error. I'll need to dig in further on my end later today.

TakuyaNarihira commented 3 years ago

Looks like it exceeds the limit on the number of CUDA blocks. We should introduce a grid-stride loop in the CUDA kernel. I created an issue at sony/nnabla-ext-cuda#321 (let's continue there on this specific matter).

Btw, how long is your input video sequence?

stalagmite7 commented 2 years ago

Checking back in: I know it says the fix has been deployed, but the OOM error persists. As I asked before, what is the maximum size I can upscale a video to? I am trying 1080p -> 4K but still get OOM errors. It seems to work for smaller video sizes, so does that mean 1080p inputs won't be handled by this implementation?

Srinidhi-Srinivasa commented 2 years ago

> Checking back in: I know it says the fix has been deployed, but the OOM error persists. As I asked before, what is the maximum size I can upscale a video to? I am trying 1080p -> 4K but still get OOM errors. It seems to work for smaller video sizes, so does that mean 1080p inputs won't be handled by this implementation?

@stalagmite7, is it possible to share more information about your computation environment?

Srinidhi-Srinivasa commented 2 years ago

> Checking back in: I know it says the fix has been deployed, but the OOM error persists. As I asked before, what is the maximum size I can upscale a video to? I am trying 1080p -> 4K but still get OOM errors. It seems to work for smaller video sizes, so does that mean 1080p inputs won't be handled by this implementation?

@stalagmite7 The following are approximate memory requirements to run TeCoGAN:

Resolution | Peak Memory Usage (MB)
-- | --
144p | 708
280p | 2816
360p | 4074
480p | 6818

Please note that it may not be possible to run TeCoGAN at resolutions higher than these even on GPUs with up to 32 GB of memory.

The current pre-trained weights are in NHWC (channel-last) format, which is not supported by the CPU backend. However, it is possible to run inference on CPU only by transposing the weights into NCHW format and setting the "channel_last" flag to "False" in the PF.convolution calls. The following are reference codes for that: Memory-Layout-Conversion, convert_parameter_format.py
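
For illustration, here is a rough, untested sketch of such a conversion, assuming the 4-D convolution weights in the checkpoint are laid out as (outmaps, kernel_h, kernel_w, inmaps) and using placeholder file names; the linked convert_parameter_format.py is the authoritative reference:

    import numpy as np
    import nnabla as nn
    from nnabla.parameter import get_parameter_or_create

    # Load the channel-last (NHWC) checkpoint and snapshot every parameter array.
    nn.load_parameters("tecogan_nhwc.h5")  # placeholder path
    params = {k: v.d.copy() for k, v in nn.get_parameters(grad_only=False).items()}
    nn.clear_parameters()

    for name, w in params.items():
        if w.ndim == 4:
            # Assumed layout: (outmaps, kh, kw, inmaps) -> (outmaps, inmaps, kh, kw)
            w = np.ascontiguousarray(np.transpose(w, (0, 3, 1, 2)))
        # Re-register the parameter under the same name with its new shape.
        get_parameter_or_create(name, w.shape, initializer=w, need_grad=False)

    # Save the channel-first (NCHW) checkpoint for CPU-only inference.
    nn.save_parameters("tecogan_nchw.h5")  # placeholder path

Inference would then be run with the CPU extension context and with the network built with channel_last=False, as described above.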

stalagmite7 commented 2 years ago

Sorry it took me so long; the GPU is an NVIDIA 3060 Ti. The input video, as I mentioned, is 1080p; are you saying this is too high for TecoGAN to process, then?

Srinidhi-Srinivasa commented 2 years ago

> Sorry it took me so long; the GPU is an NVIDIA 3060 Ti. The input video, as I mentioned, is 1080p; are you saying this is too high for TecoGAN to process, then?

Yes.