stalagmite7 opened 3 years ago
Thanks for reporting.
It's probably because the clear_buffer option of the forward() method is not specified in the following code block:
https://github.com/sony/nnabla-examples/blob/master/GANs/tecogan/generate.py#L83-L85
With .forward(clear_buffer=True), unused memory in the network is released aggressively during the forward pass.
Could you try this quickly?
pre_gen_warp.forward(clear_buffer=True)
pre_warp.data.copy_from(pre_gen_warp.data)
outputs.forward(clear_buffer=True)
We'll also check later whether it works properly and reduces memory usage.
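For reference, here is a minimal, self-contained nnabla sketch (an illustrative toy network, not the TeCoGAN graph itself) showing how clear_buffer=True is typically passed to Variable.forward() at inference time; the layer, shapes, and names below are assumptions for the example only.

```python
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

# Toy network standing in for the real model (illustrative only).
x = nn.Variable((1, 3, 64, 64))
h = PF.convolution(x, 16, (3, 3), pad=(1, 1), name="conv1")
y = F.relu(h)

# Feed a dummy frame.
x.d = np.random.rand(*x.shape).astype(np.float32)

# clear_buffer=True frees intermediate buffers as soon as they are no longer
# needed during this forward pass, lowering peak memory at inference time.
y.forward(clear_buffer=True)
print(y.d.shape)
```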
Thanks for the quick response! I'm AFK at the moment; I'll try it in a few hours and keep you posted!
Tried this, got an invalid configuration error from CUDA:
Error during forward propagation:
TransposeCuda <-- ERROR
Traceback (most recent call last):
File "generate.py", line 105, in <module>
main()
File "generate.py", line 84, in main
pre_gen_warp.forward(clear_buffer=True)
File "_variable.pyx", line 564, in nnabla._variable.Variable.forward
RuntimeError: target_specific error in forward_impl
/home/gitlab-runner/builds/zxvvzZDJ/0/nnabla/builders/all/nnabla-ext-cuda/src/nbla/cuda/function/./generic/transpose.cu:184
(cudaGetLastError()) failed with "invalid configuration argument" (cudaErrorInvalidConfiguration).
A cursory check suggests it could be a CUDA grid-size (number of blocks) error. I'll need to dig in further on my end later today.
Looks like it exceeds the limit on the number of blocks. We should introduce a grid-stride loop in the CUDA kernel. I created an issue at sony/nnabla-ext-cuda#321 (let's continue the discussion on this specific matter there).
Btw, how long is your input video sequence?
Checking back in, I know it says the fix has been deployed, but the OOM error persists. Like I asked before, what is the maximum video size I can upscale to? I am trying 1080p -> 4K but I get OOM errors. It seems to work for smaller video sizes, so does that mean 1080p inputs won't be handled by this implementation?
@stalagmite7, is it possible to share more information about your computation environment?
@stalagmite7 The following are approximate memory requirements to run TeCoGAN:
Resolution | Peak Memory Usage (MB)
-- | --
144p | 708
280p | 2816
360p | 4074
480p | 6818

Please note that it may not be possible to run TeCoGAN on any resolution higher than this on GPUs that have up to 32 GB of memory.
The current pre-trained weights are in NHWC (channel-last) format, which is not supported by the CPU version. However, it is indeed possible to run inference on CPU only by transposing the weights into NCHW format and setting the "channel_last" flag to "False" in the PF.convolution calls. The following are reference materials for that: Memory-Layout-Conversion, convert_parameter_format.py
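As a rough sketch of that conversion (the linked convert_parameter_format.py is the authoritative reference; the file names below and the assumption that every 4-D parameter is a convolution weight stored as (outmaps, kh, kw, inmaps) are mine):

```python
import numpy as np
import nnabla as nn

# Hypothetical input/output file names; substitute the actual TeCoGAN checkpoint.
nn.load_parameters("tecogan_model_nhwc.h5")

converted = {}
for name, param in nn.get_parameters(grad_only=False).items():
    w = param.d.copy()
    if w.ndim == 4:
        # Assumed layout: (outmaps, kh, kw, inmaps) -> (outmaps, inmaps, kh, kw)
        w = np.transpose(w, (0, 3, 1, 2))
    converted[name] = w

# Re-register the transposed arrays and save an NCHW parameter file.
nn.clear_parameters()
for name, w in converted.items():
    p = nn.parameter.get_parameter_or_create(name, w.shape, need_grad=False)
    p.d = w
nn.save_parameters("tecogan_model_nchw.h5")
```

After saving, the network has to be built with channel_last=False in the PF.convolution calls so the graph layout matches the converted weights.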
Sorry it took me so long; the GPU is an NVIDIA 3060 Ti. The input video, as I mentioned, is 1080p resolution; you're saying this is too high for TeCoGAN to process, then?
Yes.
Seems like even using a height of 360 (while maintaining aspect ratio) for TeCoGAN gives runtime OOM errors; what's the largest input size I can use to try to upscale to 4K? I imagine if I want to upscale to 4K, I would use 1080p as my input resolution, but that's too big for the GPU to handle; is there a way to use only the CPU for this?