Open boxabirds opened 8 months ago
You can set a smaller batch size to avoid OOM. https://github.com/williamyang1991/FRESCO/blob/9fe1be71b6c21890b5bc92659026f9586440266e/config/config_music.yaml#L17
Our method optimizes the features during DDPM sampling, so memory usage peaks while that optimization is applied.
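(For anyone who prefers to script the change rather than edit the YAML by hand, a minimal sketch, assuming batch_size is a top-level key as the linked line suggests; the output filename is just an example:)

import yaml  # PyYAML

# Load the shipped config, lower the batch size, and write a local copy to run with.
with open("config/config_music.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["batch_size"] = 4  # smaller batch -> lower peak memory during the feature optimization

with open("config/config_music_low_mem.yaml", "w") as f:  # hypothetical output path
    yaml.safe_dump(cfg, f)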
I tried with batch_size: 4 and then 2 and it made no difference 🤔
I don't think it's this: the error says it's trying to allocate 112 MB, the GPU has a capacity of 23.65 GB but only 106 MB is free, and 20.85 GiB is already allocated by PyTorch. But for what, I wonder.
What GPUs did you do your work on? Might it simply be that there is a minimum GPU memory size of 40GB or something?
There was a bug with batch_size: 4. Please pull the latest code.
I get the same error when batch size is 2 as well though …?
A10G 24G works fine with batch size = 8
Great — based on my stack trace, what am I doing wrong?
It's not clear what happened here. Pulling and using the latest code may help.
Is xformers installed?
No, it wasn't listed as part of the requirements.
I checked, and this is against the latest code. I don't see any changes in the last 12 hours and my pull was within that window.
I have the same issue here.
Then maybe you could turn off the optimization function to further save memory (but sacrifice performance)? https://github.com/williamyang1991/FRESCO/issues/14#issuecomment-2011320317
Is xformers installed?
I tried xformers.ops.memory_efficient_attention, but found it is less memory efficient than F.scaled_dot_product_attention, so I didn't use xformers in my code.
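(For reference, the two calls being compared look roughly like this; a minimal sketch with illustrative tensor sizes, not FRESCO's actual code. SDPA takes (batch, heads, seq, head_dim), while xformers expects (batch, seq, heads, head_dim).)

import torch
import torch.nn.functional as F
# import xformers.ops as xops  # only if xformers is installed

# Illustrative sizes, not taken from FRESCO.
B, H, L, D = 1, 8, 4096, 64
q, k, v = (torch.randn(B, H, L, D, device="cuda", dtype=torch.float16) for _ in range(3))

# PyTorch 2.x built-in memory-efficient attention, inputs shaped (batch, heads, seq, head_dim).
out_sdpa = F.scaled_dot_product_attention(q, k, v)

# xformers variant, which wants (batch, seq, heads, head_dim), hence the transposes.
# out_xf = xops.memory_efficient_attention(
#     q.transpose(1, 2).contiguous(),
#     k.transpose(1, 2).contiguous(),
#     v.transpose(1, 2).contiguous(),
# ).transpose(1, 2)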
Then maybe you could turn off the optimization function to further save memory (but sacrifice performance)? #14 (comment)
There's something very strange going on, because #14 is on a 12 GB GPU and it works, but I have a 24 GB GPU and it won't do even the most basic processing on an image sequence requiring 112 MB. Something's going on with the PyTorch allocation: why does it need 20 GB of GPU RAM? The only thing I can conclude is that #14 is against a different version of the code base.
I think maybe there is no problem with the code. Maybe there is some specific GPU-allocation setting on your machine that causes the OOM?
Could be. And I'm very happy to turn on extra logging to help figure this out: what's the best way to do that? Is PyTorch allocating 21 GB of GPU RAM expected behaviour?
You can print the memory usage in diffusion_hacked.py, for example:
print('diffusion_hacked Line 286: ', GPU.getGPUs()[1].memoryUsed)
to see which code is running when the OOM happens.
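(If GPUtil, presumably what GPU refers to in that snippet, isn't handy, here is a minimal sketch of the same kind of checkpoint logging using PyTorch's own counters; the function name and the call sites in the comments are hypothetical:)

import torch

def log_gpu_mem(tag, device=0):
    # Report PyTorch's view of CUDA memory at a named checkpoint, in GiB.
    alloc = torch.cuda.memory_allocated(device) / 2**30
    reserved = torch.cuda.memory_reserved(device) / 2**30
    print(f"{tag}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB")

# Hypothetical call sites inside diffusion_hacked.py:
# log_gpu_mem("before optimize_feature")
# sample = optimize_feature(...)
# log_gpu_mem("after optimize_feature")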
same issue here, following
Even just running "run keyframes" OOMs on a 24 GB card? Am I missing something here? I ran the example test just fine with Gradio.
Full frames do not take more memory; the keyframe part uses the most memory. You mean the example video works fine but your own video OOMs? Maybe your video has too many pixels. The example video is 512*512 pixels. If your video is larger, you can use a smaller resize parameter: https://github.com/williamyang1991/FRESCO/blob/9fe1be71b6c21890b5bc92659026f9586440266e/run_fresco.py#L170
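(As a rough illustration of how much the resolution matters, a sketch of picking a target size with about the same pixel count as the 512*512 example, rounded to multiples of 8 as Stable Diffusion expects; this helper is hypothetical and independent of run_fresco.py's own resize parameter:)

def target_size(width, height, budget=512 * 512):
    # Scale (width, height) down to roughly `budget` pixels, keeping the aspect
    # ratio and rounding each side down to a multiple of 8.
    scale = min(1.0, (budget / (width * height)) ** 0.5)
    def round8(x):
        return max(8, round(x * scale) // 8 * 8)
    return round8(width), round8(height)

print(target_size(1920, 1080))  # (680, 384): roughly the same pixel budget as 512x512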
Hi, no matter what movie size I choose (5 fps, 640x480) I get the error below. nvtop shows that the webUI triggers pre-allocation of 21.5 GB, but then ... it's not used?
/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|          | 0/15 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/queueing.py", line 388, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/route_utils.py", line 219, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/blocks.py", line 1437, in process_api
    result = await self.call_function(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/utils.py", line 650, in wrapper
    response = f(*args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/webUI.py", line 159, in process
    keypath = process1(*args)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/webUI.py", line 280, in process1
    latents = inference(global_state.pipe, global_state.controlnet, global_state.frescoProc,
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/src/pipe_FRESCO.py", line 201, in inference
    noise_pred = pipe.unet(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/src/diffusion_hacked.py", line 776, in forward
    sample = optimize_feature(sample, flows, occs, correlation_matrix,
  File "/home/julian/sambashare/expts/FRESCO/src/diffusion_hacked.py", line 485, in optimize_feature
    optimizer.step(closure)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/optim/adam.py", line 143, in step
    loss = closure()
  File "/home/julian/sambashare/expts/FRESCO/src/diffusion_hacked.py", line 478, in closure
    loss.backward()
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.06 GiB. GPU 0 has a total capacty of 23.65 GiB of which 525.94 MiB is free. Process 15481 has 1.27 GiB memory in use. Including non-PyTorch memory, this process has 21.00 GiB memory in use. Of the allocated memory 20.11 GiB is allocated by PyTorch, and 416.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
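(Not a fix for the peak itself, but the allocator hint at the end of that message can be tried by setting PYTORCH_CUDA_ALLOC_CONF before PyTorch touches the GPU; the 128 below is just an illustrative starting value:)

import os

# Must be set before the first CUDA allocation, so put it at the very top of the
# entry script, or export it in the shell before launching webUI.py.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var on purpose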