Open boxabirds opened 8 months ago
You can set a smaller batch size to avoid OOM. https://github.com/williamyang1991/FRESCO/blob/9fe1be71b6c21890b5bc92659026f9586440266e/config/config_music.yaml#L17
Our method optimizes the features during DDPM sampling, so memory usage peaks while that optimization is applied.
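(For anyone who prefers to script the change rather than edit the YAML by hand, a minimal sketch, assuming batch_size is a top-level key as the linked line suggests; the output filename is just an example:)

import yaml  # PyYAML

# Load the shipped config, lower the batch size, and write a local copy to run with.
with open("config/config_music.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["batch_size"] = 4  # smaller batch -> lower peak memory during the feature optimization

with open("config/config_music_low_mem.yaml", "w") as f:  # hypothetical output path
    yaml.safe_dump(cfg, f)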
I tried with batch_size: 4 and then 2 and it made no difference 🤔
I don't think it's this: the error says it's trying to allocate 112 MB, the GPU has a capacity of 23.65 GB but only 106 MB is free, and 20.85 GiB is already allocated by PyTorch. But for what, I wonder.
What GPUs did you do your work on? Might it simply be that there is a minimum GPU memory size of 40GB or something?
There was a bug with batch_size: 4. Please pull the latest code.
I get the same error when batch size is 2 as well though …?
A10G 24G works fine with batch size = 8
Great — based on my stack trace, what am I doing wrong?
It's not clear what happened here. Pulling and using the latest code may help.
Is xformers installed?
No, it wasn't listed as part of the requirements.
I checked, and this is against the latest code. I don't see any changes in the last 12 hours and my pull was within that window.
I have the same issue here.
Then maybe you could turn off the optimization function to further save memory (but sacrifice performance)? https://github.com/williamyang1991/FRESCO/issues/14#issuecomment-2011320317
Is xformers installed?
I tried xformers.ops.memory_efficient_attention, but found it is less memory efficient than F.scaled_dot_product_attention, so I didn't use xformers in my code.
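(For reference, the two calls being compared look roughly like this; a minimal sketch with illustrative tensor sizes, not FRESCO's actual code. SDPA takes (batch, heads, seq, head_dim), while xformers expects (batch, seq, heads, head_dim).)

import torch
import torch.nn.functional as F
# import xformers.ops as xops  # only if xformers is installed

# Illustrative sizes, not taken from FRESCO.
B, H, L, D = 1, 8, 4096, 64
q, k, v = (torch.randn(B, H, L, D, device="cuda", dtype=torch.float16) for _ in range(3))

# PyTorch 2.x built-in memory-efficient attention, inputs shaped (batch, heads, seq, head_dim).
out_sdpa = F.scaled_dot_product_attention(q, k, v)

# xformers variant, which wants (batch, seq, heads, head_dim), hence the transposes.
# out_xf = xops.memory_efficient_attention(
#     q.transpose(1, 2).contiguous(),
#     k.transpose(1, 2).contiguous(),
#     v.transpose(1, 2).contiguous(),
# ).transpose(1, 2)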
Then maybe you could turn off the optimization function to further save memory (but sacrifice performance)? #14 (comment)
There's something very strange going on, because #14 is on a 12 GB GPU and it works, but I have a 24 GB GPU and it won't do even the most basic processing on an image sequence requiring 112 MB. Something's going on with the PyTorch allocation: why does it need 20 GB of GPU RAM? The only thing I can conclude is that #14 is against a different version of the code base.
I think maybe there is no problem with the code. Maybe there is some specific GPU-allocation setting on your machine that causes the OOM?
Could be. And I'm very happy to turn on extra logging to help figure this out: what's the best way to do that? Is PyTorch allocating 21 GB of GPU RAM expected behaviour?
You can print the memory usage in diffusion_hacked.py, for example:
print('diffusion_hacked Line 286: ', GPU.getGPUs()[1].memoryUsed)
to see which code is running when the OOM happens.
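(If GPUtil, presumably what GPU refers to in that snippet, isn't handy, here is a minimal sketch of the same kind of checkpoint logging using PyTorch's own counters; the function name and the call sites in the comments are hypothetical:)

import torch

def log_gpu_mem(tag, device=0):
    # Report PyTorch's view of CUDA memory at a named checkpoint, in GiB.
    alloc = torch.cuda.memory_allocated(device) / 2**30
    reserved = torch.cuda.memory_reserved(device) / 2**30
    print(f"{tag}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB")

# Hypothetical call sites inside diffusion_hacked.py:
# log_gpu_mem("before optimize_feature")
# sample = optimize_feature(...)
# log_gpu_mem("after optimize_feature")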
same issue here, following
Even just running "run keyframes" OOMs on a 24 GB card? Am I missing something here? I ran the example test just fine with Gradio.
Full frames do not take more memory; the keyframe part uses the most memory. You mean the example video works fine but your own video OOMs? Maybe your video has too many pixels. The example video is 512*512 pixels. If your video is larger, you can use a smaller resize parameter: https://github.com/williamyang1991/FRESCO/blob/9fe1be71b6c21890b5bc92659026f9586440266e/run_fresco.py#L170
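(As a rough illustration of how much the resolution matters, a sketch of picking a target size with about the same pixel count as the 512*512 example, rounded to multiples of 8 as Stable Diffusion expects; this helper is hypothetical and independent of run_fresco.py's own resize parameter:)

def target_size(width, height, budget=512 * 512):
    # Scale (width, height) down to roughly `budget` pixels, keeping the aspect
    # ratio and rounding each side down to a multiple of 8.
    scale = min(1.0, (budget / (width * height)) ** 0.5)
    def round8(x):
        return max(8, round(x * scale) // 8 * 8)
    return round8(width), round8(height)

print(target_size(1920, 1080))  # (680, 384): roughly the same pixel budget as 512x512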
Hi, no matter what movie size I choose (5 fps, 640x480) I get the error below. nvtop shows that the webUI triggers pre-allocation of 21.5 GB, but then ... it's not used?
/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|          | 0/15 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/queueing.py", line 388, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/route_utils.py", line 219, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/blocks.py", line 1437, in process_api
    result = await self.call_function(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/gradio/utils.py", line 650, in wrapper
    response = f(*args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/webUI.py", line 159, in process
    keypath = process1(*args)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/webUI.py", line 280, in process1
    latents = inference(global_state.pipe, global_state.controlnet, global_state.frescoProc,
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/src/pipe_FRESCO.py", line 201, in inference
    noise_pred = pipe.unet(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/julian/sambashare/expts/FRESCO/src/diffusion_hacked.py", line 776, in forward
    sample = optimize_feature(sample, flows, occs, correlation_matrix,
  File "/home/julian/sambashare/expts/FRESCO/src/diffusion_hacked.py", line 485, in optimize_feature
    optimizer.step(closure)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/optim/adam.py", line 143, in step
    loss = closure()
  File "/home/julian/sambashare/expts/FRESCO/src/diffusion_hacked.py", line 478, in closure
    loss.backward()
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/julian/.local/share/virtualenvs/FRESCO-jUStlEeO/lib/python3.11/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.06 GiB. GPU 0 has a total capacty of 23.65 GiB of which 525.94 MiB is free. Process 15481 has 1.27 GiB memory in use. Including non-PyTorch memory, this process has 21.00 GiB memory in use. Of the allocated memory 20.11 GiB is allocated by PyTorch, and 416.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
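(Not a fix for the peak itself, but the allocator hint at the end of that message can be tried by setting PYTORCH_CUDA_ALLOC_CONF before PyTorch touches the GPU; the 128 below is just an illustrative starting value:)

import os

# Must be set before the first CUDA allocation, so put it at the very top of the
# entry script, or export it in the shell before launching webUI.py.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var on purpose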