hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

CUDA out of memory #697

Closed SaadTariq1592 closed 1 month ago

SaadTariq1592 commented 2 months ago

I am running the following command on Google Colab with a T4 GPU:

!python scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 2s --resolution 480p --aspect-ratio 9:16 \
  --prompt "A british shorthair jumping over a couch"

I am facing the error below:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 73.06 MiB is free. Process 82243 has 14.67 GiB memory in use. Of the allocated memory 14.55 GiB is allocated by PyTorch, and 23.38 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables).
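
For reference, applying the allocator hint from the message in Colab would look roughly like the sketch below. As far as I understand, expandable_segments only mitigates fragmentation and does not lower the model's overall memory requirement, so it may not be enough on a T4 with about 15 GiB of usable VRAM; lowering --resolution or --num-frames is probably the more realistic lever.

# Set the allocator option suggested by the error message, then rerun the same
# command. This only reduces fragmentation; it does not shrink the model itself.
%env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
!python scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 2s --resolution 480p --aspect-ratio 9:16 \
  --prompt "A british shorthair jumping over a couch"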

AvisP commented 2 months ago

I am running out of memory even with an H100 (80 GB), and the T4 has only 16 GB. What is the GPU memory requirement to run this? Can it run on two H100s with 80 GB each? I tried setting the number of GPUs to two and the device to auto, but it gives a RuntimeError saying it expects cuda (screenshot below; the kind of launch I was attempting is sketched after it).

[Screenshot of the RuntimeError attached]
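
Concretely, the two-GPU launch I had in mind is something like the sketch below, assuming scripts/inference.py can be started through PyTorch's standard torchrun launcher with the device left at its default cuda setting (I have not confirmed either point against the docs):

# Hypothetical two-GPU launch; assumes the inference script sets up a distributed
# process group (one process per H100) when run under torchrun. Unverified.
torchrun --standalone --nproc_per_node 2 scripts/inference.py \
  configs/opensora-v1-2/inference/sample.py \
  --num-frames 2s --resolution 480p --aspect-ratio 9:16 \
  --prompt "A british shorthair jumping over a couch"
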
henbucuoshanghai commented 2 months ago

Does it still OOM even on an H100 (80 GB)?

AvisP commented 1 month ago

Seems like it. Did you manage to get it to run on 80 GB?

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

VirajDeshwal commented 1 month ago

I am getting the same error.

AWS instance: g5.12xlarge (48 vCPU | 192 GB memory | 4x NVIDIA A10G GPUs with 24 GB each).

Error: CUDA out of memory. Tried to allocate 210.00 MiB. GPU 0 has a total capacity of 21.98 GiB of which 74.44 MiB is free. Including non-PyTorch memory, this process has 21.89 GiB memory in use. Of the allocated memory 21.20 GiB is allocated by PyTorch, and 392.68 MiB is reserved by PyTorch but unallocated.
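
Since each A10G on this instance has only about 22 GiB usable, I may try the same command at a lower resolution preset to see whether it fits. A rough sketch reusing the flags from the original command (I am assuming 360p is an accepted --resolution value and have not measured how much memory it actually saves):

# Same invocation as the original report, with the resolution lowered from 480p
# to 360p to reduce activation and latent sizes on a ~22 GiB A10G.
python scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 2s --resolution 360p --aspect-ratio 9:16 \
  --prompt "A british shorthair jumping over a couch"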