hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

torch.cuda.OutOfMemoryError happened in the mirror of opensora 1.1 on the cloud platform #535

reich208github opened this issue 1 week ago

reich208github commented 1 week ago

Hi guys,

I have rented two A800 GPUs and selected the Open-Sora 1.1 image on a cloud platform.

But when I try to run the command below:

python scripts/inference-long.py configs/opensora-v1-1/inference/sample.py \
    --num-frames 32 --image-size 832 1110 \
    --loop 1 --condition-frame-length 8 \
    --sample-name husky_2 \
    --prompt 'a group of siberian husky dogs run out from a door to eat dog food and drink milk.'

it reports the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 49.11 GiB. GPU 0 has a total capacty of 79.32 GiB of which 30.63 GiB is free. Process 2924960 has 48.68 GiB memory in use. Of the allocated memory 47.91 GiB is allocated by PyTorch, and 263.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It seems that only one GPU is detected. How can I fix this problem?

Thank you!
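(For reference, the hint at the end of the traceback refers to PyTorch's PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch of trying it, with an illustrative 512 MiB split size that the thread itself does not recommend, would be to set the variable before re-running the same command:

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

Note that this only mitigates allocator fragmentation; it will not help when a single allocation is larger than the free memory on the device, as in the log above.)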

Edenzzzz commented 1 week ago

Stated in the README

(screenshot of the relevant README section)
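(The screenshot presumably shows the README's multi-GPU inference instructions. A minimal sketch, assuming the inference script can be launched through torchrun as Open-Sora's README describes for multi-GPU runs:

torchrun --standalone --nproc_per_node 2 scripts/inference-long.py configs/opensora-v1-1/inference/sample.py \
    --num-frames 32 --image-size 832 1110 \
    --loop 1 --condition-frame-length 8 \
    --sample-name husky_2 \
    --prompt 'a group of siberian husky dogs run out from a door to eat dog food and drink milk.'

With --nproc_per_node 2, both rented A800s would be used; whether the workload then fits in memory still depends on the resolution and frame count.)
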
github-actions[bot] commented 2 days ago

This issue is stale because it has been open for 7 days with no activity.