hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

torch.cuda.OutOfMemoryError happened in the mirror of opensora 1.1 on the cloud platform #535

reich208github opened this issue 1 week ago

reich208github commented 1 week ago

Hi guys,

I have rented two A800 GPUs and selected the Open-Sora 1.1 image on a cloud platform.

But when I try to run the command below:

python scripts/inference-long.py configs/opensora-v1-1/inference/sample.py \
    --num-frames 32 --image-size 832 1110 \
    --loop 1 --condition-frame-length 8 \
    --sample-name husky_2 \
    --prompt 'a group of siberian husky dogs run out from a door to eat dog food and drink milk.'

it reports the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 49.11 GiB. GPU 0 has a total capacty of 79.32 GiB of which 30.63 GiB is free. Process 2924960 has 48.68 GiB memory in use. Of the allocated memory 47.91 GiB is allocated by PyTorch, and 263.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It seems that only one GPU is detected. How can I fix this problem?

Thank you!
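(For reference, the hint at the end of the traceback refers to PyTorch's PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch of trying it, with an illustrative 512 MiB split size that the thread itself does not recommend, would be to set the variable before re-running the same command:

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

Note that this only mitigates allocator fragmentation; it will not help when a single allocation is larger than the free memory on the device, as in the log above.)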

Edenzzzz commented 1 week ago

Stated in the README

(screenshot of the relevant README section)
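(The screenshot presumably shows the README's multi-GPU inference instructions. A minimal sketch, assuming the inference script can be launched through torchrun as Open-Sora's README describes for multi-GPU runs:

torchrun --standalone --nproc_per_node 2 scripts/inference-long.py configs/opensora-v1-1/inference/sample.py \
    --num-frames 32 --image-size 832 1110 \
    --loop 1 --condition-frame-length 8 \
    --sample-name husky_2 \
    --prompt 'a group of siberian husky dogs run out from a door to eat dog food and drink milk.'

With --nproc_per_node 2, both rented A800s would be used; whether the workload then fits in memory still depends on the resolution and frame count.)
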
github-actions[bot] commented 2 days ago

This issue is stale because it has been open for 7 days with no activity.