jy0205 / Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
https://pyramid-flow.github.io/
MIT License

Out of memory error running 2GPU RTX 3060 and RTX 4070 #105

Closed · drwootton closed 2 weeks ago

drwootton commented 1 month ago

I have an RTX 3060 and an RTX 4070 in my system, both with 12GB of VRAM. Since the X server runs on the RTX 4070, only about 11GB is free there, so with X running I can run the single-GPU script from the project page successfully only on the RTX 3060. If I switch to runlevel 3 (no X server), I can run that script on either GPU.

I updated my git repo to the current code as of today, Oct 15, and tried text-to-video with the scripts/inference_multigpu.sh script. I changed the inference_multigpu.py script to set cpu_offloading=True in both places, and that did not help. I tried adding model.enable_sequential_cpu_offload(), and that did not help either. I also tried adding model.enable_sequential_cpu_offload() just before the model.vae.to(device) statement, with no improvement. I get the out of memory error for both the 384P and 768P models.

Is the intent of the multi-GPU support to cut memory usage on each GPU by about half by splitting a single frame's generation across both GPUs, or is it to speed up generation by producing frames on separate GPUs in parallel to shorten run time?
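For reference, a rough sketch of the changes described above (illustrative only; the actual inference_multigpu.py differs, and the checkpoint path and prompt here are placeholders):

```python
import torch
from pyramid_dit import PyramidDiTForVideoGeneration

# Load the model roughly the way the inference scripts do (placeholder path/variant).
model = PyramidDiTForVideoGeneration(
    "PATH_TO_PYRAMID_FLOW_CHECKPOINT",
    model_dtype="bf16",
    model_variant="diffusion_transformer_384p",
)

# Attempt 1: sequential CPU offload just before moving the VAE onto the GPU.
# (Did not help here; whether this call is fully supported by the model class is unclear.)
model.enable_sequential_cpu_offload()
model.vae.to("cuda")

# Attempt 2: pass cpu_offloading=True to the generation call
# (set in both places in inference_multigpu.py); still ran out of memory.
with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16):
    frames = model.generate(
        prompt="a placeholder text-to-video prompt",
        num_inference_steps=[20, 20, 20],
        video_num_inference_steps=[10, 10, 10],
        height=384,
        width=640,
        temp=16,
        guidance_scale=9.0,
        video_guidance_scale=5.0,
        output_type="pil",
        cpu_offloading=True,
    )
```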

feifeiobama commented 1 month ago

The intent of multi-GPU inference is both to reduce memory usage and to speed up generation. But (1) we haven't tested it with CPU offloading, and (2) it is based on Sequence Parallelism, so each GPU still has to load the full model, which imposes a certain lower bound on per-GPU memory rather than directly halving it.
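To make the lower bound concrete, here is a rough, illustrative way to think about per-GPU memory under sequence parallelism (the numbers are placeholders, not measurements):

```python
def per_gpu_memory_gb(weights_gb: float, activations_gb: float, num_gpus: int) -> float:
    """Back-of-the-envelope estimate: every rank keeps a full copy of the model
    weights, while the token/frame sequence (and its activations) is split
    across ranks."""
    return weights_gb + activations_gb / num_gpus

# Example with made-up numbers: the weight copy dominates, so going from
# 1 to 2 GPUs does not halve the per-GPU footprint.
print(per_gpu_memory_gb(8.0, 6.0, 1))  # 14.0 GB on a single GPU
print(per_gpu_memory_gb(8.0, 6.0, 2))  # 11.0 GB per GPU, not 7.0
```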

For now, we suggest trying CPU offloading with the single-GPU inference script, which should be able to run within 12GB of memory.
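A minimal sketch of single-GPU inference with offloading, following the README-style usage (the keyword arguments such as cpu_offloading, save_memory, and video_num_inference_steps are assumed from that example and may differ in the current scripts; path and prompt are placeholders):

```python
import torch
from pyramid_dit import PyramidDiTForVideoGeneration
from diffusers.utils import export_to_video

model = PyramidDiTForVideoGeneration(
    "PATH_TO_PYRAMID_FLOW_CHECKPOINT",
    model_dtype="bf16",
    model_variant="diffusion_transformer_768p",
)
model.vae.enable_tiling()  # reduces peak memory during VAE decoding

with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16):
    frames = model.generate(
        prompt="a placeholder text-to-video prompt",
        num_inference_steps=[20, 20, 20],
        video_num_inference_steps=[10, 10, 10],
        height=768,
        width=1280,
        temp=16,
        guidance_scale=9.0,
        video_guidance_scale=5.0,
        output_type="pil",
        save_memory=True,
        cpu_offloading=True,  # keeps submodules on CPU when not in use
    )

export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
```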