rhymes-ai / Allegro

Allegro is a powerful text-to-video model that generates high-quality videos of up to 6 seconds at 15 FPS and 720p resolution from simple text prompts.
https://rhymes.ai/
Apache License 2.0

Why is 15 GB of VRAM still not enough for testing? #12

Closed feng20001022 closed 19 hours ago

feng20001022 commented 1 day ago

I set `enable_cpu_offload` to true and changed the VAE dtype to BF16, but 15 GB of VRAM is still not enough for testing. Why?

hyang0511 commented 1 day ago

Would you please share more info about your script, hardware, and environment?

We have tested the code on the A100 and H100.

feng20001022 commented 1 day ago

script: python single_inference.py --user_prompt 'A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.' --save_path ./output_videos/test_video.mp4 --vae ./Allegro/vae --dit ./Allegro/transformer --text_encoder ./Allegro/text_encoder --tokenizer ./Allegro/tokenizer --guidance_scale 7.5 --num_sampling_steps 100 --seed 42 --enable_cpu_offload

hardware: a T4 GPU

environment: CUDA 12.6, Python 3.10, torch 2.4.1

The only change I made in single_inference.py was replacing "vae = AllegroAutoencoderKL3D.from_pretrained(args.vae, torch_dtype=torch.float32).cuda()" with "vae = AllegroAutoencoderKL3D.from_pretrained(args.vae, torch_dtype=torch.bfloat16).cuda()".
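For reference, a minimal sketch of that change (the import path is an assumption; the class name, argument, and dtypes come from the quoted line):

```python
import torch
# Import path is an assumption; use whatever single_inference.py already imports.
from allegro.models.vae import AllegroAutoencoderKL3D

# Original: the VAE is loaded in full precision (FP32).
# vae = AllegroAutoencoderKL3D.from_pretrained(args.vae, torch_dtype=torch.float32).cuda()

# Modified: load the VAE weights in BF16, roughly halving the VAE's VRAM footprint.
vae = AllegroAutoencoderKL3D.from_pretrained(args.vae, torch_dtype=torch.bfloat16).cuda()
```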

nightsnack commented 1 day ago

We haven't tested our model on GPUs other than the A100/H100, so I'm afraid I can't answer this question. Considering the performance of the T4, I suspect that even if you manage to load the model, inference will be extremely slow.

DsnTgr commented 19 hours ago

Even if you set `enable_cpu_offload` to true, you still need to check whether enough system memory is available. When I ran single_inference.py with a 24 GB GPU and --enable_cpu_offload, the `free` command showed 18 GB of RAM in use. I watched the device monitor in Colab, so I'm sure that's what happens.
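A quick sketch for checking both memory pools before launching, since CPU offload trades VRAM for system RAM (psutil is an extra dependency; the torch call is standard):

```python
import psutil
import torch

# CPU offload moves model weights to system RAM, so both pools matter.
ram = psutil.virtual_memory()
print(f"System RAM: {ram.available / 1e9:.1f} GB free of {ram.total / 1e9:.1f} GB")

if torch.cuda.is_available():
    # torch.cuda.mem_get_info() returns (free, total) device memory in bytes.
    free_vram, total_vram = torch.cuda.mem_get_info()
    print(f"GPU VRAM:   {free_vram / 1e9:.1f} GB free of {total_vram / 1e9:.1f} GB")
```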