Closed: feng20001022 closed this issue 19 hours ago
Would you please share more info about your script, hardware, and environment?
We have tested the code on A100 and H100 GPUs.
script: python single_inference.py --user_prompt 'A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.' --save_path ./output_videos/test_video.mp4 --vae ./Allegro/vae --dit ./Allegro/transformer --text_encoder ./Allegro/text_encoder --tokenizer ./Allegro/tokenizer --guidance_scale 7.5 --num_sampling_steps 100 --seed 42 --enable_cpu_offload
hardware: a T4 GPU
environment: CUDA 12.6, Python 3.10, torch 2.4.1
The only change I made in single_inference.py was replacing "vae = AllegroAutoencoderKL3D.from_pretrained(args.vae, torch_dtype=torch.float32).cuda()" with "vae = AllegroAutoencoderKL3D.from_pretrained(args.vae, torch_dtype=torch.bfloat16).cuda()".
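For readability, here is the same one-line edit as a before/after (identifiers exactly as quoted from single_inference.py; surrounding imports and setup omitted):

```python
# Original: load the VAE weights in full precision (FP32)
# vae = AllegroAutoencoderKL3D.from_pretrained(args.vae, torch_dtype=torch.float32).cuda()

# Modified: load the VAE weights in BF16, roughly halving their VRAM footprint
vae = AllegroAutoencoderKL3D.from_pretrained(args.vae, torch_dtype=torch.bfloat16).cuda()
```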
We haven't tested our model on GPUs other than the H100/A100, so I'm afraid I can't answer this question. Considering the T4's performance, my guess is that even if you manage to load the model on a T4, inference will be extremely slow.
Even if you set enable_cpu_offload to true, you still need to check whether enough memory is available. When I ran single_inference.py with 24 GB and --enable_cpu_offload, the free command showed 18 GB in use. I was also watching the device monitor in Colab, so I'm sure that is what happens.
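As a quick sanity check before loading, both host RAM (which CPU offload relies on) and VRAM can be inspected; a minimal sketch assuming only standard PyTorch and psutil APIs:

```python
import psutil
import torch

# Host RAM matters with --enable_cpu_offload, since offloaded
# weights are parked in system memory.
ram = psutil.virtual_memory()
print(f"Host RAM: {ram.available / 1e9:.1f} GB free of {ram.total / 1e9:.1f} GB")

# Free/total VRAM on the current CUDA device.
free_b, total_b = torch.cuda.mem_get_info()
print(f"GPU VRAM: {free_b / 1e9:.1f} GB free of {total_b / 1e9:.1f} GB")
```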
I set enable_cpu_offload to true and changed the VAE dtype to BF16, but why is 15 GB of VRAM still not enough for testing?
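To narrow down which stage runs out of memory, PyTorch's peak-allocation counters can be logged around each stage; a minimal sketch using standard torch.cuda APIs (the stage names are placeholders, not from the script):

```python
import torch

def report_peak(stage: str) -> None:
    # Peak VRAM allocated by PyTorch since the last reset, in GB.
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"[{stage}] peak VRAM: {peak_gb:.1f} GB")
    torch.cuda.reset_peak_memory_stats()

# Example: call after each major stage of single_inference.py,
# e.g. after text encoding, the DiT sampling loop, and VAE decode.
# report_peak("text_encoder")
# report_peak("dit_sampling")
# report_peak("vae_decode")
```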