jy0205 / Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
https://pyramid-flow.github.io/
MIT License

out of memory - does not support bf16 - Tesla P40 24gb Vram #128

Open gandolfi974 opened 2 days ago

gandolfi974 commented 2 days ago

Hi, I tried to generate 5 s of video at 768p and 12 frames/s on my P40 (24 GB VRAM) using the Gradio interface on Windows, but I get the error message below. I have enabled cpu_offloading and changed BF16 to f32 to avoid an error.

[ERROR] Error during text-to-video generation: CUDA out of memory. Tried to allocate 10.90 GiB. GPU 0 has a total capacty of 23.90 GiB of which 183.25 MiB is free. Of the allocated memory 9.91 GiB is allocated by PyTorch, and 13.60 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
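The figures in this message can be checked by hand. The sketch below copies them from the error and shows that PyTorch already holds more cached but unused memory than the failed request needed, which is consistent with (though not proof of) fragmentation. It then sets the `PYTORCH_CUDA_ALLOC_CONF` variable named in the message. The helper `configure_allocator` and the value 128 are illustrative assumptions, not settings recommended in this thread, and the variable only takes effect if it is set before CUDA is initialized.

```python
import os

GIB = 1024 ** 3

# Figures copied from the error message above.
requested = 10.90 * GIB
allocated = 9.91 * GIB
reserved_unallocated = 13.60 * GIB

# PyTorch holds more cached-but-unused memory than the request needs,
# so no single cached block was large enough: a hint of fragmentation.
fragmentation_suspected = reserved_unallocated > requested


def configure_allocator(max_split_size_mb: int) -> str:
    """Hypothetical helper: set the allocator option named in the error.

    Must run before the first CUDA allocation (ideally before importing torch).
    """
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


if fragmentation_suspected:
    # 128 is an arbitrary illustrative value, not a tested recommendation.
    configure_allocator(128)
```

A smaller split size limits how large a cached block the allocator will split, which can reduce fragmentation at some cost in allocation speed. It does not lower the memory the model itself needs.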

feifeiobama commented 2 days ago

I haven't tested the f32 version on 24GB GPUs, so I'm not sure if this is normal. Perhaps you can enable sequential CPU offloading to further save GPU memory.
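One reason the f32 version may not fit where bf16 does is storage size: an f32 value takes 4 bytes and a bf16 value takes 2, so weights (and, roughly, activations) need about twice the memory. A minimal sketch of that arithmetic follows. The parameter count is a hypothetical placeholder, not the actual size of the Pyramid-Flow model.

```python
# Bytes needed to store one value in each precision.
BYTES_PER_ELEMENT = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_gib(num_params: int, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1024 ** 3


# Hypothetical parameter count, for illustration only.
HYPOTHETICAL_PARAMS = 2_000_000_000

bf16_gib = weight_gib(HYPOTHETICAL_PARAMS, "bf16")
fp32_gib = weight_gib(HYPOTHETICAL_PARAMS, "fp32")
ratio = fp32_gib / bf16_gib  # f32 weights need twice the space of bf16

print(f"bf16: {bf16_gib:.2f} GiB, fp32: {fp32_gib:.2f} GiB, ratio: {ratio:.1f}x")
```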

gandolfi974 commented 2 days ago

How do I enable sequential CPU offloading? I have already enabled cpu_offloading (true by default in the Gradio interface).

feifeiobama commented 2 days ago

Please see https://github.com/jy0205/Pyramid-Flow/pull/75.

gandolfi974 commented 2 days ago

Thanks, I tried it. It's very slow, and I get the same "out of memory" error near the end of video generation.

feifeiobama commented 2 days ago

Sorry to hear that. If the OOM occurs near the end of generation, it may be due to the VAE decoding setting instead of CPU offloading. Please disable sequential CPU offloading and further reduce tile_sample_min_size. See https://github.com/jy0205/Pyramid-Flow/issues/5#issuecomment-2404411873 for details.
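To see why a smaller tile size can help with peak memory during VAE decoding, here is a simplified sketch of tiled processing. It assumes a 768x1280 frame and ignores the overlap and blending that real tiled decoders use between neighbouring tiles. The function `tile_grid` and the tile sizes 256 and 128 are illustrative assumptions, not the repository's defaults.

```python
import math


def tile_grid(height: int, width: int, tile: int) -> tuple[int, int, int]:
    """Return (rows, cols, pixels_per_tile) for non-overlapping square tiles."""
    rows = math.ceil(height / tile)
    cols = math.ceil(width / tile)
    return rows, cols, tile * tile


# Assumed frame size for a "768p" output; not stated in this thread.
HEIGHT, WIDTH = 768, 1280

for tile in (256, 128):
    rows, cols, pixels = tile_grid(HEIGHT, WIDTH, tile)
    print(f"tile={tile}: {rows * cols} tiles, {pixels} pixels decoded at a time")
```

Halving the tile edge quarters the number of pixels decoded at once, so peak decoder memory drops while the number of tiles, and therefore decoding time, goes up.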

gandolfi974 commented 2 days ago

It's working: 17 min for 10 sec of video at 384p and 12 frames/sec.

feifeiobama commented 2 days ago

Great to hear that. By the way, our model is trained at 24 fps, so it is better to export the video at 24 fps.
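The effect of the export frame rate is simple arithmetic: the same frames play back for a different length of time depending on the fps they are written at. The sketch below assumes the 10 s clip at 12 frames/sec reported above consisted of 120 frames. The helper name is hypothetical.

```python
def playback_seconds(num_frames: int, fps: float) -> float:
    """Duration of a clip when num_frames are written at the given frame rate."""
    return num_frames / fps


# Assumption: 10 s at 12 frames/sec means 120 generated frames.
num_frames = 10 * 12

at_12_fps = playback_seconds(num_frames, 12)  # slower than the trained rate
at_24_fps = playback_seconds(num_frames, 24)  # the rate the model was trained at

print(f"{num_frames} frames: {at_12_fps} s at 12 fps, {at_24_fps} s at 24 fps")
```

Under that assumption, exporting at 24 fps gives a clip half as long, with motion at the speed the model was trained to produce.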