jy0205 / Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
https://pyramid-flow.github.io/
MIT License

Running out of 12GB? #78

Open juntaosun opened 2 weeks ago

juntaosun commented 2 weeks ago

The hardware requirements are too high.


Because GPU memory is exhausted, generation is very slow!

The current video generation scheme has a major shortcoming: almost all frames are generated at once, which exhausts GPU memory.

In the future, it would be better to generate one frame at a time, save each frame as a PNG image, and then merge the image sequence into a video. This should save a lot of GPU memory.
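The frame-by-frame idea above could be sketched as follows. This is a minimal, hypothetical illustration, not Pyramid-Flow's method or API: `generate_frame` stands in for a model call that (as an assumption) receives the frame index and the previous frame, and frames are plain nested lists of RGB tuples so the PNG writer needs only the Python standard library. At any point only the current frame and its predecessor are held in memory; everything else is on disk.

```python
import os
import struct
import tempfile
import zlib
from typing import Callable, List, Optional, Tuple

# A frame is a list of rows; each row is a list of (r, g, b) values in 0..255.
Frame = List[List[Tuple[int, int, int]]]

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """Length + tag + data + CRC32 (computed over tag and data), per the PNG spec."""
    body = tag + data
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + body + struct.pack(">I", crc)


def write_png(path: str, frame: Frame) -> None:
    """Write one 8-bit RGB frame to disk as a PNG file (stdlib only)."""
    height, width = len(frame), len(frame[0])
    # Each scanline starts with filter byte 0 ("None"), followed by RGB bytes.
    raw = b"".join(
        b"\x00" + bytes(c for pixel in row for c in pixel) for row in frame
    )
    # IHDR: width, height, bit depth 8, color type 2 (RGB), default methods.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    with open(path, "wb") as f:
        f.write(PNG_SIGNATURE)
        f.write(_png_chunk(b"IHDR", ihdr))
        f.write(_png_chunk(b"IDAT", zlib.compress(raw)))
        f.write(_png_chunk(b"IEND", b""))


def generate_frames_streaming(
    generate_frame: Callable[[int, Optional[Frame]], Frame],
    num_frames: int,
    out_dir: str,
) -> List[str]:
    """Generate frames one at a time, saving each as frame_XXXX.png.

    `generate_frame` is a hypothetical per-frame model call. Only the current
    frame and the previous one are referenced at any point.
    """
    paths: List[str] = []
    prev: Optional[Frame] = None
    for i in range(num_frames):
        frame = generate_frame(i, prev)
        path = os.path.join(out_dir, f"frame_{i:04d}.png")
        write_png(path, frame)
        paths.append(path)
        prev = frame  # drop the reference to the older frame
    return paths


def toy_frame(index: int, prev: Optional[Frame]) -> Frame:
    """Hypothetical stand-in for a model: a 4x4 gray frame that brightens over time."""
    v = min(255, 40 * index)
    return [[(v, v, v) for _ in range(4)] for _ in range(4)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        for p in generate_frames_streaming(toy_frame, 3, out_dir):
            print(os.path.basename(p))
```

Merging the PNG sequence into a video could then be done with a tool such as ffmpeg, e.g. `ffmpeg -framerate 24 -i frame_%04d.png -pix_fmt yuv420p out.mp4`. The trade-off in this sketch is that the generator only sees the previous frame, so any longer-range temporal context would have to be carried some other way.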

feifeiobama commented 2 weeks ago

We have addressed some of these issues in https://github.com/jy0205/Pyramid-Flow/pull/76; please update the codebase and see how it goes. We also have a big upcoming improvement in https://github.com/jy0205/Pyramid-Flow/pull/75 that should allow running with less than 8GB of GPU memory. Stay tuned for further efficiency improvements.

juntaosun commented 2 weeks ago

I tried many image-to-video generations, but the results were poor: the picture became uncontrollable after 1 second.

Judging by real examples, the quality still needs improvement.

https://github.com/user-attachments/assets/b4614bea-a13d-485d-a95f-4941741b6173

juntaosun commented 2 weeks ago

I suggest removing the "Image-to-Video" feature. The generated faces and hands look like those in the "Thriller" movie.

feifeiobama commented 2 weeks ago


Thank you for your valuable feedback. We acknowledge these issues (human deformation due to SD3 weight initialization; quality degradation due to insufficient autoregressive training) and are working on a new model checkpoint.