What is the 1024p image checkpoint? Is it a text-to-image model, or is it an image-to-video model?

jy0205 / Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

https://pyramid-flow.github.io/

MIT License

What is the 1024p image checkpoint? Is it a text-to-image model, or is it an image-to-video model? #141

Status: Open. SAT431 opened this issue 3 days ago.

feifeiobama commented 3 days ago

It is a text-to-image model. Please check image_generation_demo.ipynb for its usage.

hashnimo commented 2 days ago

You can still generate higher-resolution videos using the 384p image-to-video model. For example, if you input a 1024x576 resolution image, the output video will maintain that same resolution.

To enable this, modify the resize call, image.resize((width, height)), so that width and height are set to the resolution you want, for example 1024 and 576.
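A minimal sketch of that resize step, assuming the input is a Pillow image (the `image.resize((width, height))` call above matches Pillow's API). The helper name `resize_for_i2v` and the placeholder source image are hypothetical and are not part of the Pyramid-Flow code. The resized image would then be passed to the image-to-video pipeline as usual.

```python
from PIL import Image


def resize_for_i2v(image: Image.Image, width: int, height: int) -> Image.Image:
    """Hypothetical helper: resize the conditioning image to the desired output size."""
    # Image.resize takes a (width, height) tuple and returns a new image;
    # the original image object is left unchanged.
    return image.convert("RGB").resize((width, height))


# Placeholder input so the sketch runs on its own; in practice, load your
# own picture with Image.open(...) instead.
source = Image.new("RGB", (1920, 1080), color=(127, 127, 127))

# 1024x576 is the example resolution mentioned in the comment above.
resized = resize_for_i2v(source, width=1024, height=576)
print(resized.size)  # (1024, 576)
```

Note that this only changes the size of the conditioning image. Whether the 384p checkpoint produces good video at that size is a separate question.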

feifeiobama commented 2 days ago

Great observation. That would rely entirely on RoPE's extrapolation capabilities. We observed the same behavior on images, but haven't tested it for video generation.
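For readers unfamiliar with why RoPE can extrapolate at all, here is a minimal sketch of standard 1-D rotary position embedding in plain Python. It is not Pyramid-Flow's implementation, which may apply RoPE differently (for example, per spatial or temporal axis). The function names and the base of 10000 are assumptions taken from the common RoPE formulation. The point it illustrates: rotation angles are computed from a formula rather than looked up in a learned table, so any position index is valid, and the query-key score depends only on the relative offset between positions.

```python
import math


def rope_angles(position, dim, base=10000.0):
    """Rotation angle for each channel pair: position * base^(-2i/dim)."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by the angles for `position`."""
    out = []
    for i, angle in enumerate(rope_angles(position, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


query = [0.3, -1.2, 0.7, 0.5]
key = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (2) at small positions and at much larger positions.
score_near = dot(apply_rope(query, 3), apply_rope(key, 1))
score_far = dot(apply_rope(query, 103), apply_rope(key, 101))

# The two scores agree up to floating-point error.
print(math.isclose(score_near, score_far, abs_tol=1e-9))
```

This shows only that larger position indices are well defined. It does not show that a model trained at 384p produces good results at higher resolutions, which, as noted above, has not been tested for video.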