What does the param <input_sq_size> stand for?

hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

https://hpcaitech.github.io/Open-Sora/

Apache License 2.0


What does the param <input_sq_size> stand for? #519

Open leonardodora opened 1 week ago

leonardodora commented 1 week ago

And how should we use it in training and sampling? Thanks!

zhengzangw commented 1 week ago

input_sq_size is used to align the positional embedding. For training and sampling, just keep it fixed.

We introduced this because our weights are initialized from PixArt-Sigma, whose base image size is 512x512. To map the positional embeddings for every input resolution onto that same base size, we set input_sq_size to 512. The value is the side length of the original model's square input, i.e. the square root of its pixel count (sqrt(512 * 512) = 512).
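
To illustrate the idea, here is a minimal sketch of how a fixed input_sq_size could be used to rescale positional coordinates. It is not taken from the Open-Sora source: the helper names `pos_embed_scale` and `scaled_positions` are hypothetical, and the assumption is that the scale is the ratio between the square root of the current frame area and input_sq_size.

```python
import math


def pos_embed_scale(height, width, input_sq_size=512):
    """Hypothetical helper: ratio of sqrt(H * W) to the base square size.

    Assumes the positional embedding is rescaled by comparing the current
    input's "square-equivalent" side length with the base model's side length.
    """
    return math.sqrt(height * width) / input_sq_size


def scaled_positions(num_positions, scale):
    """Hypothetical helper: divide grid indices by the scale so that larger
    inputs are mapped back onto the coordinate range of the base model."""
    return [i / scale for i in range(num_positions)]


# A 512x512 input matches the base size, so coordinates are unchanged.
base_scale = pos_embed_scale(512, 512)

# A 1024x1024 input has twice the side length, so coordinates are halved.
large_scale = pos_embed_scale(1024, 1024)

# A 256x1024 input has the same area as 512x512, so its scale matches the base.
wide_scale = pos_embed_scale(256, 1024)

print(base_scale, large_scale, wide_scale)
print(scaled_positions(4, large_scale))
```

Under these assumptions, input_sq_size acts as a constant reference point: changing it between training and sampling would shift every coordinate, which is consistent with the advice to keep it fixed.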

github-actions[bot] commented 3 days ago

This issue is stale because it has been open for 7 days with no activity.