Open leonardodora opened 1 week ago
`input_sq_size` is used to align the positional embedding. For training and sampling, just keep it fixed.
We introduced it because our weights are initialized from PixArt-Sigma, whose base image size is 512x512. To map all positional embeddings to the same scale, we set `input_sq_size` (the square root of the original model's input resolution, i.e. the side length of its square input image) to 512.
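As a rough illustration of what "aligning" means here, the sketch below assumes the positional embedding divides each patch index by `scale = sqrt(height * width) / input_sq_size`, so that inputs of any resolution span the same coordinate range as the 512x512 base model. The helper name `position_coords` and the 16-pixels-per-patch figure are hypothetical and chosen only for the example; this is not the repository's actual code.

```python
import math


def position_coords(num_patches, height, width, input_sq_size=512):
    """Hypothetical helper: 1D patch coordinates rescaled to the base model's range.

    Assumption: the embedding divides patch indices by
    scale = sqrt(height * width) / input_sq_size.
    """
    scale = math.sqrt(height * width) / input_sq_size
    return [i / scale for i in range(num_patches)]


# Assumed for illustration: one patch covers 16 pixels along each axis.
base = position_coords(512 // 16, 512, 512)       # scale = 1.0
large = position_coords(1024 // 16, 1024, 1024)   # scale = 2.0

# Both grids cover roughly the same coordinate range (0 to about 31),
# which is why input_sq_size stays fixed at 512 for training and sampling.
print(base[-1], large[-1])
```

Under this assumption, changing `input_sq_size` would shift every coordinate away from the range the pretrained weights were trained on, which is consistent with the advice to keep it fixed.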
And how should we use it in training and sampling? Thanks!