huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Adopting different image aspect ratio in training? #6659

Open linnanwang opened 9 months ago

linnanwang commented 9 months ago

Is your feature request related to a problem? Please describe.

Hello there,

I have a question about UNet and LoRA training on images with aspect ratios (AR) other than 1:1. Right now, the data is resized to 1:1 using PyTorch's linear interpolation. How should 16:9 images be handled? Thanks.

Should I resize the image proportionally so that its longer side is 1024 and pad the remaining dimension with white space?
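A minimal sketch of the padding ("letterbox") approach described above, assuming a square 1024 target and white fill. The helper names `letterbox_geometry` and `letterbox_image` are hypothetical, not part of diffusers. The geometry is pure Python; Pillow is only imported inside the function that touches pixels.

```python
def letterbox_geometry(width: int, height: int, target: int = 1024):
    """Scale so the longer side equals `target`, then center the result
    on a `target` x `target` canvas.

    Returns (new_w, new_h, pad_left, pad_top).
    """
    scale = target / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Split the leftover space evenly on both sides of the shorter dimension.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def letterbox_image(img, target: int = 1024, fill=(255, 255, 255)):
    """Resize a PIL image proportionally and paste it onto a padded square canvas."""
    from PIL import Image  # lazy import: only needed for the pixel operation

    new_w, new_h, pad_left, pad_top = letterbox_geometry(img.width, img.height, target)
    resized = img.convert("RGB").resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (target, target), fill)
    canvas.paste(resized, (pad_left, pad_top))
    return canvas
```

For a 1920x1080 (16:9) image this yields a 1024x576 resize with 224 pixels of padding above and below. Note that every training sample then contains the same flat bands, which the model can learn to reproduce.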

DN6 commented 9 months ago

Using a 16:9 aspect ratio with a width of 1024 means the image height is 576 (1024 × 9 / 16). Depending on the model you're training, this should work fine. Padding with white space may cause the model to produce white space in its outputs as well.
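A sketch of training directly at a non-square size instead of padding, under stated assumptions: the height is snapped to a multiple of 8, assuming a Stable Diffusion-style VAE with 8x spatial downsampling (pass `multiple=64` if your UNet needs stricter alignment), and images are resized to cover the target and then center-cropped. `target_size` and `cover_crop_geometry` are hypothetical helpers, not diffusers APIs.

```python
def target_size(width: int, ar_w: int, ar_h: int, multiple: int = 8):
    """Height for a fixed width and aspect ratio, snapped to `multiple` pixels."""
    height = width * ar_h / ar_w
    height = int(round(height / multiple)) * multiple
    return width, height


def cover_crop_geometry(src_w: int, src_h: int, dst_w: int, dst_h: int):
    """Scale so the image fully covers (dst_w, dst_h), then center-crop.

    Returns ((resized_w, resized_h), (left, top, right, bottom)).
    """
    # The larger scale factor guarantees no side ends up smaller than the target.
    scale = max(dst_w / src_w, dst_h / src_h)
    resized_w = max(dst_w, round(src_w * scale))
    resized_h = max(dst_h, round(src_h * scale))
    left = (resized_w - dst_w) // 2
    top = (resized_h - dst_h) // 2
    return (resized_w, resized_h), (left, top, left + dst_w, top + dst_h)
```

With Pillow, the result can be applied as `img.resize(size).crop(box)`. A 16:9 source maps to 1024x576 with no cropping; a 16:10 source (1920x1200) is resized to 1024x640 and loses 32 rows at the top and bottom.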

linnanwang commented 9 months ago

Thanks @DN6, what's the right way to handle 16:9 images?

yiyixuxu commented 8 months ago

Can we use the forums for questions? https://github.com/huggingface/diffusers/discussions

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.