imlixinyang / HiSD

Code for "Image-to-image Translation via Hierarchical Style Disentanglement" (CVPR 2021 Oral).

Release checkpoint for Celeb-HQ dataset #12

Closed huang-1030 closed 3 years ago

huang-1030 commented 3 years ago

Thank you for open-sourcing this project. The code has been very helpful for my understanding of face attribute transfer. You have released the 256-resolution checkpoint; could you also release the 1024-resolution checkpoint for learning and use? Thank you very much.

imlixinyang commented 3 years ago

Doubling the resolution requires 4x the GPU memory if you keep the batch size and model architecture unchanged. At present, it is still very difficult to train an image-to-image translation model at such a high resolution (i.e., 1024, as you expect).

20 GB at 256 means 320 GB (more than 8x Tesla V100) at 1024. Too expensive! Worse, at high resolution you may need to add UNet-like skip connections to preserve identity during translation, which increases the cost further.
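The arithmetic above can be sketched as a back-of-the-envelope estimate (my own illustration, not taken from the HiSD code): with batch size and architecture fixed, activation memory grows with the pixel count, i.e. quadratically in resolution.

```python
# Hypothetical helper, not part of the HiSD codebase: scale GPU memory
# by the ratio of pixel counts between two resolutions.

def estimate_memory_gb(base_gb, base_res, target_res):
    """Activation memory scales with (target_res / base_res)**2."""
    scale = (target_res / base_res) ** 2
    return base_gb * scale

# 256 -> 512 doubles the resolution, so 4x the memory.
print(estimate_memory_gb(20, 256, 512))   # 80.0
# 256 -> 1024 is 4x the resolution, so 16x the memory.
print(estimate_memory_gb(20, 256, 1024))  # 320.0
```

This ignores parameter and optimizer state (roughly resolution-independent), so the real figure differs, but it shows why 1024 training is so much more expensive than 256.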

HiSD aims to explore controllable diversity for the broader scalable image-to-image translation area, so training a high-resolution model efficiently is not the main proposal of the paper but rather future work.
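The UNet-like skip connection mentioned above can be illustrated with a toy NumPy sketch (my own illustration, not the HiSD architecture): encoder features are concatenated with decoder features at the same spatial resolution, so fine details that would be lost in the bottleneck can bypass it, which helps preserve identity.

```python
import numpy as np

def avg_pool2x(x):
    """Downsample a (C, H, W) feature map by 2 with average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbour upsample a (C, H, W) feature map by 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_skip_forward(img):
    """Toy encoder-decoder with one skip connection."""
    enc = avg_pool2x(img)           # encoder feature at H/2
    bottleneck = avg_pool2x(enc)    # coarse feature at H/4
    dec = upsample2x(bottleneck)    # decoder feature back at H/2
    # Skip connection: concatenate encoder and decoder features
    # along the channel axis instead of discarding the encoder detail.
    fused = np.concatenate([enc, dec], axis=0)
    return upsample2x(fused)        # back to full resolution

x = np.random.rand(3, 8, 8)
y = unet_skip_forward(x)
print(y.shape)  # (6, 8, 8): channels doubled by the skip concat
```

The cost concern follows directly: every skip path stores a full-resolution feature map until the decoder consumes it, so memory grows on top of the quadratic resolution scaling.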

BTW, there are some papers where you may find solutions. For example, you could use recent StyleGAN-based methods such as InterFaceGAN, or fast image-to-image methods such as https://arxiv.org/pdf/2012.02992.pdf. You would need to combine them if you want to maintain controllability at high resolution.

huang-1030 commented 3 years ago

Thank you for your reply. Our team is impressed by the results of HiSD in the image-to-image translation area. We will continue to explore the 1024 resolution following your suggestions.


imlixinyang commented 3 years ago

Looking forward to it!