miccunifi / ladi-vton

[ACM MM 2023] - LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

Problem about training #22

Open Kangkang625 opened 1 year ago

Kangkang625 commented 1 year ago

Hi, thank you for your great work!

I was trying to write training code and do some training, but I was confused by this part of the paper: "We first train the EMASC modules, the textual-inversion adapter, and the warping component. Then, we freeze all the weights of all modules except for the textual inversion adapter and train the proposed enhanced Stable Diffusion pipeline" (Section 4.2). Should I first freeze the other weights (including the unet) and train the textual inversion adapter, or should I freeze the other weights and train the textual inversion adapter and the unet together?

snaiws commented 1 year ago

I'm wondering about this too.

ABaldrati commented 1 year ago

Hi @Kangkang625, thanks for your interest in our work!!

> Should I first freeze the other weights (including the unet) and train the textual inversion adapter, or should I freeze the other weights and train the textual inversion adapter and the unet together?

First, you should pre-train the inversion adapter, keeping all the other weights (including the unet) frozen. Then, keeping the EMASC and the warping module frozen, you should train the unet and the (pre-trained) inversion adapter together.
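As a minimal sketch of the two phases (assuming each component is a standard `torch.nn.Module`; the names below are placeholder stand-ins, not the actual modules or variable names from the repo):

```python
import itertools
import torch

# Placeholder stand-ins for the LaDI-VTON components
# (textual-inversion adapter, diffusion unet, EMASC, warping module).
inversion_adapter = torch.nn.Linear(768, 768)
unet = torch.nn.Linear(768, 768)
emasc = torch.nn.Linear(768, 768)
warping_module = torch.nn.Linear(768, 768)

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

# Phase 1: pre-train the inversion adapter; everything else stays frozen.
set_trainable(inversion_adapter, True)
for m in (unet, emasc, warping_module):
    set_trainable(m, False)
phase1_optimizer = torch.optim.AdamW(inversion_adapter.parameters(), lr=1e-5)

# Phase 2: train the unet together with the (pre-trained) adapter,
# keeping EMASC and the warping module frozen.
set_trainable(unet, True)
set_trainable(inversion_adapter, True)
for m in (emasc, warping_module):
    set_trainable(m, False)
phase2_optimizer = torch.optim.AdamW(
    itertools.chain(unet.parameters(), inversion_adapter.parameters()), lr=1e-5
)
```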

I hope this clarifies your doubts.

Alberto

Kangkang625 commented 1 year ago

Thanks for your answer @ABaldrati! It's very helpful for my further study, but I still have a little confusion about the unet training.

According to my understanding, the unet should be extended based on the unet of the Stable Diffusion pipeline. Should I extend the unet, initialize the weights of the changed part randomly, and directly freeze it to pre-train the textual inversion adapter?

Thanks again for your great work and detailed answer!

ABaldrati commented 1 year ago

> According to my understanding, the unet should be extended based on the unet of the Stable Diffusion pipeline. Should I extend the unet, initialize the weights of the changed part randomly, and directly freeze it to pre-train the textual inversion adapter?

When we pre-train the inversion adapter, we use the standard Stable Diffusion inpainting model. In this phase, we do not extend the unet.
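So for this phase, a sketch would be to load the unmodified inpainting unet and freeze it (this assumes the `diffusers` library; the checkpoint id below is an example, check the repo config for the one actually used):

```python
from diffusers import UNet2DConditionModel

# Load the standard Stable Diffusion inpainting unet for adapter pre-training;
# no extra input channels are added at this stage.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", subfolder="unet"
)

# Freeze the unet: only the inversion adapter receives gradients in this phase.
unet.requires_grad_(False)
unet.eval()
```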