jiuntian / interactdiffusion

[CVPR 2024] Official repo for "InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model".
https://jiuntian.github.io/interactdiffusion/

SD and matching GLIGEN checkpoints for training #15

Open TimandXiyu opened 2 days ago

TimandXiyu commented 2 days ago

Hi,

Can I confirm that the code relies on both the SD weights and matching GLIGEN weights? It seems the code can run without the GLIGEN weights, because the README only lists the SD weights as required and GLIGEN does not have an official checkpoint for SD 1.5.

For v1.1, do we start directly from SD 1.5 and skip loading GLIGEN? Many comments in the code come from the GLIGEN codebase, so it is a bit confusing what the actual intention is for some parts of the code.

I feel like I am missing something important. Can the author explain how v1.1 was trained?

jiuntian commented 2 days ago

We start training from pretrained GLIGEN. The inference code can run without the GLIGEN weights because our trained weights overwrite the GLIGEN parameters. For v1.1, we start from both SD 1.5 and GLIGEN: we first load the SD v1.5 checkpoint and then load the GLIGEN pretrained parameters for the remaining parameters. See loading SD and loading GLIGEN.

GLIGEN and InteractDiffusion are both compatible with the SD v1.x series. Hope this helps.
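
The loading order described in this reply (SD v1.5 first, then GLIGEN for the parameters SD does not provide) can be sketched roughly as below. This is only an illustration under stated assumptions, not the repository's actual loading code: `merge_checkpoints` is a hypothetical helper, the parameter names are made up, and plain strings stand in for the weight tensors. In PyTorch, the merged dictionary would typically be passed to the model's `load_state_dict`.

```python
from typing import Any, Dict, List, Tuple


def merge_checkpoints(
    base: Dict[str, Any], extra: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Keep every entry of `base`; take from `extra` only the missing keys.

    Returns the merged dictionary and the list of keys filled from `extra`.
    """
    merged = dict(base)  # copy so the input checkpoint is not mutated
    filled = []
    for name, value in extra.items():
        if name not in merged:  # base (SD) entries take priority
            merged[name] = value
            filled.append(name)
    return merged, filled


# Hypothetical parameter names; strings stand in for weight tensors.
sd15 = {
    "unet.conv_in.weight": "sd15",
    "unet.attn1.weight": "sd15",
}
gligen = {
    "unet.attn1.weight": "gligen",           # shared with SD: SD value is kept
    "unet.fuser.weight": "gligen",           # grounding layer absent from SD
    "position_net.linear.weight": "gligen",  # grounding layer absent from SD
}

merged, filled = merge_checkpoints(sd15, gligen)
print("filled from GLIGEN:", sorted(filled))
```

In this sketch the backbone weights shared by both checkpoints come from SD v1.5, and only the extra grounding-specific entries are taken from GLIGEN.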

TimandXiyu commented 15 hours ago

Thanks for responding.

So I am supposed to load the same Box+Text GLIGEN weights for both v1.4 and v1.5? I take it the idea is that, even though the GLIGEN weights are for v1.4, they will become compatible with SD v1.5 after some training iterations.