Closed: TimandXiyu closed this issue 3 days ago
We start training from pretrained GLIGEN, and the inference code can run without the GLIGEN weights because our trained weights overwrite the GLIGEN parameters. For v1.1, we start from both SD v1.5 and GLIGEN: we first loaded the SD v1.5 checkpoint and then loaded the GLIGEN pretrained parameters for the remaining parameters. See loading SD and loading GLIGEN.
GLIGEN and InteractDiffusion are both compatible with the whole SD v1.x series. Hope this helps.
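For readers following along, the loading order described above can be sketched roughly as below. This is only an illustration under stated assumptions, not the repository's actual loading code: `merge_checkpoints` is a hypothetical helper, the key names are made up, and strings stand in for tensors so the snippet runs without any dependencies.

```python
from typing import Any, Dict, Iterable, List, Tuple


def merge_checkpoints(
    model_keys: Iterable[str],
    sd_state: Dict[str, Any],
    gligen_state: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Fill each model parameter from SD first, then from GLIGEN.

    Hypothetical helper: parameters found in the SD checkpoint come from SD,
    the remaining ones come from GLIGEN, and anything found in neither is
    reported so it can keep its fresh initialization.
    """
    merged: Dict[str, Any] = {}
    missing: List[str] = []
    for key in model_keys:
        if key in sd_state:
            merged[key] = sd_state[key]        # base weights from SD v1.5
        elif key in gligen_state:
            merged[key] = gligen_state[key]    # the rest from GLIGEN
        else:
            missing.append(key)                # new layers, trained from scratch
    return merged, missing


# Toy stand-ins for real state dicts (all key names are illustrative only).
sd15 = {"conv_in.weight": "sd15", "mid_block.weight": "sd15"}
gligen = {
    "conv_in.weight": "gligen",      # also present in SD, so SD takes priority
    "fuser.gate": "gligen",          # grounding layer that SD does not have
}
model_keys = ["conv_in.weight", "mid_block.weight", "fuser.gate", "interaction.embed"]

merged, missing = merge_checkpoints(model_keys, sd15, gligen)
print(merged)
print(missing)
```

With real checkpoints, the merged dict would then be passed to the model's `load_state_dict`, and the keys in `missing` are the ones left to be learned during training.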
Thanks for responding.
So am I supposed to load the same Box+Text GLIGEN weights for both v1.4 and v1.5? Even though the GLIGEN weights are for v1.4, is the idea that after some iterations they will become compatible with SD v1.5?
From what I have seen, the code won't work with SD 1.5 because it no longer uses the .ckpt structure. Am I supposed to manually convert the HF-style SD 1.5 weights? I attempted this, but the layer naming of SD 1.5 is very different from yours.
Or maybe you are using the pruned version of SD 1.5, for which a .ckpt still exists somewhere other than the official repo (the official repo has been hidden by Runway).
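In case it helps anyone hitting the same mismatch, here is a small sketch of how the two naming schemes can be compared before attempting a conversion. It is only a diagnostic under assumptions: `compare_key_layouts` is a hypothetical helper, and the key names below are short illustrative samples rather than a dump of either checkpoint.

```python
from collections import Counter
from typing import Dict, Iterable, List


def prefix_counts(keys: Iterable[str], depth: int = 2) -> Counter:
    """Count parameters per dotted prefix, e.g. 'model.diffusion_model'."""
    return Counter(".".join(k.split(".")[:depth]) for k in keys)


def compare_key_layouts(
    expected: Iterable[str], found: Iterable[str], depth: int = 2
) -> Dict[str, List[str]]:
    """Report which key prefixes appear on only one side, or on both."""
    exp = prefix_counts(expected, depth)
    got = prefix_counts(found, depth)
    return {
        "only_expected": sorted(set(exp) - set(got)),
        "only_found": sorted(set(got) - set(exp)),
        "shared": sorted(set(exp) & set(got)),
    }


# Illustrative samples: .ckpt-style names vs. HF-style names.
ckpt_keys = [
    "model.diffusion_model.input_blocks.0.0.weight",
    "model.diffusion_model.middle_block.0.in_layers.0.weight",
    "first_stage_model.encoder.conv_in.weight",
]
hf_keys = [
    "down_blocks.0.resnets.0.conv1.weight",
    "mid_block.resnets.0.conv1.weight",
    "conv_in.weight",
]

report = compare_key_layouts(ckpt_keys, hf_keys)
print(report)
```

If `shared` comes back empty, as it does for these samples, the two files use different layouts and a plain key-by-key load will not match anything.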
I think it should be this file; the original repo was removed.
Hi, this .ckpt version of SD 1.5 seems to fit, thanks!
Hi,
Can I confirm that the code relies on SD and a matching GLIGEN weight? It seems the code can run w/o the GLIGEN weight, because the readme only says the SD weight is a must and GLIGEN doesn't really have an official ckpt for SD1.5.
For the v1.1 version, do we start directly from SD1.5 and forget about loading GLIGEN? There are a lot of comments originating from GLIGEN's code base, so it is a bit confusing what the actual intention of some parts of the code is.
I feel like I am missing something important. Can the author explain how the v1.1 version was trained?