Open · DarrenZhaoFR opened this issue 6 months ago
The encoder has been trained to produce latents that do not upset the base model. More details are in Section 3 of the paper: https://arxiv.org/pdf/2402.17113.pdf
Since your model is fine-tuned from the base model, it behaves similarly; if you had trained it from scratch, the offsets would no longer work. Here is a paper that studies this behavior in more detail: https://arxiv.org/pdf/2305.12827.pdf It is about LLMs rather than SDXL, but I presume the same concepts apply here.
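To illustrate the intuition: fine-tuning keeps the weights in a locally linear region around the base, so an additive offset changes the fine-tune's output roughly the way it changes the base model's output. Here is a toy sketch of that argument, with made-up shapes and scales rather than the actual SDXL weights:

```python
import torch

torch.manual_seed(0)

# Toy stand-in for a network layer; the real question is about the SDXL
# UNet, but the local-linearity intuition is the same.
def f(x, W):
    return torch.tanh(x @ W)

x = torch.randn(8, 64)
W_base = 0.1 * torch.randn(64, 64)       # "base model" weights
delta_ft = 0.002 * torch.randn(64, 64)   # small fine-tuning update
Delta = 0.02 * torch.randn(64, 64)       # released weight offset

W_ft = W_base + delta_ft                 # your fine-tuned model

# Effect the offset has on each model's output:
effect_on_base = f(x, W_base + Delta) - f(x, W_base)
effect_on_ft = f(x, W_ft + Delta) - f(x, W_ft)

# If fine-tuning stayed in the locally linear region, the two effects
# are nearly identical, so the offset "works" on the fine-tune too.
cos = torch.nn.functional.cosine_similarity(
    effect_on_base.flatten(), effect_on_ft.flatten(), dim=0)
print(f"cosine similarity of offset effects: {cos.item():.4f}")
```

A model trained from scratch lands in a completely different basin of weight space, so this first-order argument (and therefore the offset) breaks down, which matches the paper's finding.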
Hi, awesome work! As you mentioned, the safetensors you released, which (in my understanding) are basically weight offsets, can be applied to any SDXL model. I can understand that if the base model is the same (same parameters and architecture), applying the offsets yields the same final model. However, I tried
layer_xl_transparent_conv.safetensors
on my own fine-tuned model (whose UNet parameters differ from the base), and it still works quite well. Is there a theory behind this? I hope you can share some insights. Thanks!
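For reference, "applying the offsets" here amounts to an element-wise addition over the UNet state dict. A minimal sketch, assuming the offset file's keys line up with the UNet's state_dict keys (the actual release may use a different naming scheme that needs remapping first):

```python
import torch
from safetensors.torch import load_file

@torch.no_grad()
def apply_weight_offsets(unet, offset_path: str):
    """Add offset tensors onto matching UNet parameters: W' = W + delta.

    Assumes the offset file's keys match unet.state_dict() keys; the
    real release may need its keys remapped to this naming first.
    """
    offsets = load_file(offset_path)
    state = unet.state_dict()
    for key, delta in offsets.items():
        if key in state and state[key].shape == delta.shape:
            # Purely additive patch, which is why it composes with
            # fine-tuned weights that stay close to the base model.
            state[key] += delta.to(dtype=state[key].dtype,
                                   device=state[key].device)
    unet.load_state_dict(state)

# Hypothetical usage with a diffusers SDXL pipeline:
# apply_weight_offsets(pipe.unet, "layer_xl_transparent_conv.safetensors")
```

Because the patch is additive and the fine-tune stays close to the base in weight space, applying it to the fine-tuned UNet lands in nearly the same place as applying it to the base and then fine-tuning.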