seruva19 / kubin

Web-GUI for Kandinsky text-to-image diffusion models.

Hi, can I use flan_ul2_encoder from model 3.0 with 3.1? #179

Open nikolaiusa opened 4 months ago

nikolaiusa commented 4 months ago

Or are they different?

seruva19 commented 4 months ago

Yes, I think they are fully compatible. They are, in fact, the same model, except that an additional projection layer was added on top of the text encoder for the "Kandinsky Flash" pipeline. The authors used a distillation methodology similar to LCM and SDXL Turbo, applying the approach described in "Adversarial Diffusion Distillation" and training a GAN to accelerate generation. This additional layer (as far as I understand, it contains cross-attention layers for the FLAN-UL2 embeddings) is required for the Flash pipeline to function properly. Therefore, even if a text encoder other than the default is chosen in Settings, the projection layer will still be loaded from the https://huggingface.co/ai-forever/Kandinsky3.1 text encoder (since, obviously, other text encoder repos do not contain this layer).
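
To make this concrete, here is a minimal, illustrative sketch (not kubin's actual loading code; the class name and the 4096 embedding dimension are assumptions): the FLAN-UL2 encoder weights are the part shared between 3.0 and 3.1, while the Flash-specific projection module only ships with the ai-forever/Kandinsky3.1 repo.

```python
import torch
import torch.nn as nn

class FlashTextProjection(nn.Module):
    """Simplified placeholder for the extra projection layer that the
    'Kandinsky Flash' pipeline adds on top of the FLAN-UL2 text embeddings.
    (Per the comment above, the real layer contains cross-attention blocks;
    a single Linear is used here only to illustrate the idea.)"""

    def __init__(self, embed_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, text_embeddings: torch.Tensor) -> torch.Tensor:
        return self.proj(text_embeddings)

# The base encoder weights are interchangeable between 3.0 and 3.1, e.g.:
#   from transformers import T5EncoderModel
#   encoder = T5EncoderModel.from_pretrained("<any FLAN-UL2 text encoder repo>")
# but the projection layer only exists in ai-forever/Kandinsky3.1,
# so it is always taken from that repo.
dummy_embeddings = torch.randn(1, 77, 4096)  # (batch, tokens, embedding dim)
projected = FlashTextProjection()(dummy_embeddings)
print(projected.shape)  # torch.Size([1, 77, 4096])
```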

nikolaiusa commented 4 months ago

Thanks for the answer. Can I ask another question? How do I choose one of several GPU adapters, something like general: device: cuda1?

seruva19 commented 4 months ago

Right, but it must be "cuda:1" (with a colon).
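
For reference, the device string follows PyTorch's conventions: indices start at 0, so "cuda:1" selects the second GPU. A quick, generic PyTorch check (not kubin-specific code):

```python
import torch

# How many GPU adapters PyTorch can see (indices are 0-based).
print(torch.cuda.device_count())

# Valid device string: the second GPU.
device = torch.device("cuda:1")
print(device)

# Missing the colon is rejected by PyTorch's device-string parser.
try:
    torch.device("cuda1")
except RuntimeError as err:
    print(f"rejected: {err}")
```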

nikolaiusa commented 4 months ago
(screenshot attached: 2024-5-25 19-9-16)

Here's an idea: continuous generation, each time with a new seed.

seruva19 commented 4 months ago

> Here's an idea: continuous generation, each time with a new seed.

OK, I'll think about it. I've never used this function in Auto1111, but it's easy to implement.
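
For illustration, a minimal sketch of what such a loop could look like; generate_image here is a hypothetical placeholder, not kubin's real generation call:

```python
import random

def generate_image(prompt: str, seed: int) -> str:
    # Placeholder: a real implementation would seed the diffusion pipeline
    # (e.g. via a torch.Generator) and return the generated image.
    return f"'{prompt}' rendered with seed {seed}"

def generate_loop(prompt: str, iterations: int = 5):
    # Re-roll a fresh random seed on every iteration, as requested above.
    for _ in range(iterations):
        seed = random.randint(0, 2**32 - 1)
        yield seed, generate_image(prompt, seed)

if __name__ == "__main__":
    for seed, result in generate_loop("a red fox in the snow"):
        print(seed, result)
```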