vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Feature]: xl-to-v1 interposer #2591

Open YacratesWyh opened 7 months ago

YacratesWyh commented 7 months ago

Feature description

Previously I was using Fooocus, where I commonly combined SDXL and SD 1.5 in a single generation, but the same approach produced poor results in webui. After a while, I found an xl-to-v1 interposer in Fooocus which lets an SDXL model produce a semantically effective image and then gives it an SD 1.5 look, which is pretty useful. I wonder if sdnext could have such a feature, or if I missed it somewhere?

Version Platform Description

To point you to it: Fooocus ships such a model at /Fooocus/models/vae_approx/xl-to-v1_interposer-v3.1.safetensors. You can also download it from https://huggingface.co/lllyasviel/misc/tree/main
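
If it helps evaluation, here is a minimal sketch for peeking at the checkpoint's weights; the local path is illustrative and assumes `safetensors` and `torch` are installed:

```python
# Minimal sketch: inspect the interposer checkpoint to see what it contains.
# The local path is illustrative; download the file from the HF link above.
from safetensors.torch import load_file

state_dict = load_file("xl-to-v1_interposer-v3.1.safetensors")
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)} {tensor.dtype}")
```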

vladmandic commented 7 months ago

you need to provide a bit more info than just a link to a file. and saying "use xl-15 as common" - what does that mean? i honestly don't know what you're referring to.

what is the use case and workflow where you find this useful? as far as i know, this is a model to approximate sd v1.5 latents when using an xl model - but why would you do that other than to experiment?

YacratesWyh commented 7 months ago

Do you know X-Adapter? It's basically that kind of use: upgrading v1 LoRAs so they can be used alongside XL models. Say I can find a lot of Genshin item LoRAs for v1, but I don't have equivalents for XL. Without access to the training data (to quickly check whether a LoRA works), I can try it in Fooocus: run an XL model for the first steps to get a better semantic shape (with fewer flaws), then refine with a 1.5 model and a 1.5 LoRA to set the final style.

The need is basically a quick way to try a given 1.5 model while getting a better semantic shape: take advantage of XL models having more detail, then refine with the 1.5 model. Even though XL refiner models have better texture, the key is a better middle part of the denoising process.

Here is a direct XL image: [image]. Passing it through a v1 model from denoising strength 0.667 gives it a specific style: [image]. Through this process we get better limbs and legs, because the denoising starts with a more realistic model.
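
For illustration, a rough sketch of what this two-stage workflow could look like in diffusers. `interposer` is a hypothetical stand-in for a network loaded from the interposer weights above, and the model ids, prompt, and LoRA path are just examples:

```python
# Rough sketch of the workflow: SDXL for the early steps, then an SD 1.5
# model (+ 1.5 LoRA) finishes the denoise from strength 0.667.
# `interposer` is hypothetical: a network loaded from
# xl-to-v1_interposer-v3.1.safetensors that maps SDXL latents to SD 1.5
# latents (assumed to include any scaling-factor handling internally).
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionImg2ImgPipeline

device = "cuda"
xl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to(device)

# Stage 1: full SDXL pass, but keep the output in latent space.
xl_latents = xl("a genshin-style item, game asset", output_type="latent").images

v1_latents = interposer(xl_latents)  # hypothetical SDXL -> SD 1.5 latent mapping

v1 = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
# v1.load_lora_weights("v1_style_lora.safetensors")  # the 1.5 LoRA under test

# Stage 2: diffusers img2img treats a 4-channel tensor as latents, so the
# interposed latent is denoised from strength 0.667 with no VAE round-trip.
image = v1(
    "a genshin-style item, game asset",
    image=v1_latents,
    strength=0.667,
).images[0]
```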

You might say it would be better to get the image directly from a v1 model + v1 LoRA. But feeding a latent generated by one KSampler directly into another KSampler is also an established upscaling method, and it gives very useful control: https://comfyanonymous.github.io/ComfyUI_examples/2_pass_txt2img/
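
A rough diffusers equivalent of that 2-pass trick, assuming a plain SD 1.5 checkpoint (the prompt, scale factor, and strength are arbitrary):

```python
# Sketch of the 2-pass pattern with a single SD 1.5 model: txt2img at 512,
# upscale the latent itself, then img2img continues from the upscaled latent
# (like ComfyUI's latent upscale between two KSamplers).
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Pass 1: 512x512 txt2img, kept as latents (shape [1, 4, 64, 64]).
latents = pipe("a castle on a hill", output_type="latent").images

# Upscale in latent space - no decode/encode round-trip.
latents = F.interpolate(latents, scale_factor=1.5, mode="bilinear")

# Pass 2: reuse the same components; a 4-channel tensor is taken as latents.
img2img = StableDiffusionImg2ImgPipeline(**pipe.components)
image = img2img("a castle on a hill", image=latents, strength=0.6).images[0]
```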

YacratesWyh commented 7 months ago

[image] Sorry, I didn't save the latents from the 2-pass run, but the results are clearly quite similar. These two images use two different random seeds, but with exactly the same seed and steps I think you would get the same intermediate result, at the same speed.

YacratesWyh commented 7 months ago

Also, v1 is much faster. Turbo models are not yet mature, so in practice, for clients and artist colleagues, we need a fast way to show whether an idea is helpful before deciding whether to develop it further.

vladmandic commented 7 months ago

adding xl-to-v1-interposer model support would be easy, but how would a user actually use it? it would need a new ui workflow designed around it, and your use-case is very complicated - it would get used by very few users. also, it requires two models (sd15 and sdxl) to generate a single image, which means loading overhead far bigger than any performance gained in the process.

i'd rather spend limited time improving sdxl up to a point where it's fast enough that anyone can use it without requiring such workarounds.

but of course, PRs are welcome.