w-e-w / sd-webui-hires-fix-tweaks

Adds options and features to the hires fix of Stable Diffusion web UI
GNU Affero General Public License v3.0

Add Latent-Interposer for SD15 and SDXL upscale models mixing #3

Open DavideAlidosi opened 5 months ago

DavideAlidosi commented 5 months ago

First, congratulations on the work you have done on WebUI. I would like to ask whether it would be possible to integrate the Latent-Interposer system, already implemented in ComfyUI, into the upscale options. It translates latents between the SD15 and SDXL latent spaces. The idea would be to take advantage of the greater capabilities of SDXL and then perform a refining upscale with an SD15 model. In the past few days I contacted city96, the author of the original ComfyUI node (https://github.com/city96/SD-Latent-Interposer/issues/5), but I realized that I do not know enough about A1111 and Python to do the work myself. Since you have already handled various aspects of this kind of functionality for A1111, I hope you will find it as interesting as I do. Thank you.

light-and-ray commented 5 months ago

I think in the web UI it could only be used in the refiner. Can you imagine any other use cases? And is it better than just decoding with the SDXL VAE and re-encoding with the 1.5 VAE? The examples don't look like they have very good quality.
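For comparison, the decode/re-encode alternative raised above can be sketched as a pixel-space round trip. This is a minimal, dependency-free illustration, not webui code: `decode_src` and `encode_dst` are hypothetical stand-ins for the SDXL and SD1.5 VAE calls, latents are flattened into plain lists, and the scaling factors 0.13025 (SDXL) and 0.18215 (SD1.5) are the values commonly found in those VAEs' configs, assumed here rather than taken from the webui source.

```python
from typing import Callable, List

# Hypothetical flat representations; a real pipeline would use tensors.
Latent = List[float]
Image = List[float]

# Latent scaling factors commonly used by each VAE family (assumed values).
SDXL_SCALING_FACTOR = 0.13025
SD15_SCALING_FACTOR = 0.18215


def convert_latent_via_vae(
    latent: Latent,
    decode_src: Callable[[Latent], Image],
    encode_dst: Callable[[Image], Latent],
    src_scale: float = SDXL_SCALING_FACTOR,
    dst_scale: float = SD15_SCALING_FACTOR,
) -> Latent:
    """Move a latent from one model family to another through pixel space.

    decode_src / encode_dst are hypothetical stand-ins for real VAE calls.
    """
    # Undo the source model's latent scaling before its VAE decodes it.
    unscaled = [v / src_scale for v in latent]
    image = decode_src(unscaled)
    # Encode with the destination VAE, then apply the destination scaling.
    raw = encode_dst(image)
    return [v * dst_scale for v in raw]
```

With identity stubs, `convert_latent_via_vae([0.13025], lambda x: x, lambda x: x)` simply rescales the value to about 0.18215. In practice the round trip goes through a lossy VAE encode, which is the quality trade-off being weighed against Latent-Interposer here.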

DavideAlidosi commented 5 months ago

Personally, and I think I am not the only one, I am really disappointed with the visual performance of SDXL, especially when using Hires Fix, which consistently produces an extremely unpleasant noise floor. Of course, SDXL still has advantages, namely the greater number of terms and the 1024px resolution, which I feel would be a shame not to take advantage of. On the SD15 side, the visual quality remains excellent even with Hires Fix at 4x, which, however, tends to produce images inconsistent with the initial generation (made at 512px, for example). Honestly, I could not point to the best way to get the best of both worlds. From what I know, latent space conversion seems to be a good method, but I do not rule out that working on the VAE encoding could also bring improvements. Clearly these are only theoretical assumptions; in any case, thank you for your attention.

w-e-w commented 5 months ago

I had a look at the webui code and concluded that implementing this as an extension would require replacing large portions of the hires fix pipeline with a bunch of strange patches. Doing so would make the implementation likely to break as webui updates, not to mention fragile when interacting with other extensions, so it's better to implement it directly in web UI rather than as an extension.

DavideAlidosi commented 5 months ago

Thank you very much for checking this, I will try to propose the change directly on the WebUI.

w-e-w commented 5 months ago

Note: when I said

"requires replacing large portions of the hires fix pipeline"

I was also factoring in making it work with the "decode and re-encode" conversion method, not just Latent-Interposer, so that all upscaler types are supported. If you only wish to support Latent-Interposer, it seems "more" achievable as an extension, provided you are willing to resort to ugly patches, because a smaller section of code needs to be patched.

I think that if this is supported, it should not be restricted to Latent-Interposer only, and the point about ugly patches still holds: it's better to implement it directly in web UI.
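One way to picture the distinction drawn in this comment: Latent-Interposer only applies when the hires upscaler works in latent space, while decode/re-encode works with any upscaler. The selector below is a hypothetical sketch; `UpscalerSpace`, `Conversion`, and `choose_conversion` are illustrative names, not part of webui or of the SD-Latent-Interposer project.

```python
from enum import Enum


class UpscalerSpace(Enum):
    LATENT = "latent"  # e.g. the "Latent" hires-fix upscalers
    PIXEL = "pixel"    # e.g. ESRGAN-style image upscalers


class Conversion(Enum):
    NONE = "none"
    INTERPOSER = "latent-interposer"
    VAE_ROUNDTRIP = "decode-and-re-encode"


def choose_conversion(same_family: bool, upscaler: UpscalerSpace,
                      interposer_available: bool) -> Conversion:
    """Hypothetical choice of how to move a latent between SD1.5 and SDXL."""
    if same_family:
        # Both passes share a latent space: nothing to convert.
        return Conversion.NONE
    if upscaler is UpscalerSpace.LATENT and interposer_available:
        # Stay in latent space and translate with Latent-Interposer.
        return Conversion.INTERPOSER
    # Fallback that covers every upscaler type: decode with the source VAE,
    # upscale in pixel space if needed, re-encode with the target VAE.
    return Conversion.VAE_ROUNDTRIP
```

Supporting only the `INTERPOSER` branch touches less of the pipeline, which is why that variant looks "more" achievable as an extension; the fallback branch is what supporting all upscaler types would require.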

w-e-w commented 5 months ago

Wait, maybe it's actually more achievable than I thought. If it is just for Latent-Interposer, I think we only really need to patch https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/cb5b335acddd126d4f6c990982816c06beb0d6ae/modules/processing.py#L1317 and detect 1.5 -> XL or XL -> 1.5.
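The detection step mentioned here amounts to comparing the model family of the first pass with that of the hires pass. A minimal sketch: `ModelFamily` and `interposer_direction` are hypothetical names, and how webui would actually report each checkpoint's family is not covered in this thread.

```python
from enum import Enum
from typing import Optional


class ModelFamily(Enum):
    SD15 = "1.5"
    SDXL = "xl"


def interposer_direction(first_pass: ModelFamily,
                         hires_pass: ModelFamily) -> Optional[str]:
    """Return the latent conversion needed before the hires pass, or None."""
    if first_pass is hires_pass:
        # Same latent space: the hires pass can consume the latent directly.
        return None
    return f"{first_pass.value}->{hires_pass.value}"
```

For example, an SDXL first pass followed by an SD1.5 hires pass yields `"xl->1.5"`. At the patched point, that string could then select which direction of interposer to run.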


I might have a go at it (when I have time, no promises), but I'm still somewhat reluctant, because if I support this, I want to support everything and not just Latent-Interposer.