[Feature Request]: Tile Diffusion for ComfyUI

paulo-coronado commented 1 year ago

Dear authors,

First, thanks and congratulations for such a great implementation! As ComfyUI is growing fast, I wish there was a Tile Diffusion implementation for ComfyUI.

I am also a developer and I am trying to develop a custom node for Tile Diffusion. However, the current repo code seems tightly coupled to the A1111/Gradio UI. Do you have any plans to create an implementation for ComfyUI? Also, are there any other resources I could check and use as a code base?

Thanks!

Kahsolt commented 1 year ago

Hello friend~ ComfyUI has a really nice idea in its workflow design, and I believe it will become a professional AIGC design tool in the near future. Not surprisingly, there is already related discussion in threads #131 and #39. 😄

However, at the time we started writing this extension, ComfyUI had not been born yet, or at least was not that well known. So everything we've done is fitted to A1111's sd-webui framework, and dozens of tricks are even mixed in to work around the shitty gradio bugs. We did not consider more abstract layers as long as things worked, so we're sad to admit that it is hard to migrate :(

As for the code base of this repo, only the noise-inversion part is mostly borrowed directly from webui's native script img2img_alt.py. All other code and hacks are originally written and extremely optimized for VRAM performance, hence it is also a pain to decouple. 🤣

We suggest that you first create your repo and share the link, and briefly describe in DEV.md how the ComfyUI node works and what data it requires. Then we can tell which parts of the code in this repo might be helpful and reusable for you. Looking forward to further discussion~

Kahsolt commented 1 year ago

After reading the ComfyUI code, I believe nearly everything would have to be rewritten for that framework: call their inner APIs and do hijacking or monkey patching specifically. Not much could be migrated :(

paulo-coronado commented 1 year ago

Hello, @Kahsolt! Thanks for the quick reply!

I just published my project (Tiled Diffusion for ComfyUI), but as MultiDiffusion has not been implemented yet, this is what ControlNet tile + simple tile concatenation produces:

[image: tiling]

It is clear that ControlNet by itself does not correct the seams, so I think the best strategy is to reuse MultiDiffusion's code. As for the ComfyUI framework, I don't think there's a need for hijacking, because ComfyUI doesn't have inner APIs; you can just run your functions freely.

The situation is: I already have the upscaled image and I need some guidance on the following steps:

- Preprocess the image with MultiDiffusion (tiling);
- Sample the tiles;
- Merge the tiles to create a single image.

It would be amazing if there was a function that expects an image, model, sample methods and other parameters and outputs the final image (after tiling, sampling etc.)... 🤩
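A function of that shape could look roughly like the sketch below. It only assumes NumPy arrays standing in for images or latents; `tile_starts`, `process_tiled`, and the `sample_tile` callback are hypothetical names, not functions from ComfyUI or from this repo. Overlapping regions are simply averaged once at merge time, which is not MultiDiffusion itself (that fuses at every sampling step).

```python
import numpy as np

def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover `length`,
    with neighbors sharing at least `overlap` pixels."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts

def process_tiled(image, sample_tile, tile=64, overlap=16):
    """Split an (H, W, C) array into overlapping tiles, run `sample_tile`
    on each one, and merge the results by averaging the overlaps."""
    h, w = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)
    for y in tile_starts(h, tile, overlap):
        for x in tile_starts(w, tile, overlap):
            ys, xs = slice(y, y + tile), slice(x, x + tile)
            out[ys, xs] += sample_tile(image[ys, xs])
            weight[ys, xs] += 1.0  # how many tiles touched each pixel
    return out / weight

# Usage with an identity "sampler": the merged result equals the input.
img = np.random.rand(100, 150, 3)
result = process_tiled(img, lambda t: t)
```

In a real node, `sample_tile` would wrap the actual sampler call for one tile.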

Anyway, I'd really appreciate it if you could provide some guidance or even consider contributing to the repo!

PS.: I've created a DEVELOPER.md, which gives a brief intro on the ComfyUI framework.

Thanks so much!

Kahsolt commented 1 year ago

Nice! I'll check it today and see what we could help~

Kahsolt commented 1 year ago

Thanks for the guidance, here's what I could figure out so far:

- The KSamplerTiled node should be derived from nodes.KSampler or nodes.KSamplerAdvanced, or at least keep an API compatible with them.
- When splitting a latent into tiles, make them overlap (the default is overlap=64 in this repo, but leave it tunable for the user).
- Both MultiDiffusion and Mixture of Diffusers require noise or prediction fusion at the sampling-step level (this is the core technique for getting rid of seams 😃), so you have to hack further into (or, say, borrow code from) comfy.samplers.KSampler and take full control over lines 611~670.
- For each category of sampler, hack into its sample() function (i.e. uni_pc.sample_unipc, DDIMSampler.sample_custom, and k_diffusion_sampling.sample_*).
- For each sampling step, refer to MultiDiffusion.sample_one_step and MixtureOfDiffusers.apply_model_hijack in this repo, respectively, to quickly get the idea of the two algorithms (it takes much more time to fully understand them, though...).
  - You can safely ignore the custom_* stuff, which is related to region prompt control.
  - The main difference between Multi and MOD is that Multi fuses the predicted (half-)denoised images, while MOD fuses the predicted noise (using a spatial Gaussian mask for more seamlessness).
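To make per-step fusion a bit more concrete, here is a minimal NumPy sketch. It is not the code from this repo: `fuse`, `gaussian_weight`, `sample`, and the toy `denoise` callback are hypothetical, and the `sigma` value is an arbitrary assumption. With uniform weights the overlap is plainly averaged, in the spirit of MultiDiffusion; with a Gaussian mask the tile centers dominate, in the spirit of Mixture of Diffusers. The loop treats the fused result as the next latent, which matches fusing (half-)denoised images; for noise fusion, the fused noise would instead be fed into the sampler's update rule.

```python
import numpy as np

def gaussian_weight(th, tw, sigma=0.3):
    """2D Gaussian mask peaking at the tile center.
    `sigma` is a fraction of the tile size (assumed value)."""
    ys = (np.arange(th) - (th - 1) / 2) / (sigma * th)
    xs = (np.arange(tw) - (tw - 1) / 2) / (sigma * tw)
    return np.exp(-0.5 * (ys[:, None] ** 2 + xs[None, :] ** 2))

def fuse(tile_preds, bboxes, shape, weight_fn=None):
    """Weighted average of per-tile predictions on the full canvas."""
    acc = np.zeros(shape)
    norm = np.zeros(shape)
    for pred, (y, x, th, tw) in zip(tile_preds, bboxes):
        w = np.ones((th, tw)) if weight_fn is None else weight_fn(th, tw)
        acc[y:y + th, x:x + tw] += pred * w
        norm[y:y + th, x:x + tw] += w
    return acc / norm  # bboxes must cover the whole canvas

def sample(latent, bboxes, denoise, steps, weight_fn=None):
    """At every step, run `denoise` on each tile, then fuse the tiles
    before moving on, so neighbors stay in sync."""
    for step in range(steps):
        preds = [denoise(latent[y:y + th, x:x + tw], step)
                 for y, x, th, tw in bboxes]
        latent = fuse(preds, bboxes, latent.shape, weight_fn)
    return latent

# Two 8x8 tiles overlapping by 4 columns on an 8x12 latent.
latent0 = np.random.rand(8, 12)
bboxes = [(0, 0, 8, 8), (0, 4, 8, 8)]
halve = lambda tile, step: tile * 0.5   # toy stand-in for the model
uniform = sample(latent0, bboxes, halve, steps=3)
gauss = sample(latent0, bboxes, halve, steps=3, weight_fn=gaussian_weight)
```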

Now you may understand why this repo is full of messy magic... :) I will keep exploring ComfyUI's sampling pipeline and maybe try to sketch out a unified callback architecture... 🤔

PS: What I mean by inner APIs is all those classes, objects, functions, methods and even constants that comfy itself provides and that developers can reuse directly :)

Kahsolt commented 1 year ago

Hello again~ I've checked your code; in the abstract, the idea seems to be:

tiles = get_tiles(latent_image, tile_size)

output_tiles = []
for tile in tiles:
    # each tile is decoded, upscaled and sampled on its own
    tile_image = vae_decode(tile)
    upscaled_image = upscale_image(tile_image, ...)
    positive_control = apply_controlnet_tile(tile_image)
    samples = common_ksampler(model, upscaled_image, positive_control, ...)
    output_tiles.append(samples)

# tiles are concatenated without any blending
latent = merge_images(output_tiles)
images = vae_decode(latent)

This idea is simple and easy to implement. However, it is bound to cause visible seams at tile borders and hue mismatch between tiles, as it breaks any linkage between neighboring tiles :( There is still a long way to go.
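A toy illustration of the problem, not taken from either project: when each tile is processed without knowledge of its neighbors, a smooth signal can pick up a jump exactly at the tile border. Here the stand-in for independent sampling is a hypothetical per-tile contrast stretch (`stretch`), applied to two non-overlapping halves of a 1D ramp.

```python
import numpy as np

def stretch(tile):
    """Stand-in for independent per-tile processing:
    rescale each tile to [0, 1] using only its own values."""
    lo, hi = tile.min(), tile.max()
    return (tile - lo) / (hi - lo)

row = np.linspace(0.0, 1.0, 16) ** 2          # smooth ramp, no seams
left, right = row[:8], row[8:]                # two non-overlapping tiles
naive = np.concatenate([stretch(left), stretch(right)])

seam_jump = abs(naive[8] - naive[7])           # difference across the border
inner_step = np.abs(np.diff(naive[:8])).max()  # largest step inside a tile
print(f"jump at seam: {seam_jump:.2f}, largest step inside a tile: {inner_step:.2f}")
```

The jump at the border is much larger than any step inside a tile, even though the input was perfectly smooth; this is the kind of discontinuity that overlap plus per-step fusion is meant to remove.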

So I wonder if you could specify your final goal in implementing a tiling mechanism in ComfyUI:

- Migrating the functionality to ComfyUI? (You know, tiling methods aim to trade time for lower VRAM usage, making it possible to process 4k images on low-end devices.)
- Theoretical verification experiments, or just wanting to see what basic tiling would produce?
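As a rough back-of-the-envelope for the time/VRAM trade mentioned in the first option, the sketch below counts tiles for one illustrative setup. All numbers are assumptions chosen for the example (a 3840x2160 image, the 8x latent downscale typical of Stable Diffusion VAEs, 96x96 latent tiles with an overlap of 48), and `tile_count` is a hypothetical helper.

```python
import math

def tile_count(length, tile, overlap):
    """Number of tiles of size `tile` needed to cover `length`
    when neighboring tiles share `overlap` pixels."""
    if length <= tile:
        return 1
    return math.ceil((length - tile) / (tile - overlap)) + 1

# Illustrative, assumed values (not defaults read from any repo).
lat_w, lat_h = 3840 // 8, 2160 // 8    # latent size for a 4k image
tile, overlap = 96, 48

n_tiles = tile_count(lat_w, tile, overlap) * tile_count(lat_h, tile, overlap)
area_ratio = (tile * tile) / (lat_w * lat_h)    # one tile vs. whole latent
work_ratio = n_tiles * area_ratio               # total area processed per step

print(n_tiles, round(area_ratio, 3), round(work_ratio, 2))
```

Each model call then sees only a small fraction of the latent, which is where the memory saving comes from, while the number of calls per step and the overlap account for the extra time.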

I must warn you that this migration is really hard work, so please consider our current pipeline described in my last comment 🐺

paulo-coronado commented 1 year ago

Hi, Kahsolt! Thanks for the guide!

Yes, I know that the current implementation is too simple and there is a long way to go. I found the Tiled Diffusion extension just amazing and my goal is to migrate it to ComfyUI! 🚀

Personally, I think it will be an absolute game changer for the community, as there is no working tiling method for ComfyUI yet.

Right now, I am applying your suggestions and hacking the sample method with the MultiDiffusion.sample_one_step code! 😉

WSJUSA commented 1 year ago

Hi, I have a slightly delusional idea that I can help pursue the goal of this feature request. My longer-term goal would be getting StableSR into ComfyUI.

But let's take things one step at a time.

The first step I am taking here is to identify whether similar functionality is already implemented for ComfyUI, and what gaps may exist. The next step is to identify the easiest path to bridge any gaps.

Here are ComfyUI modules that seem close in functionality. At the code level I will struggle to understand the differences until I get my bearings on the similarities, the overall process of running diffusion, and where the VAE is applied. Let's see.

--

ComfyUI Tiled Diffusion Module: https://github.com/BlenderNeko/ComfyUI_TiledKSampler/blob/master/tiling.py a popular tiler for diffusion. A possible gap is that I don't see any indication of Mixture of Diffusers or MultiDiffusion options.

This module is used by other modules to provide tiled upscaling, such as the one below.

--

ComfyUI Tiled Upscaler in Impact Pack: https://github.com/ltdrdata/ComfyUI-Impact-Pack/blob/Main/modules/impact/core.py This module builds on the TiledKSampler. It doesn't appear to offer a way to mix diffusion, but it may be worth looking at how it adds more sophisticated upscaling options on top of TiledKSampler.

--

ComfyUI Tiled VAE Encoders/Decoders: https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/sd.py ComfyUI core includes a VAE class that has tiled encoding and decoding, so it would be a good place to see how closely it mirrors the Tiled VAE here.

--

What next - get into the details of how this library compares to the above.

Kahsolt commented 12 months ago

@WSJUSA Thanks for the information. I would suggest that everyone who wants tiled diffusion on ComfyUI give https://github.com/BlenderNeko/ComfyUI_TiledKSampler a try :)

Kahsolt commented 12 months ago

I just had a look at ComfyUI_TiledKSampler's implementation. It is a variation of the MultiDiffusion algorithm, although it uses another way to alleviate the seams. Good for it :) The official tiled VAE in ComfyUI is rather simple, so I cannot guarantee the output quality (judging purely from reading the code).

JPGranizo commented 10 months ago

Hi! I just wanted to check whether anybody has made any progress on Tile Diffusion for ComfyUI

shiimizu commented 9 months ago

https://github.com/shiimizu/ComfyUI-TiledDiffusion

Vigilence commented 5 months ago

Would also like to see official support. ATM the ComfyUI node doesn't support more than one ControlNet node.

pkuliyi2015 / multidiffusion-upscaler-for-automatic1111

[Feature Request]: Tile Diffusion for ComfyUI #177