Open songtianhui opened 1 year ago
ControlNet-tile is a ControlNet just like any other: it takes an input image (and also a text prompt) to control image generation. With ControlNet-tile, the control or hint image is the input image itself (or patches of it).
You can use it to do ControlNet text-to-image generation. However, with this method, the generated image may look "different" or change in content. This makes it unsuitable for the super-resolution use case. However, like base SD, you can use ControlNet in image-to-image mode. This in a way constrains the generated images to look more like the original image, or at least have the same colors in the same spatial locations. Note, the only difference between text-to-image and image-to-image mode is the starting point. With text-to-image, the starting point is random noise. With image-to-image, the starting point is a noisy version of the input image.
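To make the "starting point" distinction concrete, here is a minimal NumPy sketch. It is not code from this repo; the function names and the simple noise-mixing formula are illustrative stand-ins for what a real diffusion pipeline does when it prepares the initial latent:

```python
import numpy as np

rng = np.random.default_rng(0)

def txt2img_start(shape):
    # text-to-image: the starting latent is pure Gaussian noise
    return rng.standard_normal(shape)

def img2img_start(image_latent, strength):
    # image-to-image: the starting latent is a noisy version of the
    # input image. `strength` in [0, 1] is illustrative here:
    # 0 keeps the image unchanged, 1 is effectively pure noise.
    noise = rng.standard_normal(image_latent.shape)
    return np.sqrt(1 - strength) * image_latent + np.sqrt(strength) * noise

latent = np.ones((4, 8, 8))          # stand-in for an encoded input image
start = txt2img_start(latent.shape)  # ignores the input entirely
weak = img2img_start(latent, strength=0.2)    # stays close to the input
strong = img2img_start(latent, strength=0.95) # nearly pure noise
```

From there, both modes run the same denoising loop; only the initialization differs, which is why img2img outputs stay closer to the input.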
Usually a model can only handle a certain resolution without hitting resource limits. With tiling, the prediction process runs on tiles of the image instead of the whole image in a single pass.
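The split-and-stitch idea can be sketched like this. This is a toy version, not the extension's actual code: real tiled pipelines overlap the tiles and blend the seams, which this sketch omits:

```python
import numpy as np

def process_in_tiles(image, tile, fn):
    # Split `image` (H, W) into tile x tile patches, run `fn` on each
    # patch independently, and stitch the results back together.
    # This keeps peak memory bounded by the tile size, not the image size.
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = image[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] = fn(patch)
    return out

img = np.arange(64, dtype=float).reshape(8, 8)
result = process_in_tiles(img, tile=4, fn=lambda p: p * 2)
```

For a pointwise `fn` like the one above, tiling gives exactly the same result as a single pass; for a diffusion model, each tile is denoised with the corresponding patch of the input as its ControlNet hint, so the tiles stay mutually consistent.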
So the good practice for super-resolution is to use the same image in the image-to-image tab and as the ControlNet hint image. I think ControlNet-tile is a bit like SD img2img: both aim to maintain the original image's structure and color. But ControlNet-tile adds more generated detail, and it changes the pipeline by splitting the prediction process into tiles to support higher resolutions. Is this understanding correct?
I did not find the tile code; could you point me to it?
Hi, I am confused about the tile function. First, I want to know what it does. The readme says it can replace the details in an image, so is it designed for tasks like super-resolution? Or is it just like an img2img function that keeps the original low-frequency structure and adds details? Can it be used in img2img, and do the main image and the control image need to be the same one? If the two images are different, how should the results be interpreted?

Second, I want to know how this technique is implemented. I will read the code afterwards, but what is the core idea? Is it related to the other extension, Tiled Diffusion & VAE? Say, does it first patchify the image, run the diffusion process on each patch, and then fuse them? Thank you!