lllyasviel / ControlNet

Let us control diffusion models!
Apache License 2.0

Colored Scribble input #58

Open · lazy-nurd opened 1 year ago

lazy-nurd commented 1 year ago

Hey, thanks for your nice work. Scribble works really well for generation, but there are many scenarios where the input image contains colors as well as strokes, and we want SD to process both. For example, the image below has a scribble and color, and the model should take both the color and the sketch as input. The current implementation only considers the sketch, so the input color is ignored.

[image: colored scribble input]

[image: example output, with the color ignored]

If you could train a model that takes a colored scribble and produces results according to it, that would be perfect.

Thanks

DiceOwl commented 1 year ago

As a tip, img2img with ControlNet gives you more or less what you want: get the scribbles from the boundaries, and the colors from the img2img input. At a denoising strength of around 0.8 the color hints survive. At that strength, regular img2img would lose the structure, but ControlNet preserves it.
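For the "scribbles from the boundaries" half of that recipe, a minimal sketch could look like the following. It assumes OpenCV's Canny detector as a stand-in edge extractor; the input path, thresholds, and polarity handling are illustrative assumptions, not anything from this thread.

```python
# Hedged sketch: derive a scribble/edge map from a colored input image.
# Canny is just one possible boundary detector; thresholds are untuned.
import cv2
import numpy as np

img = cv2.imread("colored_scribble.png")     # hypothetical input file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)            # white boundaries on black
# Use the edge map as the ControlNet condition and the original `img`
# as the img2img init image. Invert with (255 - edges) if your pipeline
# expects the opposite stroke polarity; stack to 3 channels if required.
scribble = np.stack([edges] * 3, axis=-1)
```

With that pair of inputs, a denoising strength around 0.8 lets the colors from `img` bleed through while the edge map pins down the structure.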

CesarERamosMedina commented 1 year ago

@DiceOwl how would you go about using img2img with ControlNet? At the moment, the base model they use is text2img.

DiceOwl commented 1 year ago

https://github.com/Mikubill/sd-webui-controlnet, an extension for Automatic1111, directly allows img2img. More generally, stable diffusion is intrinsically img2img; text2img just replaces the input image with pure noise. So if you are the hacking kind, you could probably hack rudimentary img2img support into this repo with just a handful of lines of code: replace the pure noise with the right mixture of noise and the (diffused) input image, and adjust the denoising parameters so sampling does not start from pure noise.
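Concretely, that "mixture of noise and the (diffused) input image" is just the forward-diffusion step applied up to an intermediate timestep. Here is a minimal sketch, assuming a standard DDPM-style `alphas_cumprod` schedule; `x0_latent` and `img2img_init` are illustrative names, not this repo's actual API, and `strength` plays the role of the webui's denoising strength.

```python
# Hedged sketch of img2img initialization for a latent diffusion model.
import torch

def img2img_init(x0_latent: torch.Tensor,
                 alphas_cumprod: torch.Tensor,
                 strength: float = 0.8):
    """Noise the VAE-encoded input latent up to an intermediate timestep
    so sampling can resume from there instead of from pure noise
    (strength=1.0 would reduce to plain text2img)."""
    num_steps = alphas_cumprod.shape[0]
    t = min(int(strength * num_steps), num_steps - 1)  # how deep to noise
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(x0_latent)
    # Forward diffusion: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0_latent + (1.0 - a_bar).sqrt() * noise
    return x_t, t  # hand x_t to the sampler and start its loop at step t
```

The reverse loop then runs from `t` down to 0 exactly as usual; the only change versus text2img is the starting point.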

leoShen917 commented 1 year ago

I'm also interested in this question, have you solved it please and look forward to your reply! @lazy-nurd

leoShen917 commented 1 year ago

> So if you are the hacking kind, you could probably hack rudimentary img2img support into this repo with just a handful of lines of code: replace the pure noise with the right mixture of noise and the (diffused) input image, and adjust the denoising parameters so sampling does not start from pure noise.

When I try to do this during denoising, the resulting image tends to come out blurrier. Do you know why, and how to solve it?