Closed takuma104 closed 1 year ago
On my side, everything works for the v1.1 hough/mlsd model; however, for the normalbae model, I'm unable to unpickle the weights.
python ../scripts/convert_original_controlnet_to_diffusers.py --checkpoint_path control_v11p_sd15_normalbae.pth --original_config_file control_v11p_sd15_normalbae.yaml --dump_path control_v11p_sd15_normalbae --device cpu
Traceback (most recent call last):
  File "/tmp/controlnet-v11/diffusers/convert/../scripts/convert_original_controlnet_to_diffusers.py", line 80, in <module>
    controlnet = download_controlnet_from_original_ckpt(
  File "/tmp/controlnet-v11/diffusers/src/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py", line 1346, in download_controlnet_from_original_ckpt
    checkpoint = torch.load(checkpoint_path, map_location=device)
  File "/tmp/controlnet-v11/lib/python3.10/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/tmp/controlnet-v11/lib/python3.10/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
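A possible way to narrow this down: `invalid load key, '<'` means the first byte the unpickler read was `<`. That often happens when the file on disk is an HTML page (for example an error page saved by a failed download) rather than a checkpoint, though that is only a guess for this particular case. The sketch below inspects the first bytes of a file before handing it to `torch.load`. `sniff_checkpoint` is a hypothetical helper name, not part of diffusers or PyTorch.

```python
def sniff_checkpoint(path, n_bytes=64):
    """Guess what kind of file `path` is from its first bytes.

    Hypothetical helper for illustration. It only reads the header
    and never unpickles anything.
    """
    with open(path, "rb") as f:
        head = f.read(n_bytes)
    stripped = head.lstrip()
    if stripped.startswith(b"<"):
        # Looks like markup, e.g. an HTML page saved instead of the weights.
        return "html-or-xml"
    if stripped.startswith(b"version https://git-lfs"):
        # A Git LFS pointer file rather than the real binary.
        return "git-lfs-pointer"
    if head.startswith(b"PK"):
        # Zip archive, the default torch.save format since PyTorch 1.6.
        return "zip-archive"
    if head.startswith(b"\x80"):
        # Pickle protocol 2+ stream, as used by legacy torch.save files.
        return "pickle"
    return "unknown"
```

If this reports `html-or-xml` for `control_v11p_sd15_normalbae.pth`, re-downloading the file would be the first thing to try.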
Hi @Donokami, hmm, it seems like further investigation is needed. On my end, it appears that all the conversions were successful. For now, I have released the converted weights and added the description to the initial post. By specifying control_v11p_sd15_normalbae for the subfolder, we should be able to use normalbae.
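As a small illustration of that naming convention, the subfolder is the checkpoint file name without its `.pth` extension. The helper name `subfolder_for` below is hypothetical; the `from_pretrained` call in the comment mirrors the one used for the tile model later in this thread and has not been re-verified here for every checkpoint.

```python
from pathlib import Path


def subfolder_for(checkpoint_name):
    """Return the subfolder name for a checkpoint file name.

    Hypothetical helper: drops the directory part and the ".pth"
    extension, e.g. "control_v11p_sd15_normalbae.pth" gives
    "control_v11p_sd15_normalbae".
    """
    return Path(checkpoint_name).stem


subfolder = subfolder_for("control_v11p_sd15_normalbae.pth")

# With diffusers installed, the converted weights would then be loaded as:
#   from diffusers import ControlNetModel
#   controlnet = ControlNetModel.from_pretrained(
#       "takuma104/control_v11", subfolder=subfolder)
```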
@patrickvonplaten cc ^
@takuma104 thanks for your hard work as always! We're internally also working on it. Should be ready soon :)
Will keep this issue open until that's done.
There seems to be an issue with using 5 ControlNets with Gradio in MultiControlNet with diffusers. Without Gradio, I am able to fit even 6 ControlNets and generate output images. With Gradio, however, the code fails after the inference steps complete. The code works with up to 4 ControlNets but fails when the count increases to 5.
By failure, I mean the SSH connection to my server gets disconnected, which felt very weird.
Could you open a separate issue for this? If the code runs in a non-gradio environment, then I suggest opening the issue in the Gradio repository.
I have created a comparison with the reference implementation. I have created all the conditional images using the new gradio_annotator.py.
https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare_v11/README.md
From version 1.1, ControlNet no longer includes the base model (such as SD1.5), so the results now match almost pixel-perfect. The slight differences in brightness may be due to the difference in rounding algorithms when converting from float to int pixel values.
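To illustrate how a rounding difference alone can shift brightness by one level, here is a minimal sketch comparing truncation with round-half-up when mapping a float in [0, 1] to an 8-bit pixel value. Which conversion each implementation actually uses is not stated in this thread, so both function names and the choice of methods are assumptions for illustration.

```python
import math


def to_uint8_truncate(x):
    """Map a float in [0, 1] to 0..255 by dropping the fractional part."""
    return min(255, max(0, int(x * 255.0)))


def to_uint8_round(x):
    """Map a float in [0, 1] to 0..255 by rounding half up."""
    return min(255, max(0, math.floor(x * 255.0 + 0.5)))


# 0.999 * 255 = 254.745, so the two conversions disagree by one level.
truncated = to_uint8_truncate(0.999)
rounded = to_uint8_round(0.999)

# Over a fine grid of inputs the two never differ by more than one level.
max_diff = max(
    to_uint8_round(i / 10000.0) - to_uint8_truncate(i / 10000.0)
    for i in range(10001)
)
```

A systematic offset of at most one level out of 255 would be consistent with the slight brightness differences described above.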
I have checked the following:
canny, depth, mlsd, normalbae, openpose, scribble, seg, softedge, lineart, lineart_anime
In terms of photorealism, the new softedge seems to perform quite well. The lineart and lineart_anime models also demonstrate impressive coloring performance from hand-drawn line art.
The remaining tasks for verification are as follows:
inpaint, shuffle, ip2p, tile
Immense thanks to @patrickvonplaten ❤️
New Controlnet v1.1 checkpoints have been released on the Hub! The release includes 14 new checkpoints with some cool applications such as Instruct-Pix2Pix ControlNet.
Model cards contain all the details you need to try it out 🌠 https://huggingface.co/models?sort=downloads&search=lllyasviel%2Fcontrol_v11
Therefore, I am closing this issue. But feel free to reopen.
@takuma104 please let me know if some checkpoints don't work as expected. I think the inpainting controlnet checkpoint still has some issues
@patrickvonplaten The status of my verification of the remaining four models is as follows.
It is necessary to set the masked pixels of the conditioning tensor to -1, which seems to be unique to this model.
https://github.com/lllyasviel/ControlNet-v1-1-nightly/blob/main/gradio_inpaint.py#L35
There is no need to use InpaintPipeline; it can be done with the usual StableDiffusionControlNetPipeline. A code example follows. I feel that the generated quality is slightly lower than the original, so I will write code that allows a complete comparison with the original and investigate.
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image


def make_inpaint_condition(image, image_mask):
    # Scale the RGB image to [0, 1] and read the mask as 8-bit grayscale.
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L"))
    # Compare both height and width (the slice [0:1] checked the height only).
    assert image.shape[0:2] == image_mask.shape[0:2], "image and image_mask must have the same image size"
    image[image_mask < 128] = -1.0  # set as masked pixel
    # HWC -> NCHW with a batch dimension of 1.
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image


controlnet = ControlNetModel.from_pretrained('lllyasviel/control_v11p_sd15_inpaint',
                                             torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained('runwayml/stable-diffusion-v1-5',
                                                         controlnet=controlnet,
                                                         torch_dtype=torch.float16,
                                                         safety_checker=None).to('cuda')

original_image = load_image('https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare_v11/control_images/pexels-sound-on-3760767_512x512.png')
mask_image = load_image('https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare_v11/control_images/mask_512x512.png')

pipe(prompt="best quality",
     negative_prompt="lowres, bad anatomy, bad hands, cropped, worst quality",
     generator=torch.manual_seed(2),
     num_inference_steps=20,
     guidance_scale=9.0,
     image=make_inpaint_condition(original_image, mask_image)).images[0]
The result comparison:
Original Image | Mask Image | Generated Image |
---|---|---|
As I wrote in the first post, if we want to achieve compatibility with the original, we might need to modify the ControlNetModel. I plan to write a patch for this over the weekend, so once the PoC is done, I intend to open a PR.
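To make the masking step of make_inpaint_condition concrete, here is a minimal sketch on a tiny 2x2 array, using NumPy only so it runs without diffusers or a GPU. The function name `mark_masked_pixels` is hypothetical, and the `mask < threshold` comparison simply follows the snippet above; it is an assumption about the mask convention, so flip the comparison if your mask marks the region to repaint with bright pixels.

```python
import numpy as np


def mark_masked_pixels(image, mask, threshold=128):
    """Build a ControlNet inpaint condition from array inputs.

    image: (H, W, 3) float array with values in [0, 1].
    mask:  (H, W) uint8 array; pixels below `threshold` count as masked
           (same convention as the snippet above, an assumption).
    Returns a float32 array of shape (1, 3, H, W) in which every masked
    pixel is -1.0 in all channels.
    """
    if image.shape[:2] != mask.shape[:2]:
        raise ValueError("image and mask must have the same height and width")
    out = image.astype(np.float32, copy=True)
    out[mask < threshold] = -1.0
    # HWC -> CHW, then add a batch dimension of 1.
    return out.transpose(2, 0, 1)[None]


image = np.full((2, 2, 3), 0.5)
mask = np.array([[0, 255], [255, 0]], dtype=np.uint8)
cond = mark_masked_pixels(image, mask)
```

Because valid pixel values lie in [0, 1], the value -1.0 cannot be confused with real image content, which is presumably why it works as a marker for the region to fill.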
Not yet. But the code in gradio_ip2p.py does not seem to be doing anything particularly special, and based on the results from this page, it does appear to be fine.
Not yet. Since it is currently in an Unfinished status, it would be wise to hold off on addressing it until it at least changes to an Experimental status. In gradio_tile.py, quite different processing is being performed, and a dedicated pipeline specific to this may be necessary.
That's a great summary! Would you like to open a PR to add the inpainting example to: https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint ?
Still need to find time to take a deeper look here though!
@takuma104 with the ControlNet inpaint code you shared, the entire image gets disturbed. Given that only the masked area should be inpainted, how does that hold up here? Mikubill's repo gives guidelines on using the inpaint model with inpainting functionality: https://github.com/Mikubill/sd-webui-controlnet/issues/968. Do you think it might therefore be necessary to use the inpaint pipeline?
@patrickvonplaten I just opened a PR for control_v11p_sd15_inpaint. Please make appropriate modifications to the wording as needed.
@ghpkishore Thanks for letting me know! I think it might be possible using the Inpaint Pipeline, so I'll give it a try.
Very cool! I think "tile" is the only checkpoint that is not tested yet, but it's also unfinished, so I guess we can wait until it's ready? https://github.com/lllyasviel/ControlNet-v1-1-nightly#controlnet-11-tile-unfinished
@patrickvonplaten You might already know, but tile has happily been promoted to experimental status. It seems that adjustments in the code are necessary, so I'll think about the best approach.
https://github.com/lllyasviel/ControlNet-v1-1-nightly#controlnet-11-tile
From the processing flow of the reference gradio_tile.py, it can be interpreted as an Img2Img with ControlNet. It seems that it will be fine to enlarge the input image up to the desired output size using LANCZOS or similar methods (general image resizing, not super-resolution), and use that as the condition_image for ControlNet and input for Img2Img. The code using the stable_diffusion_controlnet_img2img Community Pipeline is as follows. I will conduct a detailed verification, but subjectively it seems to be working fine.
import torch
from PIL import Image
from diffusers import ControlNetModel, DiffusionPipeline, DDIMScheduler
from diffusers.utils import load_image


def resize_for_condition_image(input_image: Image.Image, resolution: int):
    input_image = input_image.convert("RGB")
    # PIL reports size as (width, height).
    W, H = input_image.size
    # Scale so that the shorter side matches the requested resolution.
    k = float(resolution) / min(H, W)
    H *= k
    W *= k
    # Snap both sides to the nearest multiple of 64.
    H = int(round(H / 64.0)) * 64
    W = int(round(W / 64.0)) * 64
    img = input_image.resize((W, H), resample=Image.LANCZOS)
    return img


controlnet = ControlNetModel.from_pretrained('takuma104/control_v11',
                                             subfolder='control_v11f1e_sd15_tile',
                                             torch_dtype=torch.float16)
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="stable_diffusion_controlnet_img2img",
    controlnet=controlnet,
    torch_dtype=torch.float16).to('cuda')
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()

source_image = load_image('https://github.com/lllyasviel/ControlNet-v1-1-nightly/raw/main/test_imgs/dog64.png')
condition_image = resize_for_condition_image(source_image, 1024)

pipe(prompt="best quality",
     negative_prompt="blur, lowres, bad anatomy, bad hands, cropped, worst quality",
     image=condition_image,
     controlnet_conditioning_image=condition_image,
     width=condition_image.size[0],
     height=condition_image.size[1],
     strength=1.0,
     generator=torch.manual_seed(0),
     num_inference_steps=32,
     ).images[0]
Input Image (64x64) | Output Image (1024x1024) |
---|---|
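The size arithmetic in resize_for_condition_image can be checked without loading any model. The sketch below isolates it in a hypothetical helper, `condition_size`, which takes width and height in the order PIL's `Image.size` reports them, (width, height), scales the shorter side to `resolution`, and snaps both sides to the nearest multiple of 64.

```python
def condition_size(width, height, resolution):
    """Return (width, height) of the resized condition image.

    Hypothetical helper mirroring the arithmetic of
    resize_for_condition_image above, without touching any pixels.
    """
    # Scale factor that brings the shorter side to `resolution`.
    k = float(resolution) / min(height, width)
    # Snap each scaled side to the nearest multiple of 64.
    new_w = int(round(width * k / 64.0)) * 64
    new_h = int(round(height * k / 64.0)) * 64
    return new_w, new_h


square = condition_size(64, 64, 1024)       # the 64x64 dog image from the post
landscape = condition_size(640, 480, 512)   # a non-square input
```

For the 640x480 case the scale factor is 512/480, so the width becomes about 682.7 and is snapped up to 704 while the height stays at 512. The aspect ratio therefore shifts slightly for sizes that are not already multiples of 64.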
Thanks for sharing the code! There is a small bug in your image resize code:
H, W = input_image.size
should be
W, H = input_image.size
:)
@xhinker Thanks! That's right. I just fixed the code above.
Amazing work @takuma104 ! Would you like to add your example here: https://huggingface.co/lllyasviel/control_v11u_sd15_tile ?
It seems to work very nicely :-)
Can we have it without any dependency on community pipelines?
Yes, agreed, we should move this to src/diffusers/pipelines.
I will allocate time for this today (hopefully :crossed_fingers:)
First PR here: https://github.com/huggingface/diffusers/pull/3386 should be done by tomorrow.
I second @ghpkishore 's point. The HF example changes the unmasked parts of the image. It also adds a green filter to the generated image. I noticed that the gradio_inpaint.py script is changed. The new logic seems to work much better than the previous example. It keeps the unmasked parts of the image unchanged. @takuma104 could you kindly give it another try that hopefully fixes the HF example?
Hey @classicboyir,
Actually we could get this working by making use of the callback function: https://github.com/huggingface/diffusers/blob/886575ee43c3e7060d74e2feb2018111e0998013/src/diffusers/pipelines/controlnet/pipeline_controlnet.py#L750
Just make sure the passed callback function has access to the mask and then we can make sure to not change the corresponding part of the image.
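A minimal sketch of the blending step such a callback would perform, written with NumPy so it runs standalone. Everything here is an assumption for illustration: the function name `keep_unmasked` is hypothetical, the mask is taken to be 1 where repainting is allowed and 0 where the original must be preserved, and `reference_latents` stands for the original image's latents already noised to the current timestep. Whether changes made to the latents inside a callback are picked up by the pipeline depends on the diffusers version, so this shows only the arithmetic, not a drop-in callback.

```python
import numpy as np


def keep_unmasked(latents, reference_latents, mask):
    """Blend latents so that only the masked region changes.

    latents:           (B, C, H, W) current denoising latents.
    reference_latents: (B, C, H, W) latents of the original image
                       (assumed to be noised to the current timestep).
    mask:              (B, 1, H, W) with 1.0 = repaint, 0.0 = keep original;
                       it broadcasts across the channel dimension.
    """
    return mask * latents + (1.0 - mask) * reference_latents


latents = np.ones((1, 4, 2, 2), dtype=np.float32)
reference = np.zeros((1, 4, 2, 2), dtype=np.float32)
mask = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32).reshape(1, 1, 2, 2)

blended = keep_unmasked(latents, reference, mask)
```

In the result, positions where the mask is 1 keep the freshly denoised values and positions where it is 0 are reset to the reference, which is the "restore the unmasked part at every step" idea in its simplest form.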
@patrickvonplaten can you elaborate on what you meant by callable function having access to mask. Can you provide an example on how to use it?
@patrickvonplaten Also shouldn't this be the default assumption on how the masking should work? If the mask is given then it shouldn't be changed.
@patrickvonplaten to @ghpkishore 's point, shouldn't this be the default behavior? You'd expect inpainting to keep the unmasked parts untouched.
@patrickvonplaten any thoughts on this? Also, do you have an example of how to achieve this with callbacks? I assume that at every step you need to restore the unmasked part of the latents. Is this a correct high-level description of the workflow?
https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile#example
Can this be done without using custom_pipeline?
I have checked your ControlNet img2img pipeline.
Can we use it instead? That's the example I'm looking for.
Yeah, actually I'll try to adapt the inpaint pipeline so that inpainting can be used natively with all CKPT models. Will keep you updated here. Also related to: https://github.com/huggingface/diffusers/issues/3497#issuecomment-1557767030
Works well :)
Will ControlNet tile work the same way?
Yes I think it should, feel free to give it a try and let me know
what does make_inpaint_condition do?
Today, ControlNet v1.1 was released. As for the current situation, it seems to be positioned as a preview, and they are particularly working on improving the annotator (image preprocessing) code. It is said that most of the model weights are already production-ready.
Model weights:
https://huggingface.co/lllyasviel/ControlNet-v1-1
The weights have not been converted for Diffusers yet, but I think we can convert them using scripts/convert_original_controlnet_to_diffusers.py.
Addendum:
I have released the converted weights for testing purposes. To use them, specify the subfolder following the naming convention, using the checkpoint name up to the "pth", like this:
At the moment, I have confirmed normal operation for canny, depth, mlsd, normalbae, openpose, scribble, seg, softedge, lineart and lineart_anime. For normalbae, it seems that the control images created in v1.0 are no longer compatible, and correct images are not generated. It seems necessary to recreate them with the new annotator.
Model architecture:
The neural network structure is expected to remain the same as v1.0 until v1.5, so although I haven't tested it yet, it will most likely work almost as-is with the current StableDiffusionControlNetPipeline. However, some changes seem to be necessary for proper usage.
global average pooling
This was a quick report. I'm planning to proceed with testing and verification on my end as well.