huggingface / controlnet_aux

how to use with diffusers? #7

Closed anotherjesse closed 1 year ago

anotherjesse commented 1 year ago

When I try to use the Midas depth image:

image = load_image("control.png")
depth_image, normal_image = midas(image)
output = self.pipe(
    prompt=["horse mars"]
    image=depth_image,
    # snip .....
)

I get an error:

image = self.prepare_image(
  File "/usr/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py", line 515, in prepare_image
    image = image.transpose(0, 3, 1, 2)
ValueError: axes don't match array

Perhaps I'm doing something wrong?

To work around I've copied this snippet from the original controlnet:

    import numpy as np
    from PIL import Image

    image = np.array(depth_image)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    depth_image = Image.fromarray(image)

I have to do the same thing for CannyDetector
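
For reference, the same wrapping applied to the Canny output looks roughly like this (just a sketch of what I'm doing, assuming CannyDetector returns a single-channel edge map; the threshold values are only illustrative):

    import numpy as np
    from PIL import Image
    from controlnet_aux import CannyDetector

    canny = CannyDetector()
    # feed a numpy array in; the low/high thresholds here are only illustrative
    edges = canny(np.array(image), 100, 200)
    edges = np.array(edges)[:, :, None]
    edges = np.concatenate([edges, edges, edges], axis=2)
    canny_image = Image.fromarray(edges)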

patrickvonplaten commented 1 year ago

Hey @anotherjesse,

Thanks for the issue! Could you check the model cards here: https://huggingface.co/lllyasviel/sd-controlnet-seg#released-checkpoints. For each controlnet there should be a working example. If something doesn't work, I'm happy to help :-)

anotherjesse commented 1 year ago

Thanks @patrickvonplaten for the heads up - I hadn't noticed that there were samples towards the bottom of each model card.

My question is: if this repository exists to process/prep inputs for ControlNet, could or should Midas/Canny/the other detectors return images that can be used in the pipelines directly?

In the code snippet for: https://huggingface.co/lllyasviel/sd-controlnet-depth

from transformers import pipeline
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
from PIL import Image
import numpy as np
import torch

depth_estimator = pipeline('depth-estimation')

image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-depth/resolve/main/images/stormtrooper.png")

image = depth_estimator(image)['depth']
image = np.array(image)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)

controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-depth", torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Remove if you do not have xformers installed
# see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers
# for installation instructions
pipe.enable_xformers_memory_efficient_attention()

pipe.enable_model_cpu_offload()

image = pipe("Stormtrooper's lecture", image, num_inference_steps=20).images[0]

image.save('./images/stormtrooper_depth_out.png')

It seems like controlnet_aux's midas module should return a depth image that is ready for the ControlNet pipeline, instead of requiring the user to add these numpy lines between controlnet_aux and the StableDiffusionControlNetPipeline instance:

image = np.array(depth_image)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
depth_image = Image.fromarray(image)

Hmm, looking at more of the repository, perhaps that is what controlnet_aux.util.HWC3 is there for?

The way to use controlnet_aux + depth would be:

image = load_image("control.png")
depth_image, normal_image = midas(image)
image = HWC3(depth_image)
output = pipe(prompt, image, ...)
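
(From a quick look at the source, HWC3 seems to expect a uint8 numpy array rather than a PIL image, so in practice it probably needs a conversion on either side. A rough, untested sketch:)

    import numpy as np
    from PIL import Image
    from controlnet_aux.util import HWC3

    depth_image, normal_image = midas(load_image("control.png"))
    depth_np = HWC3(np.array(depth_image))  # (H, W) grayscale -> (H, W, 3) uint8
    output = pipe(prompt, Image.fromarray(depth_np), num_inference_steps=20)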

If so, I'll make a PR to add a note to the README :)

anotherjesse commented 1 year ago

I'm coming at this repository as a way to use StableDiffusionControlNetPipeline.

In that context, it seems like the detectors' output should work in the pipelines without further bit fiddling.

This might be the wrong perspective - and there are good reasons to return it the way it is...
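
To make the suggestion concrete, here's a rough, hypothetical sketch of what I mean (the _run_midas helper is just a stand-in for whatever the detector already does internally; this is not the actual controlnet_aux code):

    import numpy as np
    from PIL import Image
    from controlnet_aux.util import HWC3

    class MidasDetector:
        def __call__(self, img):
            # _run_midas stands in for the existing MiDaS inference code
            depth_map, normal_map = self._run_midas(img)
            # hand back a 3-channel PIL image that can go straight into
            # StableDiffusionControlNetPipeline, no extra numpy juggling
            return Image.fromarray(HWC3(np.array(depth_map))), normal_map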