huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

MultiControlNet does not perform correct input/output chaining #5631

Open vladmandic opened 1 year ago

vladmandic commented 1 year ago

Describe the bug

I may be wrong, but looking at the primary for loop in pipelines/controlnet/multicontrolnet.py, it seems it uses the same input for every controlnet in the list.

That is a valid use case, but a somewhat rare one.

A far more common use case for MultiControlNetModel is to use the output of the first controlnet model as the input to the second controlnet model.

I'm filing this as an issue since the second use case is actually the expected behavior when using multiple controlnets in a list, but it could be considered a feature request as well - I can refile if needed.

In either case, I think the ControlNetModel class needs an additional property - something like input_use_current vs input_use_last. It would have no effect on a single ControlNet, but it should change how MultiControlNetModel behaves.

Note that this applies to StableDiffusionControlNetPipeline as well as all other variants such as StableDiffusionXLControlNetPipeline.
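To illustrate the difference between the two modes, here is a minimal pure-Python sketch (no torch; `run_controlnet` and the `chain_inputs` flag are hypothetical stand-ins, not diffusers API):

```python
# Minimal sketch of independent vs chained MultiControlNet inputs.
# run_controlnet and chain_inputs are hypothetical placeholders,
# not real diffusers API.

def run_controlnet(cond, scale):
    # Stand-in for one ControlNetModel forward pass: returns a
    # scaled "residual" derived from its conditioning input.
    return cond * scale

def multi_controlnet(conds, scales, chain_inputs=False):
    total = 0.0
    current = conds[0]
    for i, (cond, scale) in enumerate(zip(conds, scales)):
        if chain_inputs and i > 0:
            # Proposed behavior: feed the previous net's output forward.
            cond = current
        current = run_controlnet(cond, scale)
        total += current  # residuals are summed across nets
    return total

# Current behavior: each net sees its own conditioning input.
independent = multi_controlnet([1.0, 10.0], [0.5, 0.5])
# Proposed behavior: the second net consumes the first net's output.
chained = multi_controlnet([1.0, 10.0], [0.5, 0.5], chain_inputs=True)
```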

Reproduction

test app is available at: https://github.com/vladmandic/control

Logs

N/A

System Info

diffusers==0.22.0.dev

Who can help?

@yiyixuxu @DN6 @patrickvonplaten @sayakpaul

sayakpaul commented 1 year ago

Cc: @williamberman here too.

yiyixuxu commented 1 year ago

@patrickvonplaten I think this makes sense, what do you think? Happy to take this one.

patrickvonplaten commented 1 year ago

Agree @vladmandic! Let's change the loop here. @yiyixuxu, it'd be great if you could take this one.

breengles commented 1 year ago

Hi @vladmandic! Can you please clarify what you mean by chaining controlnets? I am probably missing something, but there are at least two points blocking that:

1. the input and output are of different shapes (some sort of image tensor as input vs a list of tensors for the UNet blocks) and, more importantly, of a different "nature"; and
2. even if it were possible that way (again, I probably have not quite understood how yet), it would make the overall control guidance dependent on the input order, which I believe is not the common use case for multicontrolnet.

Also, it seems like the current main for-loop iterates over each control hint (which must already be preprocessed with an annotator if necessary) and the respective model:

for i, (image, scale, controlnet) in enumerate(zip(controlnet_cond, conditioning_scale, self.nets)):
    ...
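The rest of that loop sums each net's residuals elementwise, which is part of why a controlnet's output is not directly reusable as the next net's conditioning image. A pure-Python paraphrase of the accumulation pattern (with `fake_controlnet` as a hypothetical stand-in for a real ControlNetModel):

```python
# Paraphrase of the residual accumulation in MultiControlNetModel.forward.
# fake_controlnet is a hypothetical stand-in, not the real model.

def fake_controlnet(image, scale):
    # A real ControlNetModel returns per-UNet-block residual tensors,
    # not an image - hence the shape/"nature" mismatch noted above.
    down = [image * scale, image * scale * 2]
    mid = image * scale * 3
    return down, mid

def multi_forward(images, scales):
    nets = [fake_controlnet] * len(images)
    for i, (image, scale, net) in enumerate(zip(images, scales, nets)):
        down_samples, mid_sample = net(image, scale)
        if i == 0:
            down_res, mid_res = down_samples, mid_sample
        else:
            # Residuals from each net are summed elementwise.
            down_res = [a + b for a, b in zip(down_res, down_samples)]
            mid_res += mid_sample
    return down_res, mid_res

down, mid = multi_forward([1.0, 2.0], [1.0, 0.5])
```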
github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

yiyixuxu commented 11 months ago

I plan to work on this but don't have time at the moment, so I'm opening it up to the community to see if anyone wants to pick it up before I do :)