invoke-ai / InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[enhancement]: Concat ControlNet(s) for easier multiple control nets into a denoise latent #4526

Status: Open. c-dante opened this issue 1 year ago.

c-dante commented 1 year ago

Is there an existing issue for this?

Contact Details

No response

What should this feature add?

In workflows, the DenoiseLatents step takes a ControlNet or a List[ControlNet], but the only way I found to build a list is with a Collect node. That doesn't work inside an iteration, because Collect gathers EVERY ControlNet from the whole iteration before passing them along. For example, you might have an iterated ControlNet driven by, say, a video, plus a fixed set of ControlNets that keep context like a background or color palette.
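To make the problem concrete, here is a toy pure-Python model of the two wirings. It assumes simplified semantics, namely that Collect downstream of Iterate waits for every iteration and then emits a single list; `Control` and `collect` are hypothetical stand-ins, not InvokeAI APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """Hypothetical stand-in for a ControlNet field; it only carries a name."""

    name: str


def collect(items: list[Control]) -> list[Control]:
    """Toy Collect: wait for every item from the whole iteration, then emit one list."""
    return list(items)


frames = ["frame0", "frame1", "frame2"]
fixed = [Control("background"), Control("palette")]

# Collect wiring: every frame's ControlNet lands in one list alongside the fixed set,
# so each denoise step would see all frames at once.
collected = collect([Control(f"pose:{f}") for f in frames] + fixed)

# Per-iteration join: each iteration combines only its own ControlNet with the
# fixed set, which is what a Join/Concat node inside the loop would produce.
per_iteration = [[Control(f"pose:{f}"), *fixed] for f in frames]

print(len(collected))                      # 5
print([c.name for c in per_iteration[1]])  # ['pose:frame1', 'background', 'palette']
```

With Collect, the denoise step would receive all five ControlNets at once; with a per-iteration join, each frame gets only its own ControlNet plus the fixed set.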

Here's a more or less generic Concat node that does concat(T|T[], T|T[]): T[] (similar to JS's concat), attached under Additional Content. It could be generalized beyond the ControlNet type, but I just wanted to get going.

I'd love to have a way to coordinate indexes across graph invocations so I can grab the previous image and use it as context for making sequences, but hey, what can ya do.

Alternatives

No response

Additional Content

```python
from typing import Union

from .baseinvocation import (
    BaseInvocation,
    BaseInvocationOutput,
    Input,
    InputField,
    InvocationContext,
    OutputField,
    UIType,
    invocation,
    invocation_output,
)
from .controlnet_image_processors import ControlField


class FieldDescriptions:
    control = "An input ControlNet or List[ControlNet]"


@invocation_output("JoinControlNetsOutput")
class JoinControlNetsOutput(BaseInvocationOutput):
    collection: list[ControlField] = OutputField(
        description="The list of ControlNets for inference", title="ControlNets", ui_type=UIType.Collection
    )


@invocation(
    "JoinControlNetsInvocation",
    title="Join ControlNets",
    tags=["collection", "controlnet", "i2i", "img2img"],
    category="collections",
)
class JoinControlNetsInvocation(BaseInvocation):
    """Concatenate two inputs, each a ControlNet or a List[ControlNet]"""

    # Both inputs default to None, so either one may be left unconnected.
    first: Union[ControlField, list[ControlField]] = InputField(
        default=None, description=FieldDescriptions.control, input=Input.Connection, ui_order=0
    )
    second: Union[ControlField, list[ControlField]] = InputField(
        default=None, description=FieldDescriptions.control, input=Input.Connection, ui_order=1
    )

    def invoke(self, context: InvocationContext) -> JoinControlNetsOutput:
        control_list: list[ControlField] = []
        self.concat_field(control_list, self.first)
        self.concat_field(control_list, self.second)
        return JoinControlNetsOutput(collection=control_list)

    def concat_field(
        self, control_list: list[ControlField], value: Union[ControlField, list[ControlField], None]
    ) -> list[ControlField]:
        # A single ControlNet is appended, a list is flattened in,
        # and None (an unconnected input) contributes nothing.
        if isinstance(value, ControlField):
            control_list.append(value)
        elif isinstance(value, list):
            control_list.extend(value)
        return control_list
```

Millu commented 1 year ago

Thanks for putting this together - would you feel comfortable opening a PR for this? We have a similar string concat node that could be generalized here!

c-dante commented 1 year ago

I struggled to get the input/output connections working with generic types. Pydantic and the UI refused to show output connections when I naively typed the field as Any. Could you point me to an example of how to do that? More than happy to make this a generic concat node!

Also, I'm noticing a big gap in generic arithmetic. I've been kicking around the idea of an eval node that takes other nodes' outputs as inputs, exposes them as a map of local variables, and just lets you go hog wild. It's worked nicely for me for quickly spinning things up, but let me know if there's appetite for it.
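For the eval-node idea, one option that avoids handing user text to Python's `eval` is to walk the expression's AST and allow only arithmetic on named inputs. This is a swapped-in technique, sketched under assumptions: `safe_eval` and its operator whitelist are hypothetical and not part of InvokeAI.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str, local_vars: dict[str, float]) -> float:
    """Evaluate an arithmetic expression, resolving names only from local_vars."""

    def visit(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in local_vars:
                raise NameError(f"unknown input: {node.id}")
            return local_vars[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))


# Node outputs exposed as named locals, e.g. a width and a scale factor.
print(safe_eval("width * scale + 8", {"width": 512, "scale": 1.5}))  # 776.0
```

Calls, attribute access, and imports all fall through to `ValueError`, so inputs can only be combined arithmetically. Note that `**` can still be slow with very large exponents.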

Millu commented 1 year ago

Tagging in @psychedelicious to help with the output connections

Re generic arithmetic, a PR adding math functions went in recently: #4484. A generic eval node was punted on in favor of getting math functions in quickly. Are there other functions you'd like to have?

Also, are you in the Discord? Come say hi in the #nodes-chat channel! There are other folks there who can provide help and thoughts!

psychedelicious commented 1 year ago

Hi @c-dante, you should be able to connect either a single ControlNet to a control input, or connect several to a Collect node and then connect the collection to the control input. It's working for me on main. Does this not work for you? What if you disable validation (behind the gear icon)?

The UI won't handle Any right now, but you can check out #4528 in which I've added support. Due to how OpenAPI schemas are generated, you'd use ui_type=UIType.Any to designate a field as such.

I'm reluctant to add too many things that use Any because it's a major footgun opportunity.

Inputs typed as Any are fairly safe, because the node author has agency over how the incoming data is processed. But an Any output is scary! You can't provide guardrails with an Any output.

The use case in the linked PR is to allow arbitrary metadata construction, so you can build up your own metadata format to store in the database and in image metadata. This requires a node that can accept Any as an input and stringify it. Because all node inputs and outputs are necessarily serializable, this is hopefully rather hard to abuse.
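The "Any in, string out" pattern can be sketched in plain Python, assuming inputs arrive as JSON-compatible values. `stringify_any` is a hypothetical helper illustrating the idea, not the node from #4528: the input type is open, but the output type is fixed, so downstream nodes still see a well-defined type.

```python
import json
from typing import Any


def stringify_any(value: Any) -> str:
    """Serialize an arbitrary JSON-compatible input to a string.

    The input is typed Any, but the output is always str, so downstream
    nodes never have to guess what they are receiving.
    """
    try:
        # sort_keys makes the output stable regardless of dict insertion order.
        return json.dumps(value, sort_keys=True)
    except TypeError as exc:
        raise ValueError(f"input is not serializable: {exc}") from exc


print(stringify_any({"steps": 30, "seed": 42}))  # {"seed": 42, "steps": 30}
```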

PS: The UI must have field types defined and handled explicitly. I've considered a few ways to allow arbitrary/dynamic field types, but it's not near the top of the list right now.

PPS: I've improved the UI's handling of polymorphic fields in #4545, which may be of interest. This and the other PR are waiting on #4502 to merge.

c-dante commented 1 year ago

Hey hey! Awesome stuff and thanks for the detailed reply!

So, this is specifically a problem when you have multiple ControlNets downstream of an Iterate. When Collect is downstream of Iterate, it consumes the iteration entirely and then produces the full result. An example JSON of the kind of pipeline a "concat" node like this addresses: https://pastebin.com/0PqT4svT

Totally on board with "Any is dangerous"; I'll check out the PRs for polymorphic types. My ideal is a generic type where the two inputs have to agree, and that type sets the output type. So, a generic <T> concat(T | T[] | None, T | T[] | None) -> T[] that "just works" is the ideal.
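A generic version of that signature can be sketched with a `TypeVar`. This is an illustration, not InvokeAI code; `concat` here is hypothetical, and it mirrors the attached node's rules: single values are appended, lists are flattened in, and `None` (an unconnected input) is skipped.

```python
from typing import TypeVar, Union

T = TypeVar("T")


def concat(first: Union[T, list[T], None], second: Union[T, list[T], None]) -> list[T]:
    """Concatenate two single-or-list inputs that share the element type T."""
    result: list[T] = []
    for value in (first, second):
        if value is None:
            continue  # unconnected input contributes nothing
        if isinstance(value, list):
            result.extend(value)  # flatten a list input
        else:
            result.append(value)  # wrap a single value
    return result


print(concat("a", ["b", "c"]))  # ['a', 'b', 'c']
print(concat(None, 1))          # [1]
```

A static checker binds `T` from both arguments, although mypy may widen mismatched types to a common supertype such as `object` rather than reject them, so the "must agree" rule would still need enforcing in the UI. The `isinstance(value, list)` check is also ambiguous when `T` is itself a list type.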

Broadly, this is about controlling Iterate...Collect pairs and how to nest loops and operations properly.

I guess another avenue for this is a toggle on Collect that decides whether or not it consumes the Iterate...