invoke-ai / InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

Inpainting-specific model support #1184

Closed db3000 closed 2 years ago

db3000 commented 2 years ago

I find inpainting frustrating because it takes a lot of tries to get something that matches the existing image nicely. It seems there are inpainting-specific models that are much better at this, for example RunwayML's model. It's also nice that the RunwayML model is trained with extra non-inpainting steps (akin to SD 1.5?), so it would be good to support txt2img with it as well.

Is it possible to add support for this model here? I tried porting in the RunwayML changes directly, but it seems more is needed.

Any-Winter-4079 commented 2 years ago

My understanding is txt2img works with this model by passing in a fully masked dummy image for the extra channels, for example https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/6cbb04f7a5e675cf1f6dfc247aa9c9e8df7dc5ce/modules/processing.py#L559

Probably worth trying something like that first, since passing a dummy image/mask in txt2img mode (if the model needs it) would be much simpler than introducing a new split inpainting-model concept for all models in the code.
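A minimal numpy sketch of the idea, for illustration only: the RunwayML inpainting UNet takes 9 input channels (4 noise latents, 1 mask channel, 4 latents of the masked source image), so txt2img can reuse it by feeding a fully masked dummy image. The exact fill values and mask convention here are assumptions based on the AUTOMATIC1111 code linked above, not InvokeAI's implementation.

```python
import numpy as np

# Shapes for a 512x512 image in SD latent space (64x64, 4 channels).
B, C, H, W = 1, 4, 64, 64

# The usual txt2img starting point: pure noise latents.
latents = np.random.randn(B, C, H, W).astype(np.float32)

# Dummy conditioning for txt2img: mask everything, blank source image.
mask = np.ones((B, 1, H, W), dtype=np.float32)              # 1 = "repaint this pixel"
masked_image_latents = np.zeros((B, C, H, W), np.float32)   # stand-in for a VAE-encoded blank image

# Concatenate along the channel axis to form the 9-channel UNet input.
unet_input = np.concatenate([latents, mask, masked_image_latents], axis=1)
assert unet_input.shape == (B, 9, H, W)
```

With the whole image masked and the source blanked out, the inpainting model effectively degenerates to plain txt2img, which is why the dummy-input trick works.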

@lstein Did you have any luck with this suggestion from @db3000?

lstein commented 2 years ago

- "man with winged angel on shoulder" works
- "man with eagle on shoulder" works
- "eagle" does not work
- "bald eagle" does not work
- "ferocious bald eagle" does not work
- "man with ferocious bald eagle on shoulder" works

I get the feeling that you need to describe the entire scene. Pretty non-intuitive, given that regular inpainting works with single words.

lstein commented 2 years ago

I must have missed @db3000's question/comment/PR?

Any-Winter-4079 commented 2 years ago

Comment https://github.com/invoke-ai/InvokeAI/issues/1184#issuecomment-1289643901

Any-Winter-4079 commented 2 years ago

It seems to benefit from a good description, indeed.

Note that it's not necessarily the whole scene; e.g. try with just "woman" and it's still you. But phrases like "sitting on shoulder" do help.

lstein commented 2 years ago

Oh, that one. I've looked at AUTOMATIC's code and it looks straightforward, but because he has basically rewritten everything, there's more work to do than cutting and pasting: all the variable names and method calls have changed. I'll also need to detect when the inpainting model is in use and route all txt2img and img2img calls through a new inpainting module.

Any-Winter-4079 commented 2 years ago

Oh, by the way: some people seem to be using it with clipseg: https://www.reddit.com/r/StableDiffusion/comments/y89apm/who_needs_prompt2prompt_anyway_sd_15_inpainting/ That may help, since we wouldn't need to manually mask out parts of images (e.g. hair); we could just say -tm "hair"
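For illustration, here is a small sketch of the post-processing step that text-guided masking implies: a model like CLIPSeg returns per-pixel logits for the text prompt, which get turned into a binary inpainting mask via sigmoid and a threshold. The function name and threshold value are hypothetical, and the tiny logits array stands in for real model output.

```python
import numpy as np

def logits_to_mask(logits, threshold=0.5):
    """Convert raw segmentation logits into a binary 0/255 mask image."""
    # Sigmoid maps logits to probabilities in [0, 1].
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Pixels above the threshold belong to the region to repaint.
    return (probs > threshold).astype(np.uint8) * 255

# Fake 2x2 logits standing in for CLIPSeg output for a prompt like "hair".
logits = np.array([[-4.0, 2.0],
                   [0.1, -0.2]])
mask = logits_to_mask(logits)
```

The resulting mask can then be fed to the inpainting pipeline exactly as a hand-drawn mask would be, which is what makes the -tm workflow convenient.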

lstein commented 2 years ago

You can do that now with the old inpainting model:

"crow" -I Lincoln-and-Parrot.png -tm 'bird'

It works very reliably.

Any-Winter-4079 commented 2 years ago

Yep, there definitely seem to be two styles, but the new model + clipseg might be more reliable. "an eagle sitting on shoulder":

[image: result for "an eagle sitting on shoulder"]

lstein commented 2 years ago

Preliminary implementation at #1243. Inpainting, txt2img, and img2img are all working with the ddim and k* samplers. The plms sampler needs a bit more love.

lstein commented 2 years ago

By the way, thanks to both of you (@db3000 @Any-Winter-4079) for your help with this. I'll credit you in the final commit (should have done that already, but neglected to).

psychedelicious commented 2 years ago

The inpainting model is now supported.