My understanding is that txt2img works with this model by passing in a fully masked dummy image for the extra channels; see for example https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/6cbb04f7a5e675cf1f6dfc247aa9c9e8df7dc5ce/modules/processing.py#L559
Probably worth trying something like that first: it would be much simpler to pass in a dummy image/mask in txt2img mode when the model needs it than to introduce a new split inpainting-model concept for all models in the code.
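For reference, the linked webui code amounts to roughly the following (a minimal sketch, not the exact code; `dummy_txt2img_conditioning` is a hypothetical name, and `encode_first_stage`/`get_first_stage_encoding` are the CompVis ldm autoencoder methods):

```python
import torch

def dummy_txt2img_conditioning(model, x, height, width):
    # Hypothetical helper, loosely following the linked webui code:
    # encode an all-zero "dummy" image to latent space...
    blank = torch.zeros(x.shape[0], 3, height, width, device=x.device)
    cond = model.get_first_stage_encoding(model.encode_first_stage(blank))
    # ...then prepend a mask channel of all ones, meaning the entire
    # canvas is masked and free for the model to repaint.
    cond = torch.nn.functional.pad(cond, (0, 0, 0, 0, 1, 0), value=1.0)
    return cond.to(x.dtype)
```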
@lstein Did you have any luck with this suggestion from @db3000?
"man with winged angel on shoulder" works. "man with eagle on shoulder" works "eagle" does not work "bald eagle" does not work "ferocious bald eagle" doesn't work "man with ferocious bald eagle on shoulder" works
I get the feeling that you need to describe the entire scene. Pretty non-intuitive given that regular inpainting will work with single words.
I must have missed @db3000's question/comment/PR?
It seems to benefit from a good description, indeed.
Note that it's not necessarily the whole scene; e.g., try with "woman" and it's still you. But things like "sitting on shoulder", etc. do help.
Oh, that one. I've looked at AUTOMATIC's code and it looks straightforward, but because he has basically rewritten everything, there's more work to do than cutting and pasting. All the variable names and method calls have changed. Also, I'll need to detect when the inpainting model is in use and route all txt2img and img2img calls through a new inpainting module.
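One sketch of how that detection and routing could look (illustrative names throughout; the `model.model.diffusion_model.in_channels` path is how the CompVis/RunwayML ldm classes expose the unet):

```python
def uses_inpainting_model(model):
    # The RunwayML inpainting unet takes 9 input channels
    # (4 noisy-latent + 4 masked-image-latent + 1 mask) instead of
    # the usual 4, so the channel count identifies it at load time.
    return model.model.diffusion_model.in_channels == 9

def generate(model, prompt, init_image=None, mask=None, **opts):
    # Hypothetical dispatcher: with the inpainting checkpoint loaded,
    # even plain txt2img goes through the inpainting code path,
    # fed a dummy image/mask as discussed above.
    if uses_inpainting_model(model):
        return inpaint(model, prompt, init_image, mask, **opts)  # assumed helper
    elif init_image is not None:
        return img2img(model, prompt, init_image, **opts)  # assumed helper
    return txt2img(model, prompt, **opts)  # assumed helper
```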
Oh, by the way: some people seem to be using it with clipseg. https://www.reddit.com/r/StableDiffusion/comments/y89apm/who_needs_prompt2prompt_anyway_sd_15_inpainting/ That may help, since we wouldn't need hand-drawn masks to remove parts of images (e.g. hair); we could just say -tm "hair"
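For the curious, here is a rough sketch of text-prompted masking using the Hugging Face transformers port of clipseg (the model name is the published CIDAS checkpoint; the filename and the 0.5 threshold are illustrative):

```python
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
seg_model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("photo.png").convert("RGB")  # placeholder filename
inputs = processor(text=["hair"], images=[image], return_tensors="pt")
with torch.no_grad():
    logits = seg_model(**inputs).logits  # low-res relevance heatmap (352x352)
heatmap = torch.sigmoid(logits)
# Threshold (0.5 is arbitrary) and upscale to the original image size
# to get a binary inpainting mask for the region matching "hair".
mask = Image.fromarray((heatmap > 0.5).squeeze().numpy().astype("uint8") * 255)
mask = mask.resize(image.size)
```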
You can do that now with the old inpainting model:
"crow" -I Lincoln-and-Parrot.png -tm 'bird'
It works very reliably.
Yep, it definitely seems there are two styles, but the current model + clipseg might be more reliable.
an eagle sitting on shoulder
Preliminary implementation at #1243. Inpainting, txt2img, and img2img are all working with the ddim and k* samplers. The plms sampler needs a bit more love.
By the way, thanks to both of you (@db3000 @Any-Winter-4079) for your help with this. I'll credit you in the final commit (should have done that already, but neglected to).
The inpainting model is now supported.
I find inpainting frustrating, as it takes a lot of tries to get something that matches the existing image nicely. It seems there are inpainting-specific models that are much better at this, for example RunwayML's model. It's also nice that the RunwayML model is trained with extra non-inpainting steps (akin to SD 1.5?), so it would be good to support txt2img with it as well.
Is it possible to add support for this model here? I tried porting in the RunwayML changes directly, but it seems more is needed.
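For context on what "more is needed" involves: the RunwayML checkpoint is a "hybrid"-conditioning ldm whose unet consumes 9 input channels, so every sampling call has to assemble extra latents alongside the noisy latent. A rough sketch of that assembly (illustrative names, CompVis-style ldm API assumed; mask convention here: 1 = region to repaint):

```python
import torch
import torch.nn.functional as F

def build_hybrid_input(model, z_noisy, init_image, mask):
    # z_noisy:    (B, 4, H/8, W/8) noisy latent being denoised
    # init_image: (B, 3, H, W) source image in [-1, 1]
    # mask:       (B, 1, H, W) with 1 where the model may repaint
    masked = init_image * (1.0 - mask)  # blank out the region to fill
    masked_latent = model.get_first_stage_encoding(
        model.encode_first_stage(masked))
    # Downsample the mask to latent resolution and stack everything
    # into the 9 channels the inpainting unet expects.
    mask_small = F.interpolate(mask, size=z_noisy.shape[-2:])
    return torch.cat([z_noisy, masked_latent, mask_small], dim=1)
```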