invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

Inpainting #68

Closed morganavr closed 1 year ago

morganavr commented 1 year ago

https://www.reddit.com/r/StableDiffusion/comments/ww7qdl/applying_masks_to_the_img2img_generation_to/

The author and another user (Doggettx) in this Reddit thread came up with a two-line code modification that adds a feature very similar to inpainting.

Author's code: image

Doggettx code: image
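
(For readers who can't see the screenshots: a rough sketch of the general idea, with illustrative names rather than the actual two-line patch. At each img2img sampling step, the latent is blended with a re-noised copy of the init latent, so the selected regions stay anchored to the original image.)

import numpy as np

# Illustrative sketch only - not the exact two lines from the Reddit thread.
# Here mask == 1 keeps the init image and mask == 0 lets the sampler repaint.
def blend_with_init(x, init_latent, mask, noise_scale):
    noised_init = init_latent + noise_scale * np.random.randn(*init_latent.shape)
    return mask * noised_init + (1 - mask) * x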

I think a GUI is also required for this feature, since the user will need to draw masks. This GUI could be the starting point of a new GUI for the whole repo (if the author agrees), and the console terminal could be implemented as a separate panel inside it.

In these two images the dogs are different but the woman's face is the same. In the top image the red mask is less transparent. There are also differences in the parameter values in the GUI: left panel (Overlay alpha), right panel (Prompt).

Personally I think the results are fantastic - with this feature you can select the parts of the image you want to change, and only those parts will be changed. It seems like the inpainting feature from DALL-E 2 to me.

There is another repo, https://github.com/andreas128/RePaint, with an inpainting technique, but it looks more complicated than the two-line approach (plus the time to create a GUI) proposed here.

Preview images of inpainting GUI:

image

image

lstein commented 1 year ago

I agree that inpainting is a really fun feature. Because it works best when integrated into a GUI drawing application, I haven't written in support for it in dream.py. Will keep this issue open, and if there's lots of enthusiasm, I can add the feature in.

softyoda commented 1 year ago

I agree with this feature, and I'm not strictly opposed to the GUI idea, but I think this feature should just start with image mask input (with color IDs) so we can batch process more easily without GUI requirements.

Also, it would be nice to invoke dream.py from another easily readable and modifiable batch.py that adds many prompts or iterates over many images for img2img.

Oceanswave commented 1 year ago

Looking into this since inpaint.py and sample masks are included in the original repo.

It would also be neat, if the input is a PNG, to detect transparent pixels and use those as the mask for one-stop shopping: use Photopea, import the image, erase some things, save it, then dream> funny hat -m /path/to/my/image.png

lstein commented 1 year ago

I'm doing a little code refactoring, and after that it will be quite easy to script multiple variations, prompts, etc. I'd appreciate any contributions to the inpainting, since I'm not all that familiar with what goes into an image mask. I thought it was just a black and white bitmap?

samburger commented 1 year ago

If building a GUI with Gradio, it seems they have a pretty easy interface for inpainting now: https://twitter.com/Gradio/status/1544780590846644224

morganavr commented 1 year ago

I"m not all that familiar with what goes into an image mask. I thought it was just a black and white bitmap? I think Mask is a float value from 0 to 1.

Looking more closely at the two images of the GUI, I can see that the image mask involves these parameters:

Right panel (parameter list shown in the screenshot)

Left panel (parameter list shown in the screenshot)

As you can see, there are lots of "possibly"s. I have sent a chat message to the author on Reddit to understand how these parameters really work together and how they relate to the mask value. Let's hope he responds.

Oceanswave commented 1 year ago

Not the owner, but I would recommend keeping the headless approach and not having a GUI tie-in. The only required item is, as @lstein mentions, a black and white overlay that indicates the mask.

If we have a PNG, we could compute the mask from the transparent pixels in the image, thus allowing any image editor to be used, be it something like Photopea or Photoshop.

Alternatively, as with the images in /data/inpainting_examples, there's a parallel bitmap mask that indicates the parts of the image to occlude.

The process would be: load the image in your image editor of choice, use the eraser tool to cut away the areas of the image you don't like, save the image, and then use a dream command like dream> my prompt --mask /path/to/the/image.png. It does the rest and performs the inpainting.

Gradio support would be an option, but that could work something like dream> "my prompt" --maskhelper
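
For illustration, a minimal Pillow sketch of the transparency-to-mask idea (file names are placeholders):

from PIL import Image, ImageOps

# Build a black/white mask from a PNG's alpha channel; erased (transparent)
# pixels become white.
img = Image.open("my_image.png").convert("RGBA")
alpha = img.getchannel("A")                          # 255 = opaque, 0 = erased
mask = ImageOps.invert(alpha)                        # white where erased
mask = mask.point(lambda v: 255 if v >= 128 else 0)  # binarize
mask.save("my_mask.png")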

thezveroboy commented 1 year ago

This would be a great feature. It would make it possible to edit parts of the image that were poorly drawn by the neural network or made with an error, as well as extend images that were unsuccessfully cropped.

vekst commented 1 year ago

Vouch for wanting this as well, inpainting would make things so much easier.

TingTingin commented 1 year ago

would also love this feature

morganavr commented 1 year ago

@Oceanswave

a black and white overlay that indicates the mask

If we have a PNG, we could compute the mask from the transparent pixels in the image...

The process would be ... Use the eraser tool to cut away areas of the image you don't like...

Transparency in PNG files ranges from 0 to 100% in graphics editors. An image can be partially transparent, like so: image

To make the inpainting tool more powerful, it could be useful not simply to use an "eraser" tool that completely erases parts of the image (applies 100% transparency to areas) or applies black and white overlays (representing only two transparency values: 0% and 100%), but to process all pixels that are even partially transparent (using all transparency values from 0% to 100%). That could produce interesting effects during inpainting.
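
A small sketch of that soft-mask idea (Pillow and NumPy assumed; the file name is a placeholder): instead of thresholding, keep the full alpha range as a float mask.

import numpy as np
from PIL import Image

# Keep the full 0-100% transparency range as a float mask in [0, 1], so
# partially erased pixels are only partially repainted.
alpha = Image.open("my_image.png").convert("RGBA").getchannel("A")
soft_mask = 1.0 - np.asarray(alpha, dtype=np.float32) / 255.0  # 1.0 = fully repaint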

lstein commented 1 year ago

Ok. So we find all pixels that have an alpha<1 (transparent to any extent), convert these into a bitmask (value 0,0,0) and use this as the inpainting mask. Do I have that right?

codedealer commented 1 year ago

Looks like the underlying decoder expects a black and white PNG mask, judging from the examples in the SD repo and stuff on Reddit.

That's what the Gradio interface produces too, so it seems like a generally good idea to keep things compatible. I'd second @Oceanswave's idea: take a PNG mask; if it has transparent pixels, convert it first to a black/white PNG, then feed it to the decoder.

1blackbar commented 1 year ago

It would be good overall to add a UI to your bot-style code as well, so it can run in the browser like the others with Gradio.

nicolai256 commented 1 year ago

Linking brush strength with init_image strength could be a cool feature.

neonsecret commented 1 year ago

So how do we apply a mask? Is there a usable implementation?

softyoda commented 1 year ago

If it's a 2-value straight alpha, maybe a dithering filter could still make gradients doable. If it's premultiplied 256-value, then a black and white bitmap from the PNG's alpha channel should make gradients possible. If the model accepts per-pixel weights, then an RGB image (or Cryptomatte EXR) could work for ID masks, with one color per weighted prompt. But I don't know exactly what Stable Diffusion allows or what could be integrated.

lstein commented 1 year ago

It would have to be a dither gradient. I don't see any reason that the pixels in the mask need to be contiguous. I also don't know how to do a dither gradient to test this out :-(
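
(One way to produce such a dither gradient, sketched with Pillow; file names are placeholders. Converting a greyscale image to 1-bit mode applies Floyd-Steinberg dithering by default, turning a soft gradient into scattered, non-contiguous black/white pixels.)

from PIL import Image

# Converting to 1-bit mode dithers by default, so a grey gradient becomes a
# scatter of black/white pixels whose density follows the gradient.
grey_mask = Image.open("soft_mask.png").convert("L")
dithered = grey_mask.convert("1")
dithered.save("dithered_mask.png")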

thezveroboy commented 1 year ago

As far as I can see, there is inpainting already: https://github.com/pesser/stable-diffusion#inpainting

nicolai256 commented 1 year ago

As far as I can see, there is inpainting already: https://github.com/pesser/stable-diffusion#inpainting

This is for latent diffusion, not Stable Diffusion :)

XodrocSO commented 1 year ago

I'll add my vote; inpainting was my favorite feature of DALL-E 2 (before I ran out of credits).

I could take a photo of someone, mask out their clothes, and generate infinite outfit styles, or I could take one of my real-world landscape photos, mask out an area, and introduce fictional elements. It also worked on Unreal Engine screenshots: I had an area that was populated with foliage, masked out a barren empty area, and got cool results.

oppie85 commented 1 year ago

I'll also raise my hand for inpainting; I think the idea of 'weighted img2img' is really interesting, especially if it could be combined with RePaint's (https://github.com/andreas128/RePaint) method of continuously re-inserting the unmasked parts of the original image at each step and adding noise to ensure consistency. If it were possible to add non-binary masks on top of that, I think this feature would be extremely powerful.

softyoda commented 1 year ago

It would have to be a dither gradient. I don't see any reason that the pixels in the mask need to be contiguous. I also don't know how to do a dither gradient to test this out :-(

Hi, there is an image-processing library, PIL (or Pillow), that allows extracting the alpha channel, doing processing, etc. Example of use: https://stackoverflow.com/questions/18777873/convert-rgb-to-black-or-white/18778280#18778280. The input image could be .png only, and the script could interpret it like this: if RGBA, use the alpha channel with a dither filter; if RGB, use only the luminance and clip at half (128); if B&W premultiplied, use a dither filter; if B&W straight, use it as-is.
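
A rough sketch of those interpretation rules (mode detection simplified; the premultiplied/straight distinction can't be recovered from the image mode alone, so those branches are folded together):

from PIL import Image

def interpret_mask(path):
    img = Image.open(path)
    if img.mode == "RGBA":
        # RGBA: use the alpha channel with a dither filter
        return img.getchannel("A").convert("1")
    if img.mode == "RGB":
        # RGB: use luminance only, clipped at half (128)
        return img.convert("L").point(lambda v: 255 if v >= 128 else 0)
    # greyscale / bilevel: use as-is
    return img.convert("L")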

altbitlimited commented 1 year ago

Would also love inpainting support, specifically via the API. I feel like inpainting has so many uses that we haven't even thought of yet.

morganavr commented 1 year ago

Gradio has implemented and open-sourced their initial version of inpainting: https://twitter.com/Gradio/status/1562827932871303170 image

surrealism7x commented 1 year ago

I would love to have it added to this repo too

Oceanswave commented 1 year ago

Giving it a shot... I have the mask and the changes in, but I'm unclear on what x0 needs to be, and the OP mentioned something about downsampling the mask.

What goes here?? @morganavr

https://github.com/BaristaLabs/stable-diffusion-dream/blob/3e207eac19b3e24691cd3abc89f761427f1687db/ldm/simplet2i.py#L417

samples = sampler.decode(z_enc, c, t_enc,
                         unconditional_guidance_scale=cfg_scale,
                         unconditional_conditioning=uc,
                         x0=init_latent,
                         mask=mask[::4, ::4])  # Uh.. what goes here?

morganavr commented 1 year ago

unclear on what x0 needs to be

In one place in the source code (notebook_helpers.py) I can see that x0=z0. I have asked @Doggettx, the user on Reddit who proposed using x0. Let's hope for an answer.

sample, intermediates = convsample_ddim(model, c, steps=custom_steps, shape=z.shape,
                                                eta=eta,
                                                quantize_x0=quantize_x0, img_callback=img_cb, mask=None, x0=z0,
                                                temperature=temperature, noise_dropout=noise_dropout,
                                                score_corrector=corrector, corrector_kwargs=corrector_kwargs,
                                                x_T=x_T, log_every_t=log_every_t)

Doggettx commented 1 year ago

For x0 you can just use z_enc that's passed originally to the function. That already has noise added to it though, so it's probably better to create it like z_enc but with the parameter noise=0 added to the function call.

To create the mask you can look at the original inpaint.py; there's already code there that does the same.

This is what I use now, my masks are inverted though:

from PIL import Image
import numpy as np
import torch

def load_mask(file, maxw, maxh, downsample=None, mask_strength=1.):
    image = Image.open(file).convert("L")
    w, h = image.size
    print(f"loaded input mask of size ({w}, {h})")

    # scale down proportionally so neither dimension exceeds the maximum
    if w > maxw:
        fac = maxw / w
        h = int(fac * h)
        w = maxw

    if h > maxh:
        fac = maxh / h
        w = int(fac * w)
        h = maxh

    w, h = map(lambda x: x - x % 64, (w, h))  # snap to multiples of 64

    # optionally reduce to a lower resolution (e.g. the latent size)
    if downsample is not None:
        w //= downsample
        h //= downsample

    image = image.resize((w, h), resample=Image.Resampling.LANCZOS)
    print(f"resized to ({w}, {h})")
    # blend toward white by (1 - mask_strength); at 1.0 the mask is unchanged
    image = 1 - (1 - (np.array(image).astype(np.float32) / 255.0)) * mask_strength
    image = image[None, None]  # add batch and channel dimensions
    image = torch.from_numpy(image)

    return image

Oceanswave commented 1 year ago

Yuk sorry for my thick skull - still working on this

I have the mask - in my case by taking the alpha channel from a PNG - that's the easy part. I pass the mask and x0, but I just get the x0 image back.

https://github.com/BaristaLabs/stable-diffusion-dream/blob/f25a085a555b8db32a154256d7ee6538b71d0172/ldm/simplet2i.py#L448

def _load_img(self, path):
    image = Image.open(path)
    mask = None
    masked_image = None

    # Determining if the image has an alpha channel tells us if we want to
    # extract a mask (has_transparency is a helper defined elsewhere)
    if has_transparency(image):
        # Obtain the mask from the transparency channel before we get rid of it
        mask = Image.new(mode="L", size=image.size, color=255)
        mask.putdata(image.getdata(band=3))
        # Invert so white is transparent
        mask = ImageOps.invert(mask)

    image = image.convert('RGB')

    w, h = image.size
    print(f'loaded input image of size ({w}, {h}) from {path}')
    w, h = map(
        lambda x: x - x % 32, (w, h)
    )  # resize to integer multiple of 32
    image = image.resize((w, h), resample=Image.Resampling.LANCZOS)
    masked_image = image.copy()
    image = np.array(image).astype(np.float32) / 255.0
    image = image[None].transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)

    if mask is not None:
        # we need to resize the mask too
        mask = mask.resize((w, h), resample=Image.Resampling.LANCZOS)
        mask.save("outputs/test_mask.png")  # debug output
        # normalize once to [0, 1], then binarize
        mask = np.array(mask).astype(np.float32) / 255.0
        mask[mask < 0.5] = 0
        mask[mask >= 0.5] = 1
        mask = torch.from_numpy(mask)

        masked_image = (1 - mask) * image

    return {
        "img": 2.0 * image - 1.0,
        "mask": mask,
        "masked_img": masked_image
    }

Oceanswave commented 1 year ago

My attempts to make a guy with a pineapple hat into a lovely badger always go the wrong way - which means my maths is inverted.. but.. foobar


Doggettx commented 1 year ago

Yuk sorry for my thick skull - still working on this

I have the mask - in my case by taking the alpha channel from a PNG - that's the easy part. I pass the mask and x0, but I just get the x0 image back.

You seem to be only using the top left small corner of the mask? You need to downsize it to 1/8th the size, I think you're using the top left 1/8th of the array instead.
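
In other words (a sketch assuming PyTorch and a [1, 1, H, W] mask tensor): the mask has to be resized to the latent resolution, not indexed, since indexing only strides the array or keeps a corner of it.

import torch.nn.functional as F

# Resize the pixel-space mask to the latent's spatial size instead of
# slicing it; mask[::4, ::4] strides the array and mask[:h, :w] keeps
# only the top-left corner.
def downsample_mask(mask, init_latent):
    _, _, h, w = init_latent.shape
    return F.interpolate(mask, size=(h, w), mode="nearest")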

morganavr commented 1 year ago

@Oceanswave

My attempts to make a guy with a pineapple hat into a lovely badger

I think that this exact image is bad for testing purposes because its background is a solid color, and it's difficult to quickly analyze which parts of the image were affected by inpainting.

tildebyte commented 1 year ago

@Oceanswave; Maybe "my masks are inverted though" (from @Doggettx)? Seems like that's exactly what's happening in your example - affected vs. unaffected areas are swapped.

morganavr commented 1 year ago

It seems that this fork has inpainting, I have not tested it yet.

https://github.com/hlky/stable-diffusion

Mask painting (NEW) 🖌️: Powerful tool for re-generating only specific parts of an image you want to change

codedealer commented 1 year ago

It seems that this fork has inpainting, I have not tested it yet.

https://github.com/hlky/stable-diffusion

Mask painting (NEW) 🖌️: Powerful tool for re-generating only specific parts of an image you want to change

It's masking. There is no specialized model involved in the "inpainting" process there.

See this discussion