Closed · morganavr closed this 1 year ago
I agree that inpainting is a really fun feature. Because it works best when integrated into a GUI drawing application, I haven't written in support for it in dream.py. Will keep this issue open, and if there's lots of enthusiasm, I can add the feature in.
I agree with this feature, and I'm not strictly opposed to a GUI, but I think it should start with a plain image-mask input (with color IDs) so we can batch-process more easily without GUI requirements.
Also, it would be nice to invoke dream.py from another easily readable and modifiable batch.py that adds many prompts, or iterates over many images for img2img.
Looking into this since inpaint.py and sample masks are included in the original repo.
It would also be neat, if the input is a PNG, to detect transparent pixels and use those as the mask for one-stop shopping: use Photopea, import the image, erase some things, save it, then `dream> funny hat -m /path/to/my/image.png`.
I'm doing a little code refactoring, and after that it will be quite easy to script multiple variations, prompts, etc. I'd appreciate any contributions to the inpainting, since I'm not all that familiar with what goes into an image mask. I thought it was just a black and white bitmap?
If building a GUI w/ Gradio, seems like they have a pretty easy interface to inpainting now: https://twitter.com/Gradio/status/1544780590846644224
I"m not all that familiar with what goes into an image mask. I thought it was just a black and white bitmap? I think Mask is a float value from 0 to 1.
Looking more closely at two images of the GUI, I can see that the image mask consists of several parameters, listed in the right and left panels of the screenshots (images not reproduced here).

As you can see, there are lots of "possibly"s. I have sent a chat message to the author on Reddit to understand how these parameters really work together and their connection to the mask value. Let's hope he responds.
Not the owner, but I would recommend keeping the headless approach and not having a GUI tie-in. The only required item is, as @lstein mentions, a black and white overlay that indicates the mask.
If we have a PNG, we could compute the mask from the transparent pixels in the image, thus allowing any image editor to be used, be it Photopea or Photoshop.
Alternatively, as with the images in /data/inpainting_examples, there's a parallel bitmap mask that indicates the parts of the image to occlude.
The process would be: load the image in your favorite image editor, use the eraser tool to cut away the areas of the image you don't like, save the image, and then use a command like `dream> my prompt --mask /path/to/the/image.png`. It does the rest and performs the inpainting.
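A minimal sketch of deriving the mask in that workflow with Pillow, assuming the convention that erased (transparent) pixels mark the region to repaint; the function name and the white-means-repaint convention are illustrative, not from this repo:

```python
from PIL import Image

def mask_from_erased_image(img):
    """Turn a PNG's alpha channel into a black/white inpainting mask.

    Pixels the user erased (any transparency, alpha < 255) become white,
    i.e. "repaint here"; untouched pixels become black, i.e. "keep".
    """
    rgba = img.convert("RGBA")
    alpha = rgba.getchannel("A")          # grayscale "L" image of alpha values
    return alpha.point(lambda a: 255 if a < 255 else 0)
```

The command above could then pass the resulting black/white image straight through as the mask, whatever the final flag name ends up being.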
Gradio support would be an option, but that could work something like `dream> "my prompt" --maskhelper`.
This would be a great feature. It would make it possible to edit parts of an image that were poorly drawn by the neural network or generated with an error, as well as to extend images that were unsuccessfully cropped.
Vouch for wanting this as well, inpainting would make things so much easier.
would also love this feature
@Oceanswave:

> a black and white overlay that indicates the mask
> If we have a PNG, we could compute the mask from the transparent pixels in the image...
> ... The process would be ... Use the eraser tool to cut away areas of the image you don't like...
Transparency in PNG files ranges from 0 to 100% in graphical editors. An image can be partially transparent, like so:
To make the inpainting tool more powerful, it could be useful not simply to use an eraser tool that completely erases parts of the image (100% transparency) or a black-and-white overlay (representing only two transparency values, 0% and 100%), but to process all pixels that are even partially transparent, using the full range of transparency from 0% to 100%. That could produce interesting effects during inpainting.
Ok. So we find all pixels that have alpha < 1 (transparent to any extent), convert these into a bitmask, and use that as the inpainting mask. Do I have that right?
Looks like the underlying decoder expects a black and white png mask judging from the examples in SD repo and stuff on reddit.
And that's what the Gradio interface produces too, so it seems like a generally good idea to keep things compatible. I'd second @Oceanswave's idea: take a PNG mask; if it has transparent pixels, convert it first to a black/white PNG, then feed it to the decoder.
Overall it would also be good to add a UI to your bot-style code so it can run in the browser like the others, using Gradio.
Linking brush strength with init_image strength could be a cool feature.
So how do we apply a mask? Is there a usable implementation?
If it's a two-value straight alpha, maybe a dithering filter could still make gradients doable. If it's a premultiplied 256-value alpha, then a black-and-white bitmap from the .png's alpha channel should make gradients possible. If the model accepts per-pixel weights, then an RGB image (or a Cryptomatte EXR) could work for ID masks with one color per weighted prompt. But I don't know exactly what Stable Diffusion allows or what could be integrated.
It would have to be a dither gradient. I don't see any reason that the pixels in the mask need to be contiguous. I also don't know how to do a dither gradient to test this out :-(
As I can see, there is inpainting already: https://github.com/pesser/stable-diffusion#inpainting
> as i can see there are inpainting already: https://github.com/pesser/stable-diffusion#inpainting
this is for latent diffusion, not stable diffusion :)
I'll add my vote: inpainting was my favorite feature of DALL·E 2 (before I ran out of credits).
I could take a photo of someone, mask out their clothes, and generate infinite outfit styles. Or I could take one of my real-world landscape photos, mask out an area, and introduce fictional elements. It also worked on Unreal Engine screenshots: I had an area populated with foliage, masked out a barren empty area, and got cool results.
I'll also raise my hand for inpainting; I think the idea of 'weighted img2img' is really interesting especially if it could be combined with RePaint's (https://github.com/andreas128/RePaint) method of continuously inserting the unmasked parts of the original image with each step and adding noise to ensure consistency. If it was possible to add non-binary masks on top of that, I think this feature would be extremely powerful.
> It would have to be a dither gradient. I don't see any reason that the pixels in the mask need to be contiguous. I also don't know how to do a dither gradient to test this out :-(
Hi, there is an image-processing library, PIL (or Pillow), that allows extracting the alpha channel, doing processing, etc. Example of use: https://stackoverflow.com/questions/18777873/convert-rgb-to-black-or-white/18778280#18778280 The input image could be .png only, and the script could interpret it like this:

- if RGBA, use the alpha channel with a dither filter;
- if RGB, use only the luminance, clipped at half (128);
- if B&W premultiplied, use a dither filter;
- if B&W straight, use it as-is.
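Pillow can do most of this out of the box: converting to mode `"1"` applies Floyd-Steinberg dithering by default. A sketch of the interpretation rules above (the helper is hypothetical, and it dithers the luminance branch rather than hard-thresholding at 128):

```python
from PIL import Image

def png_to_dithered_mask(img):
    """Interpret an input image as a 1-bit mask, per the rules above."""
    if img.mode == "RGBA":
        # RGBA: use the alpha channel as the mask source
        src = img.getchannel("A")
    else:
        # RGB / grayscale: fall back to the luminance
        src = img.convert("L")
    # Conversion to mode "1" applies Floyd-Steinberg dithering by default,
    # so smooth alpha gradients become dot-pattern gradients
    return src.convert("1")
```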
Would also love inpainting support, specifically via the API. I feel like inpainting has so many uses that we haven't even thought about yet
Gradio has implemented and open sourced their initial version of inpainting:
https://twitter.com/Gradio/status/1562827932871303170
I would love to have it added to this repo too
Giving it a shot... I have the mask and the changes in, but I'm unclear on what x0 needs to be, and OP mentioned something about downsampling the mask.
What goes here?? @morganavr
```python
samples = sampler.decode(z_enc, c, t_enc, unconditional_guidance_scale=cfg_scale,
                         unconditional_conditioning=uc, x0=init_latent, mask=mask[::4, ::4])  # Uh.. what goes here?
```
> unclear of what x0 needs to be
In one place in the source code (notebook_helpers.py) I can see that x0=z0:
I have asked user @Doggettx on Reddit, who proposed the x0 usage. Let's hope for an answer.
```python
sample, intermediates = convsample_ddim(model, c, steps=custom_steps, shape=z.shape,
                                        eta=eta,
                                        quantize_x0=quantize_x0, img_callback=img_cb, mask=None, x0=z0,
                                        temperature=temperature, noise_dropout=noise_dropout,
                                        score_corrector=corrector, corrector_kwargs=corrector_kwargs,
                                        x_T=x_T, log_every_t=log_every_t)
```
For x0 you can just use the z_enc that's originally passed to the function. That already has noise added to it, though, so it's probably better to create it like z_enc but with the parameter noise=0 added to the function call.
To create the mask you can look at the original inpaint.py, there's already code there that does the same.
This is what I use now (my masks are inverted, though):
```python
from PIL import Image
import numpy as np
import torch

def load_mask(file, maxw, maxh, downsample=None, mask_strength=1.):
    image = Image.open(file).convert("L")
    w, h = image.size
    print(f"loaded input mask of size ({w}, {h})")
    # scale down to fit within (maxw, maxh), preserving aspect ratio
    if w > maxw:
        fac = maxw / w
        h = int(fac * h)
        w = maxw
    if h > maxh:
        fac = maxh / h
        w = int(fac * w)
        h = maxh
    w, h = map(lambda x: x - x % 64, (w, h))  # round down to multiple of 64
    if downsample is not None:
        w //= downsample
        h //= downsample
    image = image.resize((w, h), resample=Image.Resampling.LANCZOS)
    print(f"resized to ({w}, {h})")
    # scale the mask toward 1.0 (untouched) by mask_strength
    image = 1 - (1 - (np.array(image).astype(np.float32) / 255.0)) * mask_strength
    image = image[None, None]  # add batch and channel dims
    image = torch.from_numpy(image)
    return image
```
Yuk, sorry for my thick skull; still working on this. I have the mask (in my case, the alpha channel from a PNG), that's the easy part. I pass the mask and x0, but I just get the x0 image back.
```python
# Requires: from PIL import Image, ImageOps; import numpy as np; import torch
# has_transparency() is a helper defined elsewhere in the file.

def _load_img(self, path):
    image = Image.open(path)
    mask = None
    masked_image = None
    # Determining if the image has an alpha channel tells us if we want to extract a mask
    if has_transparency(image):
        # Obtain the mask from the transparency channel before we get rid of it
        mask = Image.new(mode="L", size=image.size, color=255)
        mask.putdata(image.getdata(band=3))
        # Invert so white is transparent
        mask = ImageOps.invert(mask)
    image = image.convert('RGB')
    w, h = image.size
    print(f'loaded input image of size ({w}, {h}) from {path}')
    w, h = map(lambda x: x - x % 32, (w, h))  # resize to integer multiple of 32
    image = image.resize((w, h), resample=Image.Resampling.LANCZOS)
    masked_image = image.copy()
    image = np.array(image).astype(np.float32) / 255.0
    image = image[None].transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    if mask is not None:
        # we need to resize the mask too
        mask = mask.resize((w, h), resample=Image.Resampling.LANCZOS)
        mask.save("outputs/test_mask.png")
        mask = np.array(mask).astype(np.float32) / 255.0
        mask[mask < 0.5] = 0
        mask[mask >= 0.5] = 1
        mask = torch.from_numpy(mask)
        masked_image = (1 - mask) * image
    return {
        "img": 2.0 * image - 1.0,
        "mask": mask,
        "masked_img": masked_image,
    }
```
My attempts to turn the guy with a pineapple hat into a lovely badger always go the wrong way, which means my math is inverted.. but..
> Yuk sorry for my thick skull - still working on this
> I have the mask - in my case getting the alpha channel from a png - that's the easy part, I pass the mask and x0, but I just get the x0 image back
You seem to be using only the top-left small corner of the mask? You need to downsize it to 1/8th the size; I think you're taking the top-left 1/8th of the array instead.
@Oceanswave:

> My attempts to make guy with a pineapple hat into a lovely badger
I think this exact image is bad for testing purposes because its background is a solid color, and it's difficult to quickly analyze which parts of the image were affected by inpainting.
@Oceanswave: maybe it's the "my masks are inverted though" (from @Doggettx)? Seems like that's exactly what's happening in your example: the affected and unaffected areas are swapped.
It seems that this fork has inpainting, I have not tested it yet.
https://github.com/hlky/stable-diffusion
Mask painting (NEW) 🖌️: Powerful tool for re-generating only specific parts of an image you want to change
> It seems that this fork has inpainting, I have not tested it yet.
> https://github.com/hlky/stable-diffusion
> Mask painting (NEW) 🖌️: Powerful tool for re-generating only specific parts of an image you want to change
It's masking. There is no specialized model involved in the "inpainting" process there.
See this discussion
https://www.reddit.com/r/StableDiffusion/comments/ww7qdl/applying_masks_to_the_img2img_generation_to/?utm_medium=android_app&utm_source=share
The author and another user (Doggettx) in this Reddit thread came up with a two-line code modification that adds a feature very similar to inpainting.
Author's code:![image](https://user-images.githubusercontent.com/51041813/186469656-98cace88-aa77-4625-b90d-2747db70e4b9.png)
Doggettx code:![image](https://user-images.githubusercontent.com/51041813/186469506-91fd44fb-9a39-4cda-9eb5-a720e45e074e.png)
I think a GUI is also required for this feature, since the user will need to draw masks. This GUI could be the starting point of a new GUI for the whole repo (if the author agrees), and the console terminal could be implemented in a separate panel of the GUI.
In these two images the dogs are different but the woman's face is the same. In the top image the red mask is less transparent. There are differences in the GUI parameter values: left panel (overlay alpha), right panel (prompt).
Personally I think the results are fantastic: with this feature you can select the parts of the image you want to change, and only they will be changed. It feels like the inpainting feature from DALL·E 2 to me.
There is another repo, https://github.com/andreas128/RePaint, with an inpainting technique, but it looks more complicated than the two-line approach (plus the time to create a GUI) proposed here.
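The gist of that two-line approach, as I understand it from the thread: at every denoising step, paste the (noised) original latent back into the region the mask protects, so only the masked area evolves. A toy numpy sketch of one such step (1.0 in the mask = repaint; names are illustrative):

```python
import numpy as np

def blend_step(x, init_noised, mask):
    """One masked img2img step: keep the model's output where mask == 1,
    restore the noised original latent where mask == 0."""
    return mask * x + (1.0 - mask) * init_noised
```

With a soft (non-binary) mask this same blend gives partial influence per pixel, which is what the partial-transparency idea above would buy.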
Preview images of inpainting GUI: