google / prompt-to-prompt

Apache License 2.0

code for user-defined mask #74

Open fabrizioguillaro opened 11 months ago

fabrizioguillaro commented 11 months ago

Hello! I am trying to use your code for "Null-text Inversion for Editing Real Images using Guided Diffusion Models". Since I have an inpainting mask, I want to generate an image using a user-defined mask (as shown in Fig. 8 and Fig. 14 of "Prompt-To-Prompt Image Editing With Cross-Attention Control"). The code for using a user-defined mask is missing, so I tried to implement it myself. Did you simply apply the given mask in LocalBlend in place of the one computed from the prompt? Does the following code represent what you did (resizing the mask to 64x64, repeating over the 2 channels, applying the mask to the latent space)?

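import numpy as np
import torch
from PIL import Image

class LocalBlend:
    ...
    def __init__(...):
        ...
        # Downsample the user-defined mask to the 64x64 latent resolution.
        mask = np.array(Image.fromarray(mask).resize((64, 64), Image.NEAREST))
        mask = mask[None, None, :, :]  # shape (1, 1, 64, 64)
        mask = mask.repeat(2, axis=0)  # repeat twice along axis 0
        self.mask = torch.from_numpy(mask).cuda()

    def __call__(...):
        ...
        # Keep x_t[:1] where the mask is 0; keep each latent's own values where it is 1.
        mask = self.mask.float()
        x_t = x_t[:1] + mask * (x_t - x_t[:1])

fabrizioguillaro commented 11 months ago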

The code I wrote works (example in the image), I am just wondering if it follows the way you intended to do it.

As you can see, using the given mask, the code above allows me to edit just the pie on the left, instead of all the pies: [image]
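
The blending step discussed above can be checked in isolation on toy data. The following is a minimal NumPy sketch, not the repository's implementation: the helper names `resize_mask_nearest` and `blend_with_user_mask` are hypothetical, the latents are random arrays rather than diffusion latents, and the mask is assumed to be binarised to {0, 1} before blending. It only illustrates that `x_t[:1] + mask * (x_t - x_t[:1])` leaves the first latent untouched and copies it into the other latents wherever the mask is 0.

```python
import numpy as np


def resize_mask_nearest(mask, size=64):
    """Nearest-neighbour resize of a 2-D mask to (size, size), as float32 in {0, 1}."""
    mask = np.asarray(mask)
    h, w = mask.shape
    rows = np.arange(size) * h // size  # source row sampled for each output row
    cols = np.arange(size) * w // size  # source column sampled for each output column
    small = mask[rows[:, None], cols[None, :]]
    return (small > 0).astype(np.float32)


def blend_with_user_mask(x_t, mask):
    """Blend latents with a fixed mask.

    x_t:  array of shape (batch, channels, H, W); x_t[0] is treated as the source.
    mask: array of shape (H, W) with values in {0, 1}.
    """
    mask = mask[None, None, :, :]  # broadcasts over batch and channels
    return x_t[:1] + mask * (x_t - x_t[:1])


# Toy data: 2 latents (source, edited), 4 channels, 64x64 (assumed shapes).
rng = np.random.default_rng(0)
x_t = rng.standard_normal((2, 4, 64, 64)).astype(np.float32)

# Hypothetical 512x512 user mask: the left half of the image is editable.
user_mask = np.zeros((512, 512), dtype=np.uint8)
user_mask[:, :256] = 255

mask64 = resize_mask_nearest(user_mask, size=64)
blended = blend_with_user_mask(x_t, mask64)
```

Because the mask has shape (1, 1, H, W) after indexing, broadcasting already covers the batch axis, so an explicit `repeat` is not needed in this sketch. Binarising the mask matters here: a mask stored as 0/255 would scale the difference term by 255 rather than select it.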

Yutong-Dai commented 10 months ago

> The code I wrote works (example in the image), I am just wondering if it follows the way you intended to do it.
>
> As you can see, using the given mask, the code above allows me to edit just the pie on the left, instead of all the pies: [image]

Thanks for bringing this up. I also have a similar question about replacing the estimated mask with user-provided masks. Could you share the code to reproduce the results shown in the above example? I noticed that the rolling pin on the right was distorted, even with the presence of the mask.

AhmedBourouis commented 6 months ago

@fabrizioguillaro What if the mask didn't match the position of the pie, for example if the mask were on the right? Would it still give reasonable results?