lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

Image Editing #89

Closed egeozsoy closed 2 years ago

egeozsoy commented 2 years ago

Do you have any suggestions for generating images by relying not only on text but also on a masked image, as OpenAI describes in their blog https://openai.com/dall-e-2/?

xiankgx commented 2 years ago

You need to train it with an inpainting task. In particular, the Decoder Unet needs to be able to take in a mask input, concatenated with the masked image on the channel dimension, to predict the original image. Right now I think this feature is not implemented in this repo.
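For concreteness, a minimal sketch of what that channel-wise conditioning could look like; the class name, channel counts, and mask convention (1 = known pixel, 0 = region to generate) are illustrative assumptions, not code from this repo:

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not from this repo): the mask and the masked image
# are concatenated to the U-Net input along the channel dimension.
class InpaintUnetStem(nn.Module):
    def __init__(self, base_channels=64):
        super().__init__()
        # 3 (noised image) + 3 (masked image) + 1 (mask) = 7 input channels
        self.init_conv = nn.Conv2d(3 + 3 + 1, base_channels, 3, padding=1)

    def forward(self, x_noised, masked_image, mask):
        # mask: (B, 1, H, W), 1 where pixels are known, 0 where to generate
        unet_input = torch.cat([x_noised, masked_image, mask], dim=1)
        return self.init_conv(unet_input)

x_noised = torch.randn(2, 3, 64, 64)   # x_t, the noised image
image    = torch.randn(2, 3, 64, 64)   # original image
mask = torch.ones(2, 1, 64, 64)
mask[:, :, 16:48, 16:48] = 0.0         # 0 marks the region to inpaint
masked_image = image * mask            # known pixels kept, hole zeroed out
out = InpaintUnetStem()(x_noised, masked_image, mask)
```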

egeozsoy commented 2 years ago

I think this would be a nice addition at some point. If I do anything in this regard, I will let you know :)

xiankgx commented 2 years ago

It would be good to train an all-in-one model that inpaints as needed but can also do full image generation when simply given an all-zero mask.

egeozsoy commented 2 years ago

Agreed, it would be a complementary task, so doing both tasks at the same time should likely not hurt overall performance.

xiankgx commented 2 years ago

Correction to the above: the masked image and mask would also need to be concatenated to x (a.k.a. the noised image).

egeozsoy commented 2 years ago

So if one model is trained for both tasks at the same time, we would feed noised_image + empty masked image + empty mask during normal training, and noised_image + masked original image + mask during inpainting training.
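A sketch of that dual-mode input construction, reusing the mask convention from the earlier snippet (1 = known pixel, 0 = generate); the helper name is hypothetical:

```python
import torch

def make_unet_input(x_noised, image, mask, inpainting: bool):
    # Hypothetical helper: during normal generation training, the
    # conditioning is an all-zero masked image and an all-zero mask;
    # during inpainting training, it is the masked original image
    # plus its mask.
    if inpainting:
        masked_image = image * mask            # keep only the known pixels
    else:
        mask = torch.zeros_like(mask)          # "nothing is known"
        masked_image = torch.zeros_like(image)
    return torch.cat([x_noised, masked_image, mask], dim=1)
```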

Do we have to add noise to the entire image, or is it enough to add noise only to the masked part? Not exactly the most scientific resource, but check the video at https://openai.com/dall-e-2/, timestamp 2:37 ("monkey paying taxes"). It seems like they input an image where only the masked part is noised.

xiankgx commented 2 years ago

During training, only x is noised and denoised; the masked image and mask are fed in directly, unnoised, if I understand the various pieces of OpenAI GLIDE and DDPM code correctly.

egeozsoy commented 2 years ago

Taken from the GLIDE paper: "Most previous work that uses diffusion models for inpainting has not trained diffusion models explicitly for this task (Sohl-Dickstein et al., 2015; Song et al., 2020b; Meng et al., 2021). In particular, diffusion model inpainting can be performed by sampling from the diffusion model as usual, but replacing the known region of the image with a sample from q(xt|x0) after each sampling step." So maybe a short-term solution could be to adapt the sampling logic to allow inpainting in this fashion?
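A rough sketch of that short-term fix, assuming generic DDPM-style helpers (`p_sample` for one reverse step, `q_sample` for drawing x_t from q(x_t|x_0)); both names are placeholders, not this repo's API:

```python
import torch

@torch.no_grad()
def inpaint_by_replacement(model, x0, mask, num_timesteps, p_sample, q_sample):
    # Sample as usual, but after every denoising step overwrite the known
    # region (mask == 1) with a sample from q(x_t | x_0), per the GLIDE paper.
    x = torch.randn_like(x0)
    for t in reversed(range(num_timesteps)):
        x = p_sample(model, x, t)              # one reverse diffusion step
        if t > 0:
            known = q_sample(x0, t - 1)        # noise x0 to the current level
            x = known * mask + x * (1 - mask)  # keep known, regenerate hole
    return x
```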

lucidrains commented 2 years ago

yea, depending on some circumstances next week, i could build this, let's leave this open

lucidrains commented 2 years ago

> It would be good to train an all-in-one model that inpaints as needed but can also do full image generation when simply given an all-zero mask.

yup, this is the most ideal case :)

Mut1nyJD commented 2 years ago

> It would be good to train an all-in-one model that inpaints as needed but can also do full image generation when simply given an all-zero mask.

> yup, this is the most ideal case :)

Alternatively, can you not just finetune the generation model for inpainting? You would only have to change the input layer; the rest of the weights in the network you should be able to carry over.
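One way that finetuning route could look in PyTorch: widen the pretrained model's first conv to accept the extra conditioning channels, copying the old weights and zero-initializing the new input channels so finetuning starts from the original model's behavior. A sketch under those assumptions, not this repo's code:

```python
import torch
import torch.nn as nn

def widen_input_conv(old_conv: nn.Conv2d, extra_in_channels: int) -> nn.Conv2d:
    # Replace the first conv of a pretrained generation U-Net with one that
    # also accepts conditioning channels (masked image + mask). Pretrained
    # weights are copied; new input channels start at zero so the widened
    # model initially behaves exactly like the original.
    new_conv = nn.Conv2d(
        old_conv.in_channels + extra_in_channels,
        old_conv.out_channels,
        kernel_size=old_conv.kernel_size,
        stride=old_conv.stride,
        padding=old_conv.padding,
        bias=old_conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()
        new_conv.weight[:, : old_conv.in_channels] = old_conv.weight
        if old_conv.bias is not None:
            new_conv.bias.copy_(old_conv.bias)
    return new_conv

# e.g. widen a 3-channel stem to also take a masked image (3) + mask (1)
stem = nn.Conv2d(3, 64, 3, padding=1)
stem = widen_input_conv(stem, extra_in_channels=4)
```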

lucidrains commented 2 years ago

i think i'm going to aim for integrating this technique: https://github.com/andreas128/RePaint. it is a pretty recent paper, but the results look good. can use this resampling technique for both dalle2 and imagen
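For context, RePaint keeps the per-step replacement from the GLIDE-style sketch above but repeatedly "resamples": after each replacement it diffuses the image one step forward again and denoises once more, which helps harmonize the generated region with the known pixels. A condensed sketch with the same hypothetical helpers, plus a one-step forward-noising function:

```python
import torch

@torch.no_grad()
def repaint_sample(model, x0, mask, num_timesteps, n_resample,
                   p_sample, q_sample, one_step_noise):
    # Condensed RePaint sketch (Lugmayr et al., 2022). Helper names are
    # placeholders: p_sample is one reverse step, q_sample draws from
    # q(x_t | x_0), one_step_noise draws from q(x_t | x_{t-1}).
    x = torch.randn_like(x0)
    for t in reversed(range(num_timesteps)):
        for r in range(n_resample):
            x = p_sample(model, x, t)               # denoise t -> t-1
            if t > 0:
                known = q_sample(x0, t - 1)
                x = known * mask + x * (1 - mask)   # re-impose known region
            if r < n_resample - 1 and t > 0:
                x = one_step_noise(x, t - 1)        # diffuse back t-1 -> t
    return x
```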

lucidrains commented 2 years ago

ok it is done https://github.com/lucidrains/dalle2-pytorch#inpainting