lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
MIT License

Regarding classifier-free guidance in diffusion prior. #235

jaykim9870 closed this issue 2 years ago

jaykim9870 commented 2 years ago

Hi, I am looking into your code, and it seems you implemented classifier-free guidance in the diffusion prior as below.

https://github.com/lucidrains/DALLE2-pytorch/blob/916ece164c2d48b7f8cf1e6c832ff249adf979c8/dalle2_pytorch/dalle2_pytorch.py#L1055-L1096

The thing is, the mask doesn't affect the input tokens, so classifier-free guidance would not actually be applied to the diffusion prior. Please let me know if I am missing something.
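To illustrate the concern, here is a minimal sketch, not the repository's code. `guided_prediction` uses the usual classifier-free guidance combination, and `broken_net` is a hypothetical toy network that builds a keep mask but never applies it. Under that assumption the conditional and "unconditional" passes are identical, so the guidance scale has no effect.

```python
import torch

def guided_prediction(net, x, cond, cond_scale=1.0):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by a factor of cond_scale."""
    cond_out = net(x, cond, cond_drop_prob=0.0)
    if cond_scale == 1.0:
        return cond_out
    null_out = net(x, cond, cond_drop_prob=1.0)
    return null_out + (cond_out - null_out) * cond_scale

def broken_net(x, cond, cond_drop_prob):
    # Hypothetical toy network: the keep mask is computed but never used,
    # so the conditioning always reaches the output unchanged.
    keep_mask = torch.rand(x.shape[0]) >= cond_drop_prob  # ignored below
    return x + cond

x = torch.randn(4, 8)
cond = torch.randn(4, 8)

plain = guided_prediction(broken_net, x, cond, cond_scale=1.0)
guided = guided_prediction(broken_net, x, cond, cond_scale=3.0)

# Both passes see the same conditioning, so guidance changes nothing.
no_effect = torch.allclose(plain, guided)
```

If the mask were actually applied to the inputs, `plain` and `guided` would differ for any `cond_scale` other than 1.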

Also, as far as I know, the drop probability differs between the CLIP image embedding and the text embedding.

That is, when training the diffusion prior, the text embedding is dropped 10% of the time (Sec 2.2) and the image embedding is dropped 5% of the time (Sec 5.1).

But it seems that the drop probability is always fixed to a single cond_drop_prob for every conditioning input. Could you check whether I am understanding the code correctly?
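For concreteness, a rough sketch of what separate drop probabilities could look like. The helper `prob_keep_mask` and the variable names are hypothetical, and the 10% and 5% values are simply the figures quoted above, not values read from the repository.

```python
import torch

def prob_keep_mask(batch, drop_prob):
    """Per-sample boolean mask: True keeps the conditioning, False drops it.
    torch.rand samples from [0, 1), so drop_prob=0 keeps every sample and
    drop_prob=1 drops every sample."""
    return torch.rand(batch) >= drop_prob

torch.manual_seed(0)
batch = 10_000

# Independent masks, one per conditioning input (assumed rates).
text_keep = prob_keep_mask(batch, drop_prob=0.10)
image_keep = prob_keep_mask(batch, drop_prob=0.05)

text_drop_rate = 1.0 - text_keep.float().mean().item()
image_drop_rate = 1.0 - image_keep.float().mean().item()
```

Sampling the two masks independently means a given training example can lose its text conditioning, its image conditioning, both, or neither.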

lucidrains commented 2 years ago

@jaykim9870 Hi Jay! Thanks for raising this issue

Indeed there was an issue where the mask was not even used

I've refactored it to use the null tokens strategy in v1.9.0, and would definitely welcome a second review! https://github.com/lucidrains/DALLE2-pytorch/commit/59fa101c4d21843477a5202560a877328ad47afd
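A minimal sketch of the general null-token idea, for readers who have not seen it. This is an illustration under stated assumptions, not the code from the linked commit. The module name `NullableCondition` and its `null_embed` parameter are hypothetical. Dropped samples have their conditioning embedding replaced by a learned null embedding rather than being masked out in attention.

```python
import torch
from torch import nn

class NullableCondition(nn.Module):
    """Replaces a conditioning embedding with a learned null embedding
    for a random subset of the batch."""

    def __init__(self, dim):
        super().__init__()
        # Learned stand-in for "no conditioning" (hypothetical name).
        self.null_embed = nn.Parameter(torch.randn(1, dim))

    def forward(self, embed, drop_prob):
        batch = embed.shape[0]
        # torch.rand samples from [0, 1): drop_prob=0 keeps all rows,
        # drop_prob=1 replaces all rows with the null embedding.
        keep = torch.rand(batch, device=embed.device) >= drop_prob
        keep = keep.unsqueeze(-1)  # (batch, 1), broadcasts over dim
        return torch.where(keep, embed, self.null_embed.to(embed.dtype))

module = NullableCondition(dim=8)
embed = torch.randn(4, 8)

kept = module(embed, drop_prob=0.0)     # training-free conditional pass
dropped = module(embed, drop_prob=1.0)  # fully unconditional pass
```

Because the substitution happens on the inputs themselves, the unconditional pass used for guidance at sampling time really does see different inputs from the conditional pass.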

jaykim9870 commented 2 years ago

I just checked the new version and it seems great. Thanks for your quick modification, it helped me a lot!