lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
MIT License

Classifier-Free Guidance for Image Embed Conditioning #144

Closed xvjiarui closed 2 years ago

xvjiarui commented 2 years ago

Hi @lucidrains

In your implementation, the image embedding is added to the time embedding, as described in Sec. 2.1 of the paper:

projecting and adding CLIP embeddings to the existing timestep embedding

However, I am wondering whether the current implementation still supports classifier-free guidance for this conditioning. Currently, image_hiddens is never randomly dropped during training, so the model is always conditioned on the image embedding, I think?

Please feel free to correct me if I am wrong.
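To make the concern concrete, here is a minimal sketch of what randomly dropping the image-embedding conditioning could look like. It is not the repository's actual code: `prob_mask_like`, `drop_image_conditioning`, `guided_prediction`, `null_image_hiddens`, `cond_drop_prob`, and `cond_scale` are hypothetical names chosen for illustration, and only `image_hiddens` comes from the discussion above. The assumption is that a "null" embedding stands in for the dropped conditioning, so the network also learns an unconditional prediction.

```python
import torch


def prob_mask_like(shape, prob, device=None):
    """Hypothetical helper: boolean mask whose entries are True with probability `prob`."""
    if prob == 1:
        return torch.ones(shape, dtype=torch.bool, device=device)
    if prob == 0:
        return torch.zeros(shape, dtype=torch.bool, device=device)
    return torch.rand(shape, device=device) < prob


def drop_image_conditioning(image_hiddens, null_image_hiddens, cond_drop_prob):
    """Per sample, swap image_hiddens for a null embedding with probability cond_drop_prob.

    Assumes image_hiddens has shape (batch, ...) and null_image_hiddens
    broadcasts against a single sample.
    """
    batch = image_hiddens.shape[0]
    keep_mask = prob_mask_like((batch,), 1 - cond_drop_prob, device=image_hiddens.device)
    # Reshape to (batch, 1, ..., 1) so the mask broadcasts over the feature dims.
    keep_mask = keep_mask.view(batch, *([1] * (image_hiddens.dim() - 1)))
    null = null_image_hiddens.to(image_hiddens.dtype)
    return torch.where(keep_mask, image_hiddens, null)


def guided_prediction(cond_pred, null_pred, cond_scale):
    """Classifier-free guidance at sampling time: extrapolate from the
    unconditional prediction toward the conditional one."""
    return null_pred + (cond_pred - null_pred) * cond_scale
```

Without the dropout step during training, the unconditional branch used by `guided_prediction` is never learned, which is why always conditioning on the image embedding would undermine the guidance.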

lucidrains commented 2 years ago

@xvjiarui oh gosh, thank you for catching this bug! :man_facepalming: :pray: i've fixed it at https://github.com/lucidrains/DALLE2-pytorch/commit/5d958713c0753922b64086b2310324c1f3be64e5 hope this is the last one..

xvjiarui commented 2 years ago

Wow, that's fast! Thank you!