lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

Poor text-conditioning performance on larger models #101

Closed: nousr closed this 2 years ago

nousr commented 2 years ago

Hey @lucidrains I'm observing poor performance when text-conditioning on larger model architectures.

https://wandb.ai/nousr_laion/diffusion-prior/reports/Text-Conditioning-Results--VmlldzoyMDI1OTE2?accessToken=3m0abrrq46cfveu9pp63zkvm62sbfhvdbsek6db09fh70g0deuc3a7qi4gxtdj2u

This report shows some early results using small and medium-sized prior architectures.

In short, the smaller models (both text-conditioned and embedding-only) do actually improve in cosine similarity, but the larger text-conditioned models plateau immediately. In other words, it seems like the large text-conditioned models have *some* issue.
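(For context: the cosine similarity tracked in the report is presumably measured between the prior's predicted image embeddings and the ground-truth CLIP image embeddings. A minimal sketch of that metric, with illustrative tensor names:)

```python
import torch.nn.functional as F

def prior_eval_cosine_similarity(pred_image_embed, target_image_embed):
    # pred_image_embed:   (batch, dim) embeddings produced by the diffusion prior
    # target_image_embed: (batch, dim) ground-truth CLIP image embeddings
    # Higher mean similarity means the prior's predictions track the real
    # image embeddings more closely.
    return F.cosine_similarity(pred_image_embed, target_image_embed, dim=-1).mean()
```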

Any idea what could be causing this? @rom1504 theorized that in the larger architectures:

> the model tries to ignore the text; it succeeds when the model is small, so the result is good. It fails when the model is large, so the result is bad.

Any ideas?
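One way to probe this theory, independent of the loss curves, is to compare the prior's predictions when conditioned on the matching captions versus captions shuffled within the batch; if the two scores are nearly identical, the model is effectively ignoring the text. A minimal sketch (the `sample_fn` wrapper and tensor names are hypothetical, not part of the repo):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def text_sensitivity_probe(sample_fn, text_cond, image_embed):
    # sample_fn:   hypothetical callable mapping a batch of text conditioning to
    #              predicted CLIP image embeddings (e.g. a thin wrapper around
    #              the prior's sampling method).
    # text_cond:   (batch, ...) text conditioning inputs.
    # image_embed: (batch, dim) ground-truth CLIP image embeddings.
    def score(pred):
        return F.cosine_similarity(pred, image_embed, dim=-1).mean().item()

    matched = score(sample_fn(text_cond))

    # Pair each image with the wrong caption by shuffling text within the batch.
    perm = torch.randperm(text_cond.shape[0])
    mismatched = score(sample_fn(text_cond[perm]))

    # matched much greater than mismatched: the prior is using the text.
    # matched roughly equal to mismatched: the prior is ignoring the text.
    return matched, mismatched
```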

lucidrains commented 2 years ago

@nousr no idea, you are on your own :) sorry, I won't have much time to devote to DALLE2 for the remainder of this week (job interviews, etc.). I would touch base with Katherine and see if she has any experiments with and without text encodings at comparable model sizes. Another thing to check is whether the text encodings truly came from the last hidden layer of the text transformer, or were simply taken right after the positional embeddings were added to the token embeddings.
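For that last check, here is a rough sketch of the difference, using Hugging Face's CLIP text model as a stand-in (the thread does not say which text encoder the runs actually used, so treat this purely as an illustration):

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

tokens = tokenizer(["a corgi wearing a party hat"], padding=True, return_tensors="pt")

with torch.no_grad():
    # What the prior should be conditioned on: per-token encodings taken from
    # the final layer of the text transformer.
    last_hidden = text_model(**tokens).last_hidden_state               # (1, seq, dim)

    # What a buggy pipeline might pass instead: token embeddings with the
    # positional embeddings added, before any transformer layer has run.
    input_embeds = text_model.text_model.embeddings(tokens.input_ids)  # (1, seq, dim)

# The two tensors should differ substantially; near-identical values would mean
# the conditioning sequence never actually went through the transformer.
print((last_hidden - input_embeds).abs().mean())
```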

nousr commented 2 years ago

@lucidrains good luck on the interviews! I'll dig around and see what I can find 🕵️

nousr commented 2 years ago

super embarrassing... I think I just forgot to git pull on the gpu-pod before starting those larger training runs, and then when I did the smaller tests I had already pulled while preparing for something else. Tried it again and it works 🤦‍♂️

f a c e p a l m
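(As a guard against this sort of code-version mismatch, one option is to record the checked-out commit in the wandb run config before launching training. This is just a sketch, not something the repo or trainer does for you:)

```python
import subprocess
import wandb

def current_git_commit():
    # Ask git which commit the working tree is actually on, so a stale
    # checkout on the GPU pod is visible right in the run's config.
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

run = wandb.init(project="diffusion-prior")
run.config.update({"git_commit": current_git_commit()})
```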