a43992899 opened 2 years ago
The glide-text2im model that outperforms the clip_guided model is the one trained on a large private dataset, and it has not been released. The released model, by contrast, was trained on a filtered and less diverse dataset.
Exactly. A full explanation of the differences is here: https://github.com/openai/glide-text2im/issues/21#issuecomment-1045590329
I have tested the glide model for a few days (with many kinds of prompts), and my finding is that clip_guided works better than classifier-free text2im.
clip_guided correctly follows the intent of my prompt, like "a boat on the top of the mountain" or "Pablo Picasso: Into the wind", while text2im fails to do so.
However, the paper claims that classifier-free text2im > clip_guided. I wonder why? Is there anything wrong with the released model?
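For context, classifier-free guidance works by extrapolating from the model's unconditional noise prediction toward its text-conditional one, so the strength of the text signal depends heavily on the guidance scale. A minimal numpy sketch of the formula (the function name and toy values are illustrative, not from the repo):

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    # eps = eps_uncond + s * (eps_cond - eps_uncond)
    # s = 1 recovers the plain conditional prediction;
    # s > 1 amplifies the text-conditional direction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy noise predictions (made up for illustration)
eps_c = np.array([1.0, 2.0])
eps_u = np.array([0.0, 1.0])
print(classifier_free_guidance(eps_c, eps_u, 3.0))  # [3. 4.]
```

So if the released model was tried at a low guidance scale, text2im can look weaker on prompt-following than clip_guided even though the paper's comparison (on the full model, at tuned scales) went the other way.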