a43992899 opened 2 years ago
The glide-text2im model that outperforms the clip_guided model is the one trained on a large private dataset, and it has not been released. The released model, by contrast, was trained on a filtered and less diverse dataset.
Exactly. A full explanation of the differences is here: https://github.com/openai/glide-text2im/issues/21#issuecomment-1045590329
I have tested the glide model for a few days (with many kinds of prompts), and my finding is that clip_guided works better than classifier-free text2im.
clip_guided correctly follows the intent of my prompt, like "a boat on the top of the mountain" or "Pablo Picasso: Into the wind", while text2im fails to do so.
However, the paper claims that classifier-free text2im > clip_guided. I wonder why? Is there anything wrong with the released model?
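For context, classifier-free guidance works by extrapolating from the model's unconditional noise prediction toward its text-conditional one, so the strength of the text signal depends heavily on the guidance scale. A minimal numpy sketch of the formula (the function name and toy values are illustrative, not from the repo):

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    # eps = eps_uncond + s * (eps_cond - eps_uncond)
    # s = 1 recovers the plain conditional prediction;
    # s > 1 amplifies the text-conditional direction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy noise predictions (made up for illustration)
eps_c = np.array([1.0, 2.0])
eps_u = np.array([0.0, 1.0])
print(classifier_free_guidance(eps_c, eps_u, 3.0))  # [3. 4.]
```

So if the released model was tried at a low guidance scale, text2im can look weaker on prompt-following than clip_guided even though the paper's comparison (on the full model, at tuned scales) went the other way.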