openai / glide-text2im

GLIDE: a diffusion-based text-conditional image synthesis model
MIT License
3.53k stars, 500 forks

About CLIP training on noised images #44

Open yufeng9819 opened 1 year ago

yufeng9819 commented 1 year ago

Hey! I think GLIDE is wonderful work, but I have a question about CLIP training on noised images.

I want to know why CLIP can be trained on noised images. I think that if t (which ranges from 0 to 1000) is large (say, around 500 or more), the noised images hardly contain any semantic information. In that case, how does the CLIP model learn to encode similar features from noised images and text? I also think this could keep the model from converging, because it is hard to align features between heavily noised images and text.
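
To make the concern concrete, here is a rough sketch of how much of the original image survives at a given t. It assumes a DDPM-style linear beta schedule (1e-4 to 0.02 over 1000 steps), which may not be the schedule GLIDE actually uses. The function names are made up for illustration and are not from the glide-text2im code.

```python
import math

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an ASSUMED linear beta schedule."""
    alpha_bar = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def signal_and_noise_scale(alpha_bar, t):
    """Coefficients in x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bar[t]
    return math.sqrt(a), math.sqrt(1.0 - a)

alpha_bar = linear_alpha_bar()
for t in (0, 250, 500, 750, 999):
    signal, noise = signal_and_noise_scale(alpha_bar, t)
    print(f"t={t:4d}  signal={signal:.3f}  noise={noise:.3f}")
```

Under this assumed schedule the weight on the clean image shrinks steadily as t grows, which is why I would expect the image-text alignment to get harder at large t.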