openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

CLIP's capability of detecting scene or background information #389

Open · Seeker98 opened this issue 12 months ago

Seeker98 commented 12 months ago

How does CLIP perform at detecting global image information? For example, telling whether an image is noise-corrupted, downsampled, or hazy, and further, picking the right corruption parameters, such as the noise standard deviation? I tried images with different types of noise (Gaussian, Poisson, gamma) and other corruptions (downsampling, haze), with prompt sets like [gaussian noise with std=25, gaussian noise with std=50] and [noisy, hazy], but the inference results are poor. Am I missing any key step in my testing? A rough sketch of the kind of test I ran is below.
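
For reference, a minimal sketch of the test, following the zero-shot usage pattern from this repo's README (the image path, model name, and exact prompt strings here are placeholders):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and candidate corruption prompts (illustrative values)
image = preprocess(Image.open("noisy_example.png")).unsqueeze(0).to(device)
prompts = ["gaussian noise with std=25", "gaussian noise with std=50", "hazy", "clean"]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    # logits_per_image: similarity of the image to each candidate prompt
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

for prompt, prob in zip(prompts, probs[0]):
    print(f"{prompt}: {prob:.3f}")
```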

mgupta70 commented 7 months ago

A few points:

- CLIP is not a text-generation model. You supply candidate text prompts as input; CLIP embeds them along with the image, and what you get back is a cosine similarity score between the image and each prompt.
- CLIP is not good at fine-grained classification (as mentioned in the paper), and distinguishing noise with std=25 from std=50 is a very fine-grained distinction.
- CLIP is trained on internet data, which skews toward everyday objects. If an image of a car on the internet is noisy or hazy, there is a high chance the accompanying text still says 'car' and mentions nothing about the noise.

A concrete sketch of the similarity computation follows below.
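
To make the first point concrete, here is a minimal sketch of the cosine-similarity computation using this repo's encode_image / encode_text methods (the image path and prompts are illustrative):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("car.png")).unsqueeze(0).to(device)  # placeholder path
text = clip.tokenize(["a photo of a car", "a noisy image"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Cosine similarity = dot product of L2-normalized embeddings
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = (image_features @ text_features.T).squeeze(0)
print(similarity)  # likely much higher for 'a photo of a car' than for 'a noisy image'
```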