openai / CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
MIT License
24.55k stars, 3.2k forks

CLIP Recognition Error #416

Open nhw649 opened 8 months ago

nhw649 commented 8 months ago

CLIP classifies this image as a hot dog with a probability close to 1, but the correct label is a person. Is there a solution?

[Image: 123]

Rijgersberg commented 8 months ago

This is a classic (accidental) typographic attack.

[Image: iPod Apple attack]

See the "Fallacies of abstraction" section of OpenAI's "Multimodal neurons in artificial neural networks".
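
As a rough illustration of why a typographic attack can push the reported probability so close to 1: zero-shot CLIP scores are a softmax over scaled cosine similarities, and the zero-shot example in the CLIP README multiplies the similarities by 100 before the softmax. The sketch below uses only the Python standard library. The two similarity values and the prompt strings are hypothetical numbers chosen for illustration, not measurements from the image in this issue.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical cosine similarities between one image embedding and two
# prompt embeddings. These values are illustrative, not measured.
labels = ["a photo of a hot dog", "a photo of a person"]
cosine_sims = [0.30, 0.22]

# Scale factor of 100, as used in the zero-shot example in the CLIP README.
LOGIT_SCALE = 100.0

probs = softmax([LOGIT_SCALE * s for s in cosine_sims])
for label, p in zip(labels, probs):
    print(f"{label}: {p:.4f}")
```

With these assumed values, a gap of only 0.08 in cosine similarity already yields a probability of about 0.9997 for the first prompt. Visible text in an image that nudges the similarity slightly toward one label can therefore dominate the output.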

nhw649 commented 8 months ago

How can this be solved without extra training?