openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Enquiry about the FER2013 dataset used in CLIP paper #439

Open · hyy-2000 opened this issue 4 months ago

hyy-2000 commented 4 months ago

Hi authors, thank you for your great work! While doing some exploratory work around the CLIP model, I noticed that Table 9 of the paper lists the Facial Emotion Recognition 2013 dataset as having 8 classes (see the attached screenshot of Table 9), which contradicts the description of the dataset on Kaggle:

The task is to categorize each face based on the emotion shown in the facial expression into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).

So I am curious how the Facial Emotion Recognition 2013 dataset was used in the paper. Was a new class added to the original dataset? If so, could you please share the implementation details and the reasoning behind it?
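For reference, here is a minimal sketch of how I am running zero-shot classification with the seven Kaggle class names using this repo's clip package. The prompt template, class-name spellings, and input file name here are just my own placeholders, not what the paper used; my question is exactly what class list and prompts produced the 8-class setup in Table 9.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# The seven classes from the Kaggle FER2013 description; the paper's
# Table 9 reports 8 classes, so its actual list presumably differs.
classes = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

# Illustrative prompt template (my assumption, not necessarily the paper's).
text = clip.tokenize([f"a photo of a {c} looking face" for c in classes]).to(device)

# "face.png" is a hypothetical 48x48 FER2013-style face image.
image = preprocess(Image.open("face.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize and score each class prompt against the image.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(classes[probs.argmax().item()])
```

Since zero-shot accuracy depends directly on the class-name list used to build the text prompts, knowing the exact 8-class list would help me reproduce the reported FER2013 numbers.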

Many thanks, Yuanyang