zinengtang / TVLT

PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)
MIT License

Inaccurate test results for emotion classification #5

Changezi001 closed this issue 1 year ago

Changezi001 commented 1 year ago

Hi Zineng Tang,

Thank you for sharing the code and the released models. I really like the idea and the paper. I am working on a real-time emotion recognition project and am excited to use TVLT in it because of its fast inference and lightweight architecture. However, when I tested it on short video clips showing various emotions, the model failed to recognize the emotions correctly. Specifically, it predicts "happy" almost 98% of the time, even for videos showing surprise, anger, or fear. Are you sure the correct "Pre-trained on Howto100m + Yttemporal videos, then finetuned on CMU-MOSEI emotional analysis" model is uploaded on this page?

zinengtang commented 1 year ago

The finetuned model overfit to the CMU-MOSEI emotion analysis data, whose label distribution may be imbalanced. I recommend finetuning on your own target dataset before using it further.
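One common way to counter an imbalanced label distribution during finetuning is to weight the loss by inverse class frequency. The snippet below is a minimal sketch of that heuristic, not the method used in the TVLT repository. The helper name `inverse_frequency_weights` and the label counts are hypothetical and chosen only for illustration; they are not CMU-MOSEI statistics.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight} with weight = N / (K * count).

    N is the number of samples and K the number of distinct labels,
    so rare labels get weights above 1 and frequent labels below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Made-up label list for illustration only (assumption, not real data).
example_labels = ["happy"] * 6 + ["sad"] * 3 + ["anger"] * 1
weights = inverse_frequency_weights(example_labels)

for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

In a single-label setup, weights like these could be passed in class-index order to the `weight` argument of `torch.nn.CrossEntropyLoss`. If the task is set up as multi-label, the analogous setting is the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.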

AIXiaoBaiDemon commented 1 year ago

I also ran into this issue: almost all videos are classified as happy. Did you solve this problem?

Changezi001 commented 1 year ago

No, I have not solved this issue yet. I am waiting for @zinengtang to share the emotion label file so that I can fine-tune the network on a balanced dataset that is not biased toward the happy emotion.
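Once labels are available, one simple way to build a balanced subset is to undersample every class down to the size of the rarest one. The sketch below assumes each sample is an `(item, label)` pair with a single label. The function name `undersample_balanced` and the clip names are hypothetical and do not reflect the repository's label file format.

```python
import random
from collections import defaultdict


def undersample_balanced(samples, seed=0):
    """Return a shuffled subset in which every label appears equally often.

    samples: list of (item, label) pairs.
    Each label is reduced to the count of the rarest label.
    """
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    if not by_label:
        return []

    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Made-up clip list for illustration only (assumption, not real data).
example = [(f"clip_{i}", "happy") for i in range(8)]
example += [("clip_8", "anger"), ("clip_9", "anger")]
example += [("clip_10", "fear"), ("clip_11", "fear")]

balanced = undersample_balanced(example)
```

Undersampling discards data from the majority class, so it trades dataset size for balance. Oversampling rare classes or weighting the loss are alternatives when the dataset is small.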

BDHU commented 1 year ago

@Changezi001 any updates on this? Were you using the labels under labels/BA by any chance?

Changezi001 commented 1 year ago

@BDHU I was just testing the pretrained emotion analysis model that @zinengtang shared with us. I am trying to retrain it on a balanced CMU-MOSEI dataset, but I am getting a NaN loss during training. I have raised that as a separate issue here.
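The cause of the NaN loss is not identified in this thread. A generic first check in such cases is whether any input features or labels already contain NaN or infinite values before they reach the model. The helper below is a minimal sketch of that check on plain Python floats; the name `first_non_finite` is hypothetical.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or infinite value, or None."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index
    return None


# Made-up feature values for illustration only.
clean = [0.1, -2.5, 3.0]
broken = [0.1, float("nan"), 3.0]

print(first_non_finite(clean))
print(first_non_finite(broken))
```

On tensors, the equivalent check is `torch.isfinite(tensor).all()`. Enabling `torch.autograd.set_detect_anomaly(True)` during a short debugging run can also help locate the operation that first produces a NaN in the backward pass.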