leitro / LabelAdaptiveMixup-SER


License: MIT | Python 3.8 | PyTorch 1.12

Learning Robust Self-Attention Features for Speech Emotion Recognition with Label-Adaptive Mixup

Lei Kang, Lichao Zhang, Dazhi Jiang.

Accepted to ICASSP 2023.

Hardware and Software

The code is implemented in Python 3.8 with PyTorch 1.12.

Dataset

IEMOCAP

To make our results comparable to state-of-the-art works [2, 3, 18], we merge the "excited" category into "happy" and use speech data from the four categories "angry", "happy", "sad", and "neutral", which leads to 5531 acoustic utterances in total from 5 sessions and 10 speakers. The widely used Leave-One-Session-Out (LOSO) 5-fold cross-validation is used to report our final results: in each fold, the 8 speakers from 4 sessions are used for training, while the remaining 2 speakers from the held-out session are used for testing.
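
As a minimal illustration of this split (not code from this repository), the sketch below builds the five LOSO folds, assuming each utterance is tagged with its IEMOCAP session id; all names are placeholders.

```python
# Leave-One-Session-Out (LOSO) 5-fold split for IEMOCAP: in each fold,
# one session (2 speakers) is held out for testing and the remaining
# four sessions (8 speakers) are used for training. `utterances` is an
# assumed list of (features, label, session_id) tuples.
def loso_folds(utterances, sessions=(1, 2, 3, 4, 5)):
    for test_session in sessions:
        train = [u for u in utterances if u[2] != test_session]
        test = [u for u in utterances if u[2] == test_session]
        yield train, test
```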

Train the model
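
The repository's actual training entry point and command-line options are not reproduced here. As background for the mixup component named in the paper title, the sketch below shows one standard mixup training step in PyTorch (plain mixup, not the paper's label-adaptive variant); `model`, `optimizer`, `x`, `y`, `num_classes`, and `alpha` are assumed placeholders, not this repository's API.

```python
import torch
import torch.nn.functional as F

def mixup_step(model, optimizer, x, y, num_classes, alpha=0.2):
    """One training step with standard mixup on a batch (x, y).

    Inputs and one-hot labels are mixed with a Beta-sampled coefficient,
    and the model is trained with a soft-label cross-entropy loss.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]                  # mix inputs
    y_onehot = F.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]    # mix labels

    logits = model(x_mix)
    # Soft-label cross-entropy: -sum(y_mix * log_softmax(logits))
    loss = -(y_mix * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```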

Architecture of the proposed method

[Figure: overall architecture of the proposed method (arch)]

Comparison with the state of the art

[Figure: comparison with state-of-the-art results on IEMOCAP (res)]

Citation

If you use the code or benchmarks in your research, please cite our paper:

Lei Kang, Lichao Zhang, Dazhi Jiang. "Learning Robust Self-attention Features for Speech Emotion Recognition with Label-adaptive Mixup", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, Jun 2023.