Lei Kang, Lichao Zhang, Dazhi Jiang.
Accepted to ICASSP 2023.
Intel i9-10900 CPU
64GB RAM
NVIDIA RTX 3090 GPU (24GB)
Ubuntu 22.04
Python 3.8
PyTorch 1.12
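To check that a local setup matches, a quick sanity check such as the following can be used (a minimal sketch; the expected values in the comments reflect the configuration listed above):

```python
import torch

print("PyTorch:", torch.__version__)            # expected: 1.12.x
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # expected: an RTX 3090
```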
To make our results comparable to state-of-the-art works [2, 3, 18], we merge the "excited" category into "happy" and use speech data from the four categories "angry", "happy", "sad", and "neutral", which yields 5,531 acoustic utterances in total across 5 sessions and 10 speakers. The widely used Leave-One-Session-Out (LOSO) 5-fold cross-validation protocol is used to report our final results: at each fold, the 8 speakers in 4 sessions are used for training, while the remaining 2 speakers in 1 session are used for testing.
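For clarity, the LOSO splitting can be sketched as follows. This is a minimal illustration, not the repo's actual code; it assumes each utterance carries a session ID in {1, ..., 5}:

```python
# Minimal sketch of Leave-One-Session-Out (LOSO) 5-fold splitting for IEMOCAP.
# Assumes `utterances` is a list of (wav_path, label, session_id) tuples;
# the names are illustrative assumptions, not the repository's actual API.

def loso_folds(utterances, num_sessions=5):
    """Yield (train, test) splits, holding out one session per fold."""
    for held_out in range(1, num_sessions + 1):
        train = [u for u in utterances if u[2] != held_out]
        test = [u for u in utterances if u[2] == held_out]
        yield train, test

# Each fold trains on 4 sessions (8 speakers) and tests on the held-out
# session (2 speakers); the final result is averaged over the 5 folds.
```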
The data loading, including the waveform-level mixing step, is implemented in dataset_wavMix.py.
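As a rough reference, waveform-level mixup in the spirit of the paper's title can be sketched as below. The function name, Beta-distributed coefficient, and zero-padding are illustrative assumptions, not the actual logic of dataset_wavMix.py:

```python
import numpy as np

def mixup_waveforms(wav_a, wav_b, label_a, label_b, alpha=0.4, rng=None):
    """Blend two 1-D waveforms and their one-hot label vectors (hypothetical sketch)."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))  # mixing coefficient in (0, 1)
    # Zero-pad the shorter waveform so both have equal length.
    n = max(len(wav_a), len(wav_b))
    a = np.pad(wav_a, (0, n - len(wav_a)))
    b = np.pad(wav_b, (0, n - len(wav_b)))
    mixed_wav = lam * a + (1.0 - lam) * b
    mixed_label = lam * label_a + (1.0 - lam) * label_b  # soft label vector
    return mixed_wav, mixed_label

# Example: mix a 1-second "angry" clip with a 1-second "happy" clip at 16 kHz,
# using one-hot labels over the 4 emotion categories.
wav, label = mixup_waveforms(np.random.randn(16000), np.random.randn(16000),
                             np.eye(4)[0], np.eye(4)[1])
```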
To train the model, run:

python train.py

Note that training information is printed out once per epoch.

If you use the code or benchmarks in your research, please cite our paper:
Lei Kang, Lichao Zhang, Dazhi Jiang. "Learning Robust Self-attention Features for Speech Emotion Recognition with Label-adaptive Mixup", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, Jun 2023.