# VQ-MAE-S-code

A Vector Quantized Masked AutoEncoder for speech emotion recognition
https://ieeexplore.ieee.org/document/10193151
GNU Affero General Public License v3.0



This repository contains the code associated with the following publication:

A vector quantized masked autoencoder for speech emotion recognition
Samir Sadok, Simon Leglaive, Renaud Séguier
IEEE ICASSP 2023 Workshop on Self-Supervision in Audio, Speech and Beyond (SASB).

If you use this code for your research, please cite the above paper.

Useful links:

## Setup

## Usage

1) Training the speech VQ-VAE (unsupervised learning)

VQ-VAE

```python
# Entry point of the training script; data_train, vqvae, and train_vqvae
# are built as described below.
if __name__ == '__main__':
    main()
```


* **data_train**: Specify the path to the data as well as the path to the H5 file where the spectrograms are stored beforehand.
* **vqvae**: The model must be initialized with the parameters in "config_vqvae".
* **train_vqvae**: Instantiate the training class with the model, the data, and the parameters in "config_vqvae", then launch training with .fit().
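As a concrete illustration of the kind of preprocessing the H5 file is assumed to contain, here is a minimal, self-contained sketch of a magnitude spectrogram computed with NumPy (the FFT size, hop length, and sampling rate are illustrative assumptions, not the repository's exact settings):

```python
import numpy as np

def spectrogram(signal, n_fft=1024, hop=256):
    """Magnitude spectrogram via a framed FFT with a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # One row per time frame, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=-1))

# Example: 1 second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, n_fft // 2 + 1)
```

In practice these spectrograms would be written to the H5 file once, so training never recomputes them.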

- You can download our pre-trained speech VQ-VAE at the [following link](checkpoint/SPEECH_VQVAE) and load it as follows:
```python
vqvae.load(path_model=r"checkpoint/SPEECH_VQVAE/2022-12-27/21-42/model_checkpoint")
```

2) Training VQ-MAE-Speech (self-supervised learning)

VQ-MAE

```python
from vqmae import MAE, MAE_Train, SpeechVQVAE, VoxcelebSequential
import hydra
from omegaconf import DictConfig
import os
```

You can resume training from a checkpoint by uncommenting the `.load` line below.

```python
def main(cfg: DictConfig):
    # In the full script, main() is decorated with @hydra.main, and mae, vqvae,
    # data_train, and data_validation are built from the configuration above
    # this point.

    """ Training """
    pretrain_vqvae = MAE_Train(mae,
                               vqvae,
                               data_train,
                               data_validation,
                               config_training=cfg.train,
                               tube_bool=True,
                               follow=True,
                               multigpu_bool=True
                               )
    # pretrain_vqvae.load(path="checkpoint/RSMAE/2023-2-1/11-4/model_checkpoint")
    pretrain_vqvae.fit()


if __name__ == '__main__':
    main()
```

Pretrained models (released soon)

| Model | Masking strategy | Masking ratio (%) |
|-------|------------------|-------------------|
| VQ-MAE-Speech | Frame-based masking | [50] - [60] - [70] - 80 - [90] |
| VQ-MAE-Speech | Patch-based masking | [50] - 60 - 70 - 80 - 90 |

| Model | Encoder depth |
|-------|---------------|
| VQ-MAE-Speech | [6] - [12] - [16] - [20] |
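To make the two masking strategies named in the table concrete, here is a small self-contained NumPy sketch (the token-grid shape and the 80% ratio are illustrative assumptions, not the paper's exact settings): frame-based masking removes entire time frames of discrete tokens, while patch-based masking removes individual tokens anywhere in the time-token grid.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative grid: 50 time frames, 8 discrete tokens per frame.
tokens = np.arange(50 * 8).reshape(50, 8)

def frame_mask(tokens, ratio, rng):
    """Frame-based masking: mask whole time frames (all tokens in a frame)."""
    mask = np.zeros(tokens.shape, dtype=bool)
    n = int(round(ratio * tokens.shape[0]))
    rows = rng.choice(tokens.shape[0], size=n, replace=False)
    mask[rows, :] = True
    return mask

def patch_mask(tokens, ratio, rng):
    """Patch-based masking: mask individual tokens anywhere in the grid."""
    flat = np.zeros(tokens.size, dtype=bool)
    n = int(round(ratio * tokens.size))
    flat[rng.choice(tokens.size, size=n, replace=False)] = True
    return flat.reshape(tokens.shape)

m_frame = frame_mask(tokens, 0.8, rng)
m_patch = patch_mask(tokens, 0.8, rng)
print(m_frame.mean(), m_patch.mean())  # both masks cover 80% of the tokens
```

Both strategies hide the same fraction of tokens; they differ only in whether masked positions are correlated in time.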

3) Fine-tuning and classification for the emotion recognition task
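The fine-tuning code is not reproduced in this README. As a rough, framework-agnostic illustration (all shapes and names below are assumptions, not the repository's implementation), emotion classification attaches a small head on top of the pretrained encoder's token representations, e.g. mean-pooling followed by a linear layer and a softmax:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 59 encoder tokens of dimension 256, 8 emotion classes.
tokens = rng.standard_normal((59, 256))   # encoder output for one utterance
W = rng.standard_normal((256, 8)) * 0.02  # linear classification head
b = np.zeros(8)

pooled = tokens.mean(axis=0)              # average-pool over the token axis
logits = pooled @ W + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax over emotion classes
print(probs.shape)
```

During fine-tuning, the head (and optionally the encoder) would be trained with a cross-entropy loss on labeled emotional speech.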



## License
GNU Affero General Public License (version 3), see LICENSE.txt.