This repository contains code for estimating the age, height, and gender of a speaker from their speech signal. It includes experiments on both the TIMIT and NISP datasets.
Use the package manager pip to install the packages required for preparing the datasets and for training and testing the models.
pip install -r requirements.txt
# TIMIT Dataset
wget https://data.deepai.org/timit.zip
unzip timit.zip -d 'path to timit data folder'
# NISP Dataset
git clone https://github.com/iiscleap/NISP-Dataset.git
# TIMIT Dataset
python TIMIT/prepare_timit_data.py --path='path to timit data folder'
# NISP Dataset
python NISP/prepare_nisp_data.py --nisp_repo_path='path to nisp data repo folder'
Edit config.py to set the batch size, number of GPUs, learning rate, and other hyperparameters, and select your preferred logger in the train_timit.py and train_nisp.py scripts.
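As an illustration of the kind of settings involved, here is a minimal sketch of a settings container. The class name `ExampleConfig`, its field names, and its default values are all hypothetical and are not taken from this repository's config.py; check the actual file for the real names.

```python
from dataclasses import dataclass


@dataclass
class ExampleConfig:
    """Hypothetical settings container; names and defaults are assumptions."""

    batch_size: int = 32   # samples per training step
    gpus: int = 1          # number of GPUs to use; 0 means CPU only
    lr: float = 1e-3       # optimizer learning rate
    epochs: int = 50       # maximum number of training epochs

    def validate(self) -> None:
        """Fail early on values that cannot work."""
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if self.gpus < 0:
            raise ValueError("gpus cannot be negative")
        if self.lr <= 0:
            raise ValueError("lr must be positive")
        if self.epochs <= 0:
            raise ValueError("epochs must be positive")


# Override only the values you want to change, then validate.
config = ExampleConfig(batch_size=64, lr=5e-4)
config.validate()
```

Validating the values up front surfaces a bad setting before any data is loaded, which is cheaper than discovering it partway through a run.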
# TIMIT Dataset
python train_timit.py --dev=True --data_path='path to final data folder'
# NISP Dataset
python train_nisp.py --dev=True --data_path='path to final data folder'
# TIMIT Dataset
python train_timit.py --data_path='path to final data folder'
# NISP Dataset
python train_nisp.py --data_path='path to final data folder'
# TIMIT Dataset
python test_timit.py --data_path='path to final data folder' --model_checkpoint='path to saved model checkpoint'
# NISP Dataset
python test_nisp.py --data_path='path to final data folder' --model_checkpoint='path to saved model checkpoint'
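The commands above pass `--data_path`, `--model_checkpoint`, and `--dev=True`. How the scripts parse these flags is not shown here; the following is only a sketch, assuming an `argparse`-based command line. The helper `str_to_bool` and the function `build_parser` are hypothetical names. The sketch shows why a flag written as `--dev=True` needs an explicit converter: with `type=bool`, any non-empty string, including `"False"`, would be parsed as `True`.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert common textual booleans; plain bool("False") would be True."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags used in the commands (hypothetical)."""
    parser = argparse.ArgumentParser(description="Sketch of a train/test CLI")
    parser.add_argument("--data_path", type=str, required=True,
                        help="path to the prepared data folder")
    parser.add_argument("--model_checkpoint", type=str, default=None,
                        help="path to a saved model checkpoint (testing only)")
    parser.add_argument("--dev", type=str_to_bool, default=False,
                        help="run in development mode")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["--dev=True", "--data_path=/tmp/final_data"])
print(args.dev, args.data_path, args.model_checkpoint)
```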
Model | Height RMSE (Male) | Height RMSE (Female) | Height MAE (Male) | Height MAE (Female) | Age RMSE (Male) | Age RMSE (Female) | Age MAE (Male) | Age MAE (Female) | Gender Acc |
---|---|---|---|---|---|---|---|---|---|
MFCC_LSTM-Attn | 7.5 | 6.6 | 5.5 | 5.2 | 7.7 | 8.4 | 5.6 | 5.9 | 0.975 |
MelSpec_LSTM-Attn | 7.7 | 8.1 | 5.8 | 6.5 | 7.7 | 8.7 | 5.5 | 6.1 | 0.669 |
MFCC_CNN-LSTM-Attn | 7.5 | 6.8 | 5.7 | 5.3 | 8.2 | 8.7 | 5.4 | 6.1 | 0.989 |
MelSpec_CNN-LSTM-Attn | 7.5 | 7.4 | 5.8 | 5.8 | 8.2 | 8.4 | 5.8 | 5.9 | 0.96 |
wav2vec(no-finetune)-LSTM-Attn | 7.4 | 6.4 | 5.5 | 5.1 | 7.2 | 8.2 | 5.0 | 5.7 | 0.994 |
wav2vec(finetune 56)-LSTM-Attn | 7.5 | 6.2 | 5.5 | 4.9 | 7.5 | 7.9 | 5.5 | 5.7 | 0.994 |
wav2vec(finetune 6)-LSTM-Attn | 7.6 | 6.7 | 5.6 | 5.3 | 7.0 | 8.2 | 4.9 | 5.6 | 0.993 |
wav2vec(finetune 56)-LSTM-Attn(Only H) | 7.4 | 6.2 | 5.6 | 4.9 | | | | | |
multi-scale-cnn(Only H) | 7.5 | 6.1 | 5.9 | 4.7 | | | | | |
Model | Height RMSE (Male) | Height RMSE (Female) | Height MAE (Male) | Height MAE (Female) | Age RMSE (Male) | Age RMSE (Female) | Age MAE (Male) | Age MAE (Female) | Gender Acc |
---|---|---|---|---|---|---|---|---|---|
[1] 2019 | 6.85 | 6.29 | - | - | 7.6 | 8.63 | - | - | |
[2] 2016 (fusion) | 6.7 | 6.1 | 5.0 | 5.0 | 7.8 | 8.9 | 5.5 | 6.5 | |
[2] 2016 (baseline) | 7.0 | 6.5 | 5.3 | 5.2 | 8.1 | 9.1 | 5.7 | 6.2 | |
[3] 2020 | - | - | - | - | 7.24 | 8.12 | 5.12 | 5.29 | 0.996 |
[4] 2009 | 6.8 | 6.3 | 5.3 | 5.1 | - | - | - | - | |
Model | Height RMSE (Male) | Height RMSE (Female) | Height MAE (Male) | Height MAE (Female) | Age RMSE (Male) | Age RMSE (Female) | Age MAE (Male) | Age MAE (Female) | Gender Acc |
---|---|---|---|---|---|---|---|---|---|
[5] TMP | 6.17 | 6.93 | 5.22 | 5.30 | 5.60 | 5.57 | 4.40 | 4.42 | |
[5] Comb-3 | 6.13 | 6.70 | 5.16 | 5.30 | 5.63 | 4.99 | 3.80 | 3.76 | |
Our Method | 6.49 | 6.37 | 5.32 | 5.12 | 5.48 | 5.71 | 3.70 | 4.22 | 0.984 |
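The tables report RMSE and MAE separately for male and female speakers, plus gender classification accuracy. As a sketch of how such numbers can be computed from model outputs (not the repository's evaluation code), the helpers below use only the standard library. The function names and the toy values are made up for illustration.

```python
import math


def rmse(targets, predictions):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def mae(targets, predictions):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(t - p) for t, p in zip(targets, predictions)]
    return sum(absolute) / len(absolute)


def accuracy(labels, predicted_labels):
    """Fraction of predicted labels that match the true labels."""
    correct = sum(1 for l, p in zip(labels, predicted_labels) if l == p)
    return correct / len(labels)


def per_gender_errors(genders, targets, predictions):
    """Group samples by true gender and compute RMSE and MAE per group."""
    results = {}
    for gender in sorted(set(genders)):
        idx = [i for i, g in enumerate(genders) if g == gender]
        t = [targets[i] for i in idx]
        p = [predictions[i] for i in idx]
        results[gender] = {"rmse": rmse(t, p), "mae": mae(t, p)}
    return results


# Toy values, invented purely to exercise the functions.
genders = ["M", "M", "F", "F"]
true_height = [180.0, 170.0, 160.0, 165.0]
pred_height = [177.0, 174.0, 160.0, 163.0]
pred_gender = ["M", "M", "F", "M"]

print(per_gender_errors(genders, true_height, pred_height))
print(accuracy(genders, pred_gender))
```

Grouping by the true gender rather than the predicted one keeps each regression error tied to a fixed set of speakers, regardless of how well the gender classifier performs.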
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.