jymsuper / SpeakerRecognition_tutorial

Simple d-vector based Speaker Recognition (verification and identification) using Pytorch
MIT License

SpeakerRecognition_tutorial

Requirements

python 3.5+
pytorch 1.0.0
pandas 0.23.4
numpy 1.13.3
pickle (protocol 4; part of the Python standard library)
matplotlib 2.1.0

Datasets

We used the dataset collected through the following task.

Specification

We uploaded 40-dimensional log Mel-filterbank energy features extracted from the above dataset.
The python_speech_features library is used.

You can extract the features using the code below:

import librosa
import numpy as np
from python_speech_features import fbank

def normalize_frames(m, Scale=True):
    """Per-utterance mean (and optionally variance) normalization over frames."""
    if Scale:
        return (m - np.mean(m, axis=0)) / (np.std(m, axis=0) + 2e-12)
    else:
        return m - np.mean(m, axis=0)

# filename and sample_rate are supplied by the caller.
audio, sr = librosa.load(filename, sr=sample_rate, mono=True)
# 40-dimensional Mel-filterbank energies, 25 ms analysis window.
filter_banks, energies = fbank(audio, samplerate=sample_rate, nfilt=40, winlen=0.025)
filter_banks = 20 * np.log10(np.maximum(filter_banks, 1e-5))  # convert to log (dB) scale
feature = normalize_frames(filter_banks, Scale=False)  # size : (n_frames, 40)

1. Train

24000 utterances, 240 folders (240 speakers)
Size : 3GB
feat_logfbank_nfilt40 - train

2. Enroll & test

20 utterances, 10 folders (10 speakers)
Size : 11MB
feat_logfbank_nfilt40 - test

Usage

1. Training

A background model (a ResNet-based speaker classifier) is trained.
You can change the training settings in 'train.py'.

python train.py

2. Enrollment

Extract the speaker embeddings (d-vectors) from the 10 enrollment speech files.
They are taken from the last hidden layer of the background model.
All the embeddings are saved in the 'enroll_embeddings' folder.

python enroll.py
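Conceptually, an utterance-level d-vector is obtained by averaging the frame-level activations of the model's last hidden layer. A minimal numpy sketch of that averaging step (the network itself is omitted; `frame_activations` stands in for its per-frame outputs):

```python
import numpy as np

def average_dvector(frame_activations):
    """Average per-frame hidden activations of shape (n_frames, dim)
    into a single utterance-level d-vector of shape (dim,)."""
    return np.asarray(frame_activations, dtype=float).mean(axis=0)
```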

3. Testing

For speaker verification, you can change the settings in 'verification.py'.

python verification.py
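Verification amounts to comparing the test d-vector against the enrolled one, typically by cosine similarity against a decision threshold. A minimal sketch (the threshold value here is illustrative, not the repository's):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_speaker(enrolled_dvec, test_dvec, threshold=0.5):
    # Accept the identity claim if the embeddings are similar enough.
    return cosine_similarity(enrolled_dvec, test_dvec) >= threshold
```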

For speaker identification, you can change the settings in 'identification.py'.

python identification.py
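Closed-set identification then picks the enrolled speaker whose embedding is most similar to the test embedding. A minimal sketch, assuming the enrolled embeddings are kept in a dict keyed by speaker name:

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_speaker(enrolled, test_dvec):
    """Return the enrolled speaker whose d-vector has the highest
    cosine similarity with the test d-vector."""
    return max(enrolled, key=lambda spk: cosine_similarity(enrolled[spk], test_dvec))
```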

How to train using your own dataset

1. Modify line 21 in 'train.py'

train_DB, valid_DB = split_train_dev(c.TRAIN_FEAT_DIR, val_ratio)
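`split_train_dev` is expected to hold out a fraction of the training utterances for validation. A hypothetical sketch of such a split (the actual function in 'train.py' may differ):

```python
import random

def split_train_dev(items, val_ratio=0.1, seed=0):
    # Hypothetical sketch: shuffle the utterance list and hold out
    # val_ratio of it as the validation set.
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_ratio)
    return items[n_val:], items[:n_val]
```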

2. Modify line 31 in 'DB_wav_reader.py'

import os
from glob import glob

def find_feats(directory, pattern='**/*.p'):
    """Recursively finds all files matching the pattern."""
    return glob(os.path.join(directory, pattern), recursive=True)

3. Change line 12 in 'SR_Dataset.py'

import pickle

def read_MFB(filename):
    with open(filename, 'rb') as f:
        feat_and_label = pickle.load(f)

    feature = feat_and_label['feat']  # size : (n_frames, dim=40)
    label = feat_and_label['label']
    return feature, label
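For your own data, each '.p' file therefore needs to be a pickled dict with a 'feat' array of shape (n_frames, 40) and a 'label' holding the speaker ID. A minimal sketch of writing one such file (the file and speaker names here are illustrative):

```python
import pickle
import numpy as np

# Stand-in for real log Mel-filterbank features of one utterance.
feat = np.zeros((300, 40), dtype=np.float32)

with open('spk0001_utt001.p', 'wb') as f:
    pickle.dump({'feat': feat, 'label': 'spk0001'}, f)
```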

4. Change other options (e.g. training settings in 'train.py') as needed

Author

Youngmoon Jung (dudans@kaist.ac.kr) at KAIST, South Korea