jymsuper / SpeakerRecognition_tutorial

Simple d-vector based Speaker Recognition (verification and identification) using PyTorch
MIT License

Features Computation #6

Closed: helia95 closed this issue 4 years ago

helia95 commented 4 years ago

Hello, thanks for this great tutorial! I'm not able to reproduce the feature extraction step. Could you please point me in the right direction?

Currently I'm using `logfbank` from the python_speech_features library, with a sample rate of 16000 Hz and 40 filters (nfilt=40).

Many thanks!

jymsuper commented 4 years ago

Hi. Thank you for your interest in my work. I suspect the problem is that you are not normalizing the input features. You can use the code below:

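```python
import librosa
import numpy as np
from python_speech_features import fbank

sample_rate = 16000  # sr=16000, as in the question
audio, sr = librosa.load(filename, sr=sample_rate, mono=True)  # filename: path to your audio file
filter_banks, energies = fbank(audio, samplerate=sample_rate, nfilt=40, winlen=0.025)
filter_banks = 20 * np.log10(np.maximum(filter_banks, 1e-5))  # log (dB) scale, floored to avoid log(0)
feature = normalize_frames(filter_banks, Scale=False)  # mean normalization only (defined below)
```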
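Since the question used `logfbank`, a note on how the two log scales relate. In the python_speech_features source, `logfbank` returns the natural log of the `fbank` energies (with exact zeros replaced by machine epsilon). Assuming that behavior and the same `fbank` parameters (nfilt, winlen, and the other defaults), existing `logfbank` output can be mapped onto the `20 * log10(max(fbank, 1e-5))` scale used in the snippet by flooring at ln(1e-5) and multiplying by 20/ln(10). The helper name `logfbank_to_db` is hypothetical, and the energies below are synthetic stand-ins for real features; this is a minimal numpy-only sketch:

```python
import numpy as np

def logfbank_to_db(log_fb, floor=1e-5):
    """Hypothetical helper: map natural-log filterbank energies (e.g. logfbank
    output) onto the 20*log10(max(fbank, floor)) scale used in the snippet."""
    # ln is monotonic, so max(ln x, ln floor) == ln(max(x, floor))
    floored = np.maximum(log_fb, np.log(floor))
    # 20*log10(x) == (20 / ln 10) * ln(x)
    return floored * (20.0 / np.log(10.0))

# Synthetic positive energies standing in for real fbank output (frames x filters)
rng = np.random.default_rng(0)
energies = rng.uniform(1e-7, 10.0, size=(100, 40))

db_from_log = logfbank_to_db(np.log(energies))
db_direct = 20 * np.log10(np.maximum(energies, 1e-5))
print(np.allclose(db_from_log, db_direct))  # True
```

The constant factor matters here because `Scale=False` only subtracts the mean, so the overall magnitude of the log features is preserved.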

The `normalize_frames` function is defined as follows:

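```python
import numpy as np

def normalize_frames(m, Scale=True):
    # Statistics are computed per feature dimension (column) over all frames (axis=0)
    if Scale:
        # Mean and variance normalization; the small epsilon avoids division by zero
        return (m - np.mean(m, axis=0)) / (np.std(m, axis=0) + 2e-12)
    else:
        # Mean normalization only
        return (m - np.mean(m, axis=0))
```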
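To make the two modes concrete, here is a small numpy-only check on a synthetic feature matrix (random values, not real audio), assuming the frames × filters layout that `fbank` returns. With `Scale=False`, each filter channel is shifted to zero mean over the utterance (a per-utterance mean normalization, similar in spirit to cepstral mean normalization) while its spread is left unchanged; with `Scale=True`, each channel is also scaled to unit variance. The function body repeats the definition above so the sketch runs on its own:

```python
import numpy as np

def normalize_frames(m, Scale=True):
    # Same logic as the definition above: statistics per column (axis=0)
    if Scale:
        return (m - np.mean(m, axis=0)) / (np.std(m, axis=0) + 2e-12)
    return m - np.mean(m, axis=0)

# Synthetic (num_frames, num_filters) matrix standing in for dB filterbank features
rng = np.random.default_rng(1)
feats = rng.normal(loc=-30.0, scale=8.0, size=(200, 40))

mean_only = normalize_frames(feats, Scale=False)
mean_var = normalize_frames(feats, Scale=True)

print(np.allclose(mean_only.mean(axis=0), 0.0))               # True: each channel is zero-mean
print(np.allclose(mean_only.std(axis=0), feats.std(axis=0)))  # True: spread is unchanged
print(np.allclose(mean_var.std(axis=0), 1.0))                 # True: unit variance per channel
```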