Features Computation - Githubissues

jymsuper / SpeakerRecognition_tutorial

Simple d-vector based Speaker Recognition (verification and identification) using Pytorch

MIT License

210 stars, 46 forks

Hi. Thank you for your interest in my work. I think that's because you do not normalize the input features. You can use the code below:

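    import librosa
    import numpy as np
    from python_speech_features import fbank  # assumed source of fbank

    audio, sr = librosa.load(filename, sr=sample_rate, mono=True)  # resample to sample_rate, downmix to mono
    filter_banks, energies = fbank(audio, samplerate=sample_rate, nfilt=40, winlen=0.025)  # 40 filters, 25 ms windows
    filter_banks = 20 * np.log10(np.maximum(filter_banks, 1e-5))  # log scale; the 1e-5 floor avoids log(0)
    feature = normalize_frames(filter_banks, Scale=False)  # subtract the per-coefficient mean only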

The function "normalize_frames" is defined as follows:

    def normalize_frames(m, Scale=True):
        if Scale:
            return (m - np.mean(m, axis=0)) / (np.std(m, axis=0) + 2e-12)
        else:
            return (m - np.mean(m, axis=0))
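As a quick sanity check of what the normalization does, here is a minimal sketch that is not part of the repository. It replaces the fbank output with random positive numbers, so the array shape (200 frames, 40 filters), the value range, and the helper name `log_fbank` are all assumptions made for illustration. It checks that each coefficient has zero mean across frames, that `Scale=True` also gives unit standard deviation, and that a constant gain on the filter bank energies disappears after the log and mean subtraction.

```python
import numpy as np

def normalize_frames(m, Scale=True):
    # Same definition as above, repeated so this snippet runs on its own.
    if Scale:
        return (m - np.mean(m, axis=0)) / (np.std(m, axis=0) + 2e-12)
    else:
        return (m - np.mean(m, axis=0))

def log_fbank(e):
    # Hypothetical helper: the same log compression used in the snippet above.
    return 20 * np.log10(np.maximum(e, 1e-5))

# Synthetic stand-in for fbank output: 200 frames x 40 positive energies.
rng = np.random.default_rng(0)
energies = rng.uniform(0.1, 10.0, size=(200, 40))

feature = normalize_frames(log_fbank(energies), Scale=False)
scaled = normalize_frames(log_fbank(energies), Scale=True)

# Scale=False: every coefficient has zero mean over the frames.
print(np.allclose(feature.mean(axis=0), 0.0))

# Scale=True: every coefficient also has unit standard deviation.
print(np.allclose(scaled.std(axis=0), 1.0))

# A constant gain on the energies is a constant offset after the log,
# so mean normalization removes it.
louder = normalize_frames(log_fbank(4.0 * energies), Scale=False)
print(np.allclose(feature, louder))
```

All three checks should print True. The last one is probably the main reason normalization helps here: without it, recordings at different levels produce features that differ by an offset the network has to learn to ignore.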

Features Computation #6