Closed: buaapengbo closed this issue 5 years ago
The feature extraction system and d-vector system at Google are proprietary code, and cannot be open-sourced. You need to either find a third-party implementation, or use your own implementation. This repo is dedicated to the UIS-RNN system.
Hi, thank you for open-sourcing this!
I read your paper and tests/integration_test.py. My question is how you embed the audio stream data into vectors of dimension D = 512. It is essentially the same as the question here: how do you generate training or test data from an audio stream?
Is it something like this?
librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40)
In your paper, you write: "In this system, audio signals are first transformed into frames of width 25ms and step 10ms, and log-mel-filterbank energies of dimension 40 are extracted from each frame as the network input. These frames form overlapping sliding windows of a fixed length, on which we run the LSTM network. The last-frame output of the LSTM is then used as the d-vector representation of this sliding window."
How can I reproduce this part? I appreciate your help and look forward to your response! Thanks, Bo
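For anyone reading along, here is a rough sketch of the front end the quoted passage describes: 25 ms frames with a 10 ms step, 40 log-mel-filterbank energies per frame, then overlapping fixed-length windows. It is only an approximation in plain NumPy, not the proprietary Google pipeline. The 16 kHz sample rate, the Hann window, the FFT size, the log floor, and the window length and shift (24 frames, 12 frames) are all my own assumptions, since the quote does not give them, and the helper names are hypothetical. Note that these are log-mel energies rather than MFCCs, so there is no DCT step. The d-vector network itself (the LSTM run on each window) is not included; you would still need your own or a third-party model for that.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels=40):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m:m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_frames(signal, sample_rate=16000, n_mels=40, frame_ms=25, step_ms=10):
    """Return log-mel-filterbank energies, shape (n_frames, n_mels)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    step = int(sample_rate * step_ms / 1000)         # 160 samples at 16 kHz
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Assumption: FFT size is the next power of two >= frame length.
    n_fft = 1 << (frame_len - 1).bit_length()
    n_frames = 1 + (len(signal) - frame_len) // step
    idx = np.arange(frame_len)[None, :] + step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)     # assumption: Hann window
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(sample_rate, n_fft, n_mels).T
    return np.log(mel_energy + 1e-6)                 # small floor avoids log(0)

def sliding_windows(features, window_frames=24, shift_frames=12):
    """Stack overlapping windows, shape (n_windows, window_frames, n_mels).

    window_frames and shift_frames are illustrative values, not from the paper quote.
    """
    if features.shape[0] < window_frames:
        raise ValueError("not enough frames for one window")
    starts = range(0, features.shape[0] - window_frames + 1, shift_frames)
    return np.stack([features[s:s + window_frames] for s in starts])
```

Each window from sliding_windows would then be fed to the speaker-embedding LSTM, and the output at the last frame of the window taken as that window's d-vector. The resulting sequence of d-vectors is what you would pass to UIS-RNN as the observation sequence.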