google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0
1.55k stars, 320 forks

How to embed audio stream data into a k-vector (512) #9

Closed. buaapengbo closed this issue 5 years ago

buaapengbo commented 5 years ago

Hi, thank you for open-sourcing this!

I read your paper and tests/integration_test.py. My question is how you embed the audio stream data into vectors with D = 512. It is similar to the question here: how do you generate training or test data from an audio stream?

Is it something like librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40)? Your paper says:

"In this system, audio signals are first transformed into frames of width 25ms and step 10ms, and log-mel-filterbank energies of dimension 40 are extracted from each frame as the network input. These frames form overlapping sliding windows of a fixed length, on which we run the LSTM network. The last-frame output of the LSTM is then used as the d-vector representation of this sliding window."

How can I reproduce this part?

I appreciate it, waiting for your response! Thanks, Bo

wq2012 commented 5 years ago

The feature extraction system and d-vector system at Google are proprietary code, and cannot be open-sourced. You need to either find a third-party implementation, or use your own implementation. This repo is dedicated to the UIS-RNN system.
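
For readers who want a starting point for their own implementation, the front end quoted from the paper (25ms frames, 10ms step, 40 log-mel-filterbank energies, overlapping sliding windows) can be sketched with NumPy alone. This is not Google's code and makes several assumptions the thread does not confirm: a 16 kHz sample rate, a Hann window, a 512-point FFT, a standard triangular mel filterbank, and hypothetical values for the sliding-window length and step. The function names are illustrative. The LSTM that turns each window into a d-vector is not included and would have to come from a third-party or self-trained speaker model.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_frames(signal, sample_rate=16000, n_mels=40, win_ms=25.0, hop_ms=10.0):
    """Return log-mel-filterbank energies, shape (n_frames, n_mels)."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(sample_rate * win_ms / 1000.0))  # 400 samples at 16 kHz
    hop = int(round(sample_rate * hop_ms / 1000.0))  # 160 samples at 16 kHz
    if len(signal) < win:
        raise ValueError("signal is shorter than one frame")
    n_fft = 1 << (win - 1).bit_length()              # next power of two (assumption)
    n_frames = 1 + (len(signal) - win) // hop
    # Index matrix: row i selects the samples of frame i.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(win)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-6)                 # small floor avoids log(0)

def sliding_windows(frames, window_len, step):
    """Stack overlapping windows of frames, shape (n_windows, window_len, n_mels)."""
    starts = range(0, len(frames) - window_len + 1, step)
    return np.stack([frames[s:s + window_len] for s in starts])

if __name__ == "__main__":
    audio = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise as stand-in audio
    feats = log_mel_frames(audio)
    windows = sliding_windows(feats, window_len=24, step=12)  # hypothetical window settings
    print(feats.shape, windows.shape)
```

Each window would then be passed through a speaker-embedding network, and the resulting sequence of embeddings, one row per window, is the kind of 2-D observation array that tests/integration_test.py feeds to UIS-RNN.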