google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0
1.56k stars 319 forks

[Question] Which feature was used for VAD? #48

Closed: seungwonpark closed this issue 5 years ago

seungwonpark commented 5 years ago

Describe the question

Hi, thanks for open-sourcing this awesome project. Which features were used for VAD: d-vectors, or PLP features (as you mentioned in "Speaker Diarization with LSTM")?

My background

Have I read the README.md file? yes
Have I searched for similar questions from closed issues? yes
Have I tried to find the answers in the paper Fully Supervised Speaker Diarization? yes
Have I tried to find the answers in the reference Speaker Diarization with LSTM? yes
Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification? yes

wq2012 commented 5 years ago

We used PLP features for VAD. VAD is trained with fundamental acoustic features, not speaker embeddings.

We used a pretty simple VAD only because we are using the same VAD for multiple datasets. It's not the optimal setup. You can definitely train your own VAD using other features for the domain that you focus on.
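
For readers who want to try training their own frame-level VAD on basic acoustic features, here is a minimal sketch. It is not the VAD used by the authors and it does not compute PLP features: as a stand-in, it assumes two simple per-frame features (log energy and zero-crossing rate) and a logistic-regression classifier trained on synthetic audio. All function names (`frame_features`, `train_vad`, `predict_vad`) and parameter values are hypothetical choices for illustration.

```python
import numpy as np


def frame_features(signal, frame_len=400, hop=160):
    """Per-frame [log energy, zero-crossing rate].

    Assumption: these two features stand in for PLP, which is not computed here.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, 2))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        feats[i, 0] = np.log(np.mean(frame ** 2) + 1e-10)
        feats[i, 1] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    return feats


def train_vad(feats, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression speech/non-speech classifier by gradient descent."""
    mean, std = feats.mean(axis=0), feats.std(axis=0) + 1e-8
    x = (feats - mean) / std  # standardize features
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted speech probability
        grad = p - labels  # gradient of the cross-entropy loss w.r.t. the logit
        w -= lr * (x.T @ grad) / len(labels)
        b -= lr * grad.mean()
    return {"w": w, "b": b, "mean": mean, "std": std}


def predict_vad(model, feats):
    """Return a boolean array: True where a frame is classified as speech."""
    x = (feats - model["mean"]) / model["std"]
    return (x @ model["w"] + model["b"]) > 0.0


def make_audio(seed, sr=16000):
    """Synthetic 1-second clips: low-level noise vs. a loud tone (toy 'speech')."""
    rng = np.random.default_rng(seed)
    t = np.arange(sr) / sr
    silence = 0.01 * rng.standard_normal(sr)
    speech = 0.5 * np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(sr)
    return silence, speech


# Train on one pair of synthetic clips.
silence, speech = make_audio(seed=0)
sil_feats, sp_feats = frame_features(silence), frame_features(speech)
train_feats = np.vstack([sil_feats, sp_feats])
train_labels = np.concatenate([np.zeros(len(sil_feats)), np.ones(len(sp_feats))])
model = train_vad(train_feats, train_labels)

# Evaluate on a held-out pair generated with a different seed.
test_silence, test_speech = make_audio(seed=1)
silence_pred = predict_vad(model, frame_features(test_silence))
speech_pred = predict_vad(model, frame_features(test_speech))
print("frames flagged as speech in silence clip:", silence_pred.mean())
print("frames flagged as speech in speech clip:", speech_pred.mean())
```

For real data, the synthetic clips would be replaced by labeled audio from the target domain, and the feature function by whatever acoustic features suit that domain.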