Closed: seungwonpark closed this issue 5 years ago.
We used PLP features for the VAD. The VAD is trained on fundamental acoustic features, not on speaker embeddings.
We only used a fairly simple VAD because we use the same VAD across multiple datasets, so it's not the optimal setup. You can definitely train your own VAD on other features for the domain you focus on.
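For anyone who wants to train a domain-specific VAD along these lines, here is a minimal sketch of a frame-level speech/non-speech classifier on low-level acoustic features. It uses MFCCs (via librosa) as a stand-in for PLP, a scikit-learn logistic regression, and random placeholder training data; the feature choice, classifier, and data are illustrative assumptions, not the setup used in the paper.

```python
# Minimal VAD sketch: a per-frame classifier on low-level acoustic features.
# Assumptions: MFCCs stand in for PLP, logistic regression stands in for the
# actual VAD model, and the training data below is a random placeholder.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def frame_features(wav_path, sr=16000, n_mfcc=13):
    """Load audio and compute per-frame acoustic features (MFCCs here)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.T  # (n_frames, n_mfcc)

# Placeholder training data: per-frame features and speech/non-speech labels.
# In practice, build X_train with frame_features(...) on annotated audio.
X_train = np.random.randn(1000, 13)           # placeholder features
y_train = np.random.randint(0, 2, size=1000)  # placeholder labels: 1 = speech, 0 = non-speech

vad = LogisticRegression(max_iter=1000)
vad.fit(X_train, y_train)

# Apply the VAD to a new utterance: keep only frames predicted as speech.
X_test = np.random.randn(200, 13)             # placeholder; use frame_features(...) on real audio
speech_mask = vad.predict(X_test).astype(bool)
```

A usage note on the design: because the classifier only sees frame-level acoustic features, it can be trained once and reused across datasets, which is exactly why a simple shared VAD is workable even if it is not optimal for any single domain.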
Describe the question
Hi, thanks for open-sourcing this awesome project. Which features were used for VAD: d-vectors or PLP features (as mentioned in "Speaker Diarization with LSTM")?
My background
- Have I read the README.md file? Yes.
- Have I searched for similar questions from closed issues? Yes.
- Have I tried to find the answers in the paper Fully Supervised Speaker Diarization? Yes.
- Have I tried to find the answers in the reference Speaker Diarization with LSTM? Yes.
- Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification? Yes.