google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0
1.55k stars, 320 forks

How to create training data? #62

Closed: chrisspen closed this issue 4 years ago

chrisspen commented 4 years ago

Describe the question

How do you take a raw audio file annotated with speaker labels and convert it into a form that uis-rnn can use? There's no documentation on creating your own training data from raw audio files. The toy training and test data appear to be numpy arrays, but there's no description of what these arrays represent.
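To make the question concrete: as we read the repository's README, `model.fit(train_sequences, train_cluster_ids, training_args)` expects each training sequence to be a 2-D float numpy array of shape `(length, observation_dim)`, one row per speaker embedding (d-vector) of a short audio segment, and each cluster-id array to be a 1-D array of strings holding one speaker label per row. The checker below is a minimal sketch of those shape rules; `check_training_pair` is a hypothetical helper, not part of uis-rnn, and the toy sizes are made up.

```python
import numpy as np


def check_training_pair(sequence, cluster_id):
    """Hypothetical check of one (sequence, cluster_id) pair.

    sequence:   2-D float array, one row per embedded segment,
                shape (length, observation_dim).
    cluster_id: 1-D array of strings, one speaker label per row.
    Returns (length, observation_dim) if the pair is consistent.
    """
    sequence = np.asarray(sequence)
    cluster_id = np.asarray(cluster_id)
    if sequence.ndim != 2 or not np.issubdtype(sequence.dtype, np.floating):
        raise ValueError(
            f"sequence must be a 2-D float array, got {sequence.dtype} "
            f"with shape {sequence.shape}")
    # 'U' = unicode, 'S' = bytes, 'O' = Python objects (e.g. str)
    if cluster_id.ndim != 1 or cluster_id.dtype.kind not in ("U", "S", "O"):
        raise ValueError("cluster_id must be a 1-D array of strings")
    if len(cluster_id) != sequence.shape[0]:
        raise ValueError("need exactly one cluster id per row of sequence")
    return sequence.shape


# Five embedded segments of dimension 4, spoken by speakers "A" and "B".
print(check_training_pair(np.zeros((5, 4)), np.array(["A", "A", "B", "B", "A"])))  # (5, 4)
```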

My background

Have I read the README.md file?

Have I searched for similar questions from closed issues?

Have I tried to find the answers in the paper Fully Supervised Speaker Diarization?

Have I tried to find the answers in the reference Speaker Diarization with LSTM?

Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification?

wq2012 commented 4 years ago

https://www.youtube.com/watch?v=pGkqwRPzx9U&t=23m19s

You need to find another library to compute speaker embeddings.
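The reply implies a two-stage pipeline. First, an external speaker encoder turns short audio windows into embeddings. Second, the annotations supply a speaker label for each window. The sketch below shows that alignment step under stated assumptions: fixed 240 ms windows with a 120 ms hop (illustrative values, not taken from uis-rnn), each window labelled by the annotated turn covering its centre, and unlabelled windows (for example silence) dropped. `audio_to_uisrnn_arrays` and `placeholder_embed` are hypothetical names. `placeholder_embed` only computes the mean and standard deviation so the example runs; it is not a speaker embedding and must be replaced by a real speaker-encoder model.

```python
import numpy as np


def audio_to_uisrnn_arrays(audio, sample_rate, turns, embed_fn,
                           window_s=0.24, hop_s=0.12):
    """Hypothetical: turn one annotated recording into (sequence, labels).

    audio:    1-D float array of samples.
    turns:    list of (start_s, end_s, speaker) annotations in seconds.
    embed_fn: callable mapping a 1-D chunk of samples to a 1-D embedding;
              in practice this comes from a separate speaker-encoder library.
    """
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    embeddings, labels = [], []
    for start in range(0, len(audio) - win + 1, hop):
        centre_s = (start + win / 2) / sample_rate
        # Label the window with the speaker whose turn covers its centre.
        speaker = next((spk for t0, t1, spk in turns if t0 <= centre_s < t1), None)
        if speaker is None:
            continue  # unannotated audio (e.g. silence) is skipped
        embeddings.append(embed_fn(audio[start:start + win]))
        labels.append(speaker)
    if not embeddings:
        raise ValueError("no window overlapped an annotated turn")
    return np.stack(embeddings).astype(float), np.array(labels)


def placeholder_embed(chunk):
    """Stand-in for a real speaker encoder: NOT a speaker embedding."""
    return np.array([chunk.mean(), chunk.std()])


rng = np.random.default_rng(0)
audio = rng.normal(size=32000)            # 2 s of fake audio at 16 kHz
turns = [(0.0, 1.0, "A"), (1.0, 2.0, "B")]
sequence, labels = audio_to_uisrnn_arrays(audio, 16000, turns, placeholder_embed)
print(sequence.shape)  # (15, 2)
```

Labelling by window centre is a simple choice: a window that straddles a speaker change is assigned to whichever speaker owns its midpoint.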

008karan commented 4 years ago

@chrisspen, have you figured out how to prepare training data for uisrnn, e.g. what format the data needs to be in?
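On the format question, a sketch: if we read the README and `demo.py` correctly, `model.fit` accepts either a list of per-recording sequences or a single concatenated 2-D array plus a matching 1-D string array, and the bundled `toy_training_data.npz` stores them under the keys `train_sequence` and `train_cluster_id`. The helpers `build_training_arrays` and `save_training_npz` below are hypothetical, not part of uis-rnn. Prefixing each label with its recording id is our own assumption (that speakers in different recordings are different people), made so that labels from separate recordings do not collide after concatenation.

```python
import io

import numpy as np


def build_training_arrays(recordings):
    """Hypothetical: concatenate annotated recordings into one training pair.

    recordings: list of (recording_id, embeddings, labels), where embeddings
    has shape (length_i, observation_dim) and labels has one entry per row.
    """
    sequences, cluster_ids = [], []
    for recording_id, embeddings, labels in recordings:
        embeddings = np.asarray(embeddings, dtype=float)
        if embeddings.ndim != 2 or embeddings.shape[0] != len(labels):
            raise ValueError(f"{recording_id}: need one label per embedding row")
        sequences.append(embeddings)
        # Assumption: prefix the recording id so "A" in rec1 and "A" in rec2
        # are treated as different speakers.
        cluster_ids.extend(f"{recording_id}_{label}" for label in labels)
    return np.concatenate(sequences, axis=0), np.array(cluster_ids)


def save_training_npz(file, train_sequence, train_cluster_id):
    """Write the pair under the key names the toy training data appears to use."""
    np.savez(file, train_sequence=train_sequence, train_cluster_id=train_cluster_id)


rng = np.random.default_rng(0)
recordings = [
    ("rec1", rng.normal(size=(4, 8)), ["A", "A", "B", "B"]),
    ("rec2", rng.normal(size=(3, 8)), ["A", "C", "C"]),
]
train_sequence, train_cluster_id = build_training_arrays(recordings)
print(train_sequence.shape)               # (7, 8)
print(train_cluster_id[:3].tolist())      # ['rec1_A', 'rec1_A', 'rec1_B']

buffer = io.BytesIO()  # in-memory file; a path such as "my_data.npz" also works
save_training_npz(buffer, train_sequence, train_cluster_id)
buffer.seek(0)
loaded = np.load(buffer)
print(loaded["train_cluster_id"][-1])     # rec2_C
```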