google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0
1.55k stars 320 forks

Test model #31

monosakkarid closed 5 years ago

monosakkarid commented 5 years ago

Sorry if this sounds like a dumb question. I am not an expert in either Python or speaker diarization. After I have trained the model, how can I use it to determine who is speaking in a wave file? I am trying to determine who is speaking in a single audio recording of a telephone conversation.

Could I, for example, use test_sequence = wavfile.read(mywav) as an input to predicted_cluster_id = model.predict(test_sequence, args), and get a prediction of who spoke in this file?

My question is more about the use of the code. I hope you can help!

wq2012 commented 5 years ago
  1. The input must be speaker-discriminative embeddings computed using another library, not raw waveform signals.
  2. UIS-RNN must be trained before you can use it for prediction.
  3. Please read the paper, or at least the README.md file first.
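
To make the three points above concrete, here is a minimal sketch (not the maintainers' code). It assumes a model that was already trained and saved with model.save, and embeddings whose dimension matches the model's observation_dim (assumed here to be 256). The helper names looks_like_embedding_sequence, diarize and load_trained_model are made up for illustration. Note that scipy.io.wavfile.read returns a (sample_rate, samples) tuple, so its result cannot be passed to model.predict directly. The embeddings have to come from a separate speaker encoder, which uis-rnn does not provide.

```python
def looks_like_embedding_sequence(seq, observation_dim=256):
    """Return True if seq has shape (num_segments, observation_dim).

    Duck-typed on .ndim and .shape, so it accepts NumPy arrays.
    A mono waveform is 1-D, a stereo waveform is (num_samples, 2), and the
    tuple returned by scipy.io.wavfile.read has no .ndim at all, so none of
    them pass this check.
    """
    return (getattr(seq, "ndim", None) == 2
            and seq.shape[1] == observation_dim)


def diarize(model, test_sequence, inference_args, observation_dim=256):
    """Predict one speaker label per embedding with an already trained model."""
    if not looks_like_embedding_sequence(test_sequence, observation_dim):
        raise ValueError(
            "expected a 2-D sequence of speaker embeddings, "
            "not a raw waveform")
    return model.predict(test_sequence, inference_args)


def load_trained_model(model_path):
    """Rebuild a UIS-RNN model and load weights saved earlier by model.save.

    model_path is whatever path was used when saving (an assumption here).
    """
    import uisrnn  # imported lazily so the helpers above work without it

    model_args, _, inference_args = uisrnn.parse_arguments()
    model = uisrnn.UISRNN(model_args)
    model.load(model_path)
    return model, inference_args
```

With these helpers, prediction would look roughly like model, inference_args = load_trained_model(path) followed by labels = diarize(model, embeddings, inference_args), where embeddings is a float NumPy array produced by the speaker encoder for the telephone recording.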