Closed steveway closed 3 years ago
Hi, thanks for your suggestion!
Unfortunately, speaker diarization is a very different task from the current recognition task, and there is no plan for us to add a diarization model.
However, there are a couple of repos tackling the diarization task; you can have a look at them here: https://github.com/topics/speaker-diarization
I have personally used the following one before; it performs well but requires some additional effort to make it work: https://github.com/google/uis-rnn
I see, that makes sense, thank you. I'll experiment with integrating other tools like that then. I guess this Issue can then be closed.
Hello, as you can see, I've started integrating this project into Papagayo-NG: https://github.com/morevnaproject-org/papagayo-ng/issues/49 The first results from my tests seem very promising. The new timestamp feature in particular helps a lot with that.
Would it be possible to add some speaker separation to this? Papagayo-NG itself allows several speakers for one audio file, so if we could recognize which parts are spoken by which speaker, this would become a really nice solution for even more animators. I've taken a look at the topic, and it seems to be quite complex. If this could be integrated into Allosaurus, that would be awesome, of course. If not, there would still be ways to get this into Papagayo-NG: we could do a separate diarization pass over the audio. I've taken a look, and pyAudioAnalysis already seems to do that, but it would be a big dependency to add.
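For the separate-pass idea, the merge step could look roughly like the sketch below: run a diarization tool to get speaker segments, then assign each Allosaurus phone timestamp to the segment that contains it. The tuple formats here are illustrative assumptions, not the actual output formats of pyAudioAnalysis or Allosaurus.

```python
def assign_speakers(phones, segments):
    """Assign each timestamped phone to a speaker.

    phones:   list of (time_sec, phone) tuples, e.g. from a
              timestamped recognition pass
    segments: list of (start_sec, end_sec, speaker_id) tuples,
              e.g. from a diarization pass
    Returns a list of (time_sec, phone, speaker_id) tuples;
    phones outside every segment get speaker_id None.
    """
    result = []
    for t, phone in phones:
        speaker = None
        for start, end, spk in segments:
            if start <= t < end:
                speaker = spk
                break
        result.append((t, phone, speaker))
    return result


# Hypothetical example data: two speakers, three phones.
segments = [(0.0, 2.5, "A"), (2.5, 5.0, "B")]
phones = [(0.3, "p"), (1.1, "a"), (3.0, "g")]
print(assign_speakers(phones, segments))
# → [(0.3, 'p', 'A'), (1.1, 'a', 'A'), (3.0, 'g', 'B')]
```

This keeps the diarization dependency entirely on the Papagayo-NG side: Allosaurus output stays unchanged, and only this small post-processing step ties the two passes together.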