The Ability of Speaker Diarization with More Than 2 Speakers

resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning

Apache License 2.0


The Ability of Speaker Diarization with More Than 2 Speakers #72

Open ConnieZi opened 2 years ago

ConnieZi commented 2 years ago

Hello developers! Thank you so much for developing Resemblyzer; it is an amazing tool for me. I have run into a problem while developing: when my input audio contains utterances from 3 speakers, the output only gives me labels for 2 speakers. More specifically, after the first speaker (label 0) and the second speaker (label 1) finish talking and the third speaker starts, Resemblyzer labels the third speaker with label 0. I'm not sure whether it's because two of them have similar voices, but this happens with every audio file of mine that contains 3 speakers.

theashishbhatt commented 2 years ago

Hey @ConnieZi. Did you check how many clusters you are creating? The SpectralClusterer takes in arguments for minimum and maximum clusters you want to be created. If you have three speakers in the audio, try setting min=3. It should work.
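
To illustrate why a minimum bound can help, here is a minimal sketch of one common way a spectral clusterer estimates the number of clusters: take the largest gap between sorted eigenvalues, then clamp the result to the allowed range. This is a simplified illustration, not the library's actual code. The function name `pick_num_clusters` and the eigenvalues are hypothetical, made up for the example.

```python
def pick_num_clusters(eigenvalues, min_clusters=1, max_clusters=None):
    """Estimate a cluster count from the largest eigengap, clamped to the bounds.

    Illustrative only: a real clusterer computes the eigenvalues from an
    affinity matrix built on the speaker embeddings.
    """
    vals = sorted(eigenvalues, reverse=True)
    limit = len(vals) - 1
    if max_clusters is not None:
        limit = min(limit, max_clusters)
    # gaps[k - 1] is the drop after the k-th largest eigenvalue
    gaps = [vals[k - 1] - vals[k] for k in range(1, limit + 1)]
    if not gaps:
        return min_clusters
    k = gaps.index(max(gaps)) + 1
    # The lower bound overrides an estimate that is too small
    return max(k, min_clusters)


# Hypothetical eigenvalues where the third speaker is only weakly separated
eigs = [1.00, 0.95, 0.45, 0.20, 0.15, 0.10]
unbounded = pick_num_clusters(eigs)               # estimate without a lower bound
forced = pick_num_clusters(eigs, min_clusters=3)  # estimate with min set to 3
```

With these made-up values the unbounded estimate merges the third speaker into an existing cluster, while the lower bound forces a third label. In the `spectralcluster` package the corresponding constructor arguments are, as far as I know, `min_clusters` and `max_clusters`; check the installed version before relying on that.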

Nirannoel commented 1 year ago

Can I get the speaker diarization output from Resemblyzer in a text format such as RTTM? If so, how?
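
One possible approach, sketched under assumptions: once each embedding window has a cluster label, consecutive windows with the same label can be merged into turns and written as RTTM `SPEAKER` lines. The helper `labels_to_rttm` below is hypothetical and not part of Resemblyzer. It assumes the windows are given as `(start_sec, end_sec)` pairs in time order, and that overlapping windows have already been reduced to non-overlapping spans (otherwise the turns will overlap).

```python
def labels_to_rttm(file_id, windows, labels):
    """Merge consecutive same-label windows into turns and format RTTM lines.

    windows: list of (start_sec, end_sec), one per embedding, in time order
    labels:  cluster id for each window
    """
    turns = []
    for (start, end), label in zip(windows, labels):
        if turns and turns[-1][2] == label:
            # Same speaker as the previous window: extend the current turn
            turns[-1][1] = max(turns[-1][1], end)
        else:
            turns.append([start, end, label])
    # RTTM fields: type, file, channel, onset, duration, then placeholders
    return [
        f"SPEAKER {file_id} 1 {start:.3f} {end - start:.3f} "
        f"<NA> <NA> speaker_{label} <NA> <NA>"
        for start, end, label in turns
    ]


# Made-up windows and labels for illustration
example_windows = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.5), (3.5, 4.0)]
example_labels = [0, 0, 1, 2]
rttm_lines = labels_to_rttm("meeting", example_windows, example_labels)
```

Writing the result to disk is then a matter of joining `rttm_lines` with newlines. The channel value `1` and the `speaker_N` naming are arbitrary choices for the sketch.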

lkaniak commented 8 months ago

> Hey @ConnieZi. Did you check how many clusters you are creating? The SpectralClusterer takes in arguments for minimum and maximum clusters you want to be created. If you have three speakers in the audio, try setting min=3. It should work.

Is it possible to cover a use case where I only need to identify the number of speakers in an audio sample, using this approach and the package?
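
A rough sketch of how such a count could be derived, assuming per-window cluster labels are already available from the clusterer: count the distinct labels, optionally ignoring labels that cover only a tiny share of the windows. The helper `count_speakers` and the `min_share` threshold are hypothetical, and the result can only be as good as the clusterer's own estimate (including whatever minimum and maximum bounds it was given).

```python
from collections import Counter


def count_speakers(labels, min_share=0.05):
    """Count distinct labels that cover at least min_share of all windows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum(1 for n in counts.values() if n / total >= min_share)


# Made-up labels: three substantial speakers plus one stray window
example = [0] * 40 + [1] * 35 + [2] * 24 + [3] * 1
n_speakers = count_speakers(example)
```

Setting `min_share=0.0` counts every label, including clusters that contain only a single window.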