resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning
Apache License 2.0

How to get time stamps for every speaker change? #77

Open nehat005 opened 2 years ago

nehat005 commented 2 years ago

To evaluate speaker diarization, I am trying to integrate a DER evaluation script, which requires time stamps for each active speaker, from start_time to end_time (as per the RTTM format).

From my experiments: for a 60-second audio file, I get 75 similarities, depending on the number of partial utterances (also called wav splits), but consecutive partial utterances overlap. Suppose two overlapping partial utterances have different speaker labels according to the similarity matrix, one speaker_A and the other speaker_B. How do we then get the exact time stamp (in seconds) at which the speaker change occurred?

Any help would be much appreciated :)
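To make the question concrete, here is a minimal sketch of one possible heuristic, not a method confirmed by the library: when two consecutive overlapping windows carry different labels, place the speaker change at the midpoint of their overlap. The function names `windows_to_segments` and `to_rttm` are hypothetical, the 16 kHz sampling rate is an assumption, and the windows are assumed to be given as `(start_sample, end_sample)` pairs (slice objects could be converted with `(s.start, s.stop)`). The resulting time stamp is only an approximation, accurate to roughly half the overlap length.

```python
from typing import List, Tuple

SAMPLE_RATE = 16000  # assumption: audio was resampled to 16 kHz


def windows_to_segments(
    windows: List[Tuple[int, int]],
    labels: List[str],
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[float, float, str]]:
    """Merge per-window speaker labels into (start_s, end_s, speaker) segments.

    Heuristic (an assumption, not the library's method): a speaker change
    between two overlapping windows is placed at the midpoint of their overlap.
    """
    if len(windows) != len(labels):
        raise ValueError("windows and labels must have the same length")
    if not windows:
        return []

    segments = []
    seg_start = windows[0][0] / sample_rate
    for i in range(1, len(windows)):
        if labels[i] != labels[i - 1]:
            # Overlap spans from the new window's start to the previous window's end.
            boundary = (windows[i][0] + windows[i - 1][1]) / 2 / sample_rate
            segments.append((seg_start, boundary, labels[i - 1]))
            seg_start = boundary
    # Close the final segment at the end of the last window.
    segments.append((seg_start, windows[-1][1] / sample_rate, labels[-1]))
    return segments


def to_rttm(segments: List[Tuple[float, float, str]], file_id: str = "audio") -> List[str]:
    """Format segments as RTTM SPEAKER lines (onset and duration in seconds)."""
    return [
        f"SPEAKER {file_id} 1 {start:.3f} {end - start:.3f} <NA> <NA> {spk} <NA> <NA>"
        for start, end, spk in segments
    ]


# Toy example: three 1.6 s windows with a 0.8 s hop, labels chosen by hand.
windows = [(0, 25600), (12800, 38400), (25600, 51200)]
labels = ["speaker_A", "speaker_A", "speaker_B"]
segments = windows_to_segments(windows, labels)
rttm_lines = to_rttm(segments)
```

In the toy example, the last two windows overlap between samples 25600 and 38400, so the change is placed at sample 32000, i.e. 2.0 s. A finer estimate would need a smaller hop between windows or a dedicated change-point detector.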