To evaluate speaker diarization, I am trying to integrate a DER evaluation script, which requires the time stamps at which each active speaker is speaking, from start_time to end_time (as per the RTTM format).

As per my experiments: for a 60-second audio file, I get 75 similarities (the count depends on the number of partial utterances, also called wav splits), but consecutive partial utterances overlap one another. Suppose two overlapping partial utterances have different speaker labels from the similarity matrix (one is speaker_A and the other is speaker_B). How do we then get the exact time stamp (in seconds) at which the speaker change occurred?
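To make the setup concrete, here is a minimal sketch of one common convention: assign each overlapping window its predicted label, and when two consecutive windows disagree, place the speaker change at the midpoint of their overlap. The window length (`win=1.6` s) and hop (`hop=0.8` s) below are assumptions for illustration, not values from my pipeline, and `windows_to_segments` / `to_rttm` are hypothetical helper names.

```python
# Sketch: turn per-window speaker labels into RTTM-style segments.
# Assumption: window i covers [i*hop, i*hop + win] seconds.

def windows_to_segments(labels, win=1.6, hop=0.8):
    """Merge per-window labels into (speaker, start, end) segments.

    When two consecutive windows disagree, place the speaker change
    at the midpoint of their overlap, i.e. halfway between the start
    of the later window and the end of the earlier one.
    """
    segments = []
    start = 0.0
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            # midpoint of the overlap between window i-1 and window i
            change = (i * hop + (i - 1) * hop + win) / 2
            segments.append((labels[i - 1], start, change))
            start = change
    segments.append((labels[-1], start, (len(labels) - 1) * hop + win))
    return segments


def to_rttm(segments, file_id="audio"):
    """Format segments as RTTM SPEAKER lines (start time and duration)."""
    return "\n".join(
        f"SPEAKER {file_id} 1 {s:.2f} {e - s:.2f} <NA> <NA> {spk} <NA> <NA>"
        for spk, s, e in segments
    )


labels = ["speaker_A"] * 3 + ["speaker_B"] * 2
print(to_rttm(windows_to_segments(labels)))
```

With the assumed window/hop values, the A-to-B change above lands at 2.8 s (the midpoint of the overlap between windows 2 and 3). This is only one convention; another option is to resolve the overlap at the frame level before merging.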
Any help would be appreciated :)