resemble-ai / Resemblyzer

A Python package to analyze and compare voices with deep learning
Apache License 2.0
2.66k stars 419 forks

Diarization Graph Issue #75

Open sankalpbhatia20 opened 1 year ago

sankalpbhatia20 commented 1 year ago

Hey Developers!

I was running the diarization (demo2) code to get line graphs for different speakers in a recording with the "similarity" variable on the Y axis.

However, could you help me understand why a line is plotted for the other speaker even when he is not speaking at that particular time?

Your help will be appreciated. Thanks.

nehat005 commented 1 year ago

From what I understand:

The audio wav is broken down into chunks (called wav splits), and an embedding is computed for each split. Each split embedding is then compared with the speaker embeddings, which you get by providing a speech excerpt of each speaker beforehand. So, say you have 2 speakers in the audio: each wav split's embedding is compared with both speaker embeddings, which gives you 2 similarity scores per split. This is why you get 2 lines over time, including one for the speaker who is not speaking: that speaker's score is still computed for every split, it should just be the lower of the two.
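
To make that concrete, here is a minimal sketch of the comparison step using only NumPy. It is not the demo's actual code: the embeddings are synthetic random vectors rather than outputs of the voice encoder, and the names `speaker_embeds` and `split_embeds`, the dimension of 256, and the noise level are all assumptions chosen for illustration. The point is only that every split gets a score against every speaker, so every speaker gets a full line.

```python
import numpy as np

def l2_normalize(x):
    # Scale vectors to unit length so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
dim = 256  # assumed embedding size, for illustration only

# Hypothetical reference embeddings, one per speaker. In practice these
# would come from a short speech excerpt of each speaker.
speaker_embeds = {
    "speaker_a": l2_normalize(rng.normal(size=dim)),
    "speaker_b": l2_normalize(rng.normal(size=dim)),
}

# Synthetic embeddings for 10 wav splits: the first 5 are noisy copies of
# speaker A's embedding, the last 5 are noisy copies of speaker B's.
targets = np.stack(
    [speaker_embeds["speaker_a"]] * 5 + [speaker_embeds["speaker_b"]] * 5
)
split_embeds = l2_normalize(targets + 0.02 * rng.normal(size=(10, dim)))

# One similarity score per split and per speaker. Each array has one value
# for every split, so each speaker gets a line spanning the whole recording.
similarity = {
    name: split_embeds @ embed for name, embed in speaker_embeds.items()
}

# The speaker with the highest score in each split is the likely talker.
active = [
    max(similarity, key=lambda name: similarity[name][i])
    for i in range(len(split_embeds))
]

for i, name in enumerate(active):
    scores = ", ".join(f"{n}={similarity[n][i]:.2f}" for n in similarity)
    print(f"split {i}: {scores} -> {name}")
```

In this toy setup the silent speaker's line stays low while the active speaker's line stays high, which is the pattern to look for in the plot.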