pyannote / pyannote-metrics

A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
http://pyannote.github.io/pyannote-metrics
MIT License

Issue understanding the outputs of coverage and purity metrics #55

Closed ckobus closed 2 years ago

ckobus commented 2 years ago

First, thank you for this open-source project!

I have started looking at speaker change detection algorithms and discovered this open-source project. As a newcomer to the field, I am still struggling to understand which measure to use to evaluate a speaker change detection module. The coverage and purity measures are well explained on this page: https://pyannote.github.io/pyannote-metrics/reference.html

I had a look at a previous issue in which someone mentioned that he always gets a purity of 100% even though his system is not perfect. Someone replied that he should rather use DiarizationPurity and DiarizationCoverage for a speaker change detection task, which is the task I want to perform.

I tried them on a toy example:

from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationPurity, DiarizationCoverage

purity = DiarizationPurity()
coverage = DiarizationCoverage()

reference = Annotation()
reference[Segment(1, 2)] = "a"
reference[Segment(3, 5)] = "b"

hypothesis = Annotation()
hypothesis[Segment(1, 5)] = "A"

I get a purity of 66.66%, whereas I would expect a purity of 50%: for the hypothesis segment A, the most covering segment is segment b with an overlap of 2, so purity = 2/4 = 0.5.

Could you explain where I am wrong in my understanding? And tell me how I should use those metrics?

Thank you in advance for the advice/explanations!

hbredin commented 2 years ago

Purity is computed on the temporal support common to reference and hypothesis. In your case, this means that interval [2, 3] is excluded from the computation (since it is not covered by reference). The common support therefore has a duration of 3 (1 for a, 2 for b), which gives a purity of 2/3 ≈ 66.7% rather than 2/4. To obtain the behavior that you expected, you'd have to fill the [2, 3] gap in reference with a fake non_speech segment:

reference[Segment(2, 3)] = "non_speech"
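
To make the arithmetic concrete, here is a minimal pure-Python sketch of purity restricted to the common temporal support, as described above. It is not the pyannote.metrics implementation: the helper names overlap and purity_on_common_support are hypothetical, annotations are assumed to be plain lists of ((start, end), label) pairs, and the value returned for an empty common support is an arbitrary choice.

```python
from collections import defaultdict


def overlap(a, b):
    """Duration of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def purity_on_common_support(reference, hypothesis):
    """Purity over regions labeled in BOTH reference and hypothesis.

    Hypothetical helper for illustration only; inputs are lists of
    ((start, end), label) pairs.
    """
    # cooccurrence[h_label][r_label] = duration shared by the two labels
    cooccurrence = defaultdict(lambda: defaultdict(float))
    for h_segment, h_label in hypothesis:
        for r_segment, r_label in reference:
            duration = overlap(h_segment, r_segment)
            if duration > 0:
                cooccurrence[h_label][r_label] += duration

    # For each hypothesis label, keep the duration of its dominant reference label.
    correct = sum(max(row.values()) for row in cooccurrence.values())
    # Only time covered by both annotations counts towards the total.
    total = sum(sum(row.values()) for row in cooccurrence.values())
    # Arbitrary convention for this sketch when there is no common support.
    return correct / total if total > 0 else 1.0


reference = [((1, 2), "a"), ((3, 5), "b")]
hypothesis = [((1, 5), "A")]

# The gap [2, 3] is ignored: 2 / (1 + 2)
print(round(purity_on_common_support(reference, hypothesis), 3))  # 0.667

# Filling the gap with a fake label brings it into the total: 2 / (1 + 1 + 2)
filled = reference + [((2, 3), "non_speech")]
print(round(purity_on_common_support(filled, hypothesis), 3))  # 0.5
```

The only difference between the two calls is whether [2, 3] carries a reference label, which is exactly what moves the denominator from 3 to 4.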