ckobus closed this issue 2 years ago
Purity is computed on the temporal support common to reference and hypothesis.
In your case, this means that interval [2, 3] is excluded from the computation (since it is not covered by reference).
To obtain the behavior that you expected, you'd have to fill the [2, 3] gap in reference by a fake non_speech segment:

    reference[Segment(2, 3)] = "non_speech"
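
To make the arithmetic concrete, here is a minimal pure-Python sketch of purity restricted to the part of the timeline that the reference covers. It is not pyannote's implementation: the `overlap` and `purity` helpers and the list-of-tuples representation are assumptions made for illustration only, and segments within each list are assumed not to overlap one another.

```python
def overlap(a, b):
    """Duration of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def purity(reference, hypothesis):
    """Purity computed only where the reference has a segment.

    Both arguments are lists of ((start, end), label) pairs.
    """
    total = 0.0    # hypothesis duration that falls inside the reference
    matched = 0.0  # duration explained by the dominant reference label
    for hyp_label in {label for _, label in hypothesis}:
        per_ref_label = {}
        for hyp_seg, label in hypothesis:
            if label != hyp_label:
                continue
            for ref_seg, ref_label in reference:
                d = overlap(hyp_seg, ref_seg)
                per_ref_label[ref_label] = per_ref_label.get(ref_label, 0.0) + d
        total += sum(per_ref_label.values())
        matched += max(per_ref_label.values(), default=0.0)
    # Returning 1.0 when nothing overlaps is an arbitrary choice for this sketch.
    return matched / total if total else 1.0


hypothesis = [((1, 5), "A")]

# Reference as posted: [2, 3] is not covered, so it is ignored.
as_posted = purity([((1, 2), "a"), ((3, 5), "b")], hypothesis)

# Reference with the gap filled by a fake non_speech segment.
with_filler = purity(
    [((1, 2), "a"), ((2, 3), "non_speech"), ((3, 5), "b")], hypothesis
)

print(round(as_posted, 4), round(with_filler, 4))
```

With the reference as posted, only 3 seconds of the hypothesis are counted and 2 of them match b, giving 2/3. Once [2, 3] is filled, all 4 seconds are counted and the result is 2/4.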
First, thank you for this open-source project!
I have started looking at speaker change detection algorithms and discovered this open-source project. As I am a newbie in this field, I am still struggling to understand which measure to use to evaluate a speaker change detection module. The coverage and purity measures are well explained on this page: https://pyannote.github.io/pyannote-metrics/reference.html
I had a look at a previous issue from someone mentioning that he always gets a purity of 100% even though his system is not perfect. Someone replied that he should instead use DiarizationPurity and DiarizationCoverage for a speaker change detection task, which is the task I want to perform.
I tried them on a toy example:

    from pyannote.core import Annotation, Segment
    from pyannote.metrics.diarization import DiarizationPurity, DiarizationCoverage

    purity = DiarizationPurity()
    coverage = DiarizationCoverage()

    reference = Annotation()
    reference[Segment(1, 2)] = "a"
    reference[Segment(3, 5)] = "b"

    hypothesis = Annotation()
    hypothesis[Segment(1, 5)] = "A"

I get a purity of 66.66%, whereas I would expect a purity of 50%: for the hypothesis segment A, the most covering reference segment is b, with an overlap of 2, so purity = 2/4 = 0.5.
Could you explain where I am wrong in my understanding? And tell me how I should use those metrics?
Thank you in advance for the advice and explanations!