pyannote / pyannote-metrics

A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
http://pyannote.github.io/pyannote-metrics
MIT License

Calls to `confidence_interval` fail on any metric when it has only "seen" one file #64

Closed: hadware closed this issue 2 years ago

hadware commented 2 years ago

Description

This is an edge case: when you validate on a Protocol (from pyannote.database) that has only one ProtocolFile, pipeline tuning fails with the error shown in the stack trace below. The problem probably affects every metric in the package, since the failing code lives in BaseMetric.

Steps/Code to Reproduce

from pyannote.core import Annotation, Segment
from pyannote.metrics.identification import IdentificationErrorRate
reference = Annotation(uri="4577")
reference[Segment(0, 10)] = 'A'
reference[Segment(12, 20)] = 'B'
reference[Segment(24, 27)] = 'A'
reference[Segment(30, 40)] = 'C'
hypothesis = Annotation(uri="4577")
hypothesis[Segment(2, 13)] = 'a'
hypothesis[Segment(13, 14)] = 'd'
hypothesis[Segment(14, 20)] = 'b'
hypothesis[Segment(22, 38)] = 'c'
hypothesis[Segment(38, 40)] = 'd'

ier = IdentificationErrorRate()
ier(reference, hypothesis)
try:
    # With a single evaluated file, this raises
    # "ValueError: Need at least 2 data-points."
    ier.confidence_interval()
except ValueError as error:
    print(error)

# tricking the metric into thinking that there are two files
reference.uri = "3615"
hypothesis.uri = "3615"
ier(reference, hypothesis)
# This will work
ier.confidence_interval()

Error Stack

Traceback (most recent call last):
  File "/home/hadware/.config/JetBrains/PyCharm2022.1/scratches/scratch_1.py", line 22, in <module>
    ier.confidence_interval()
  File "/home/hadware/Code/pyannote/pyannote-audio/venv/lib/python3.8/site-packages/pyannote/metrics/base.py", line 308, in confidence_interval
    m, _, _ = scipy.stats.bayes_mvs(
  File "/home/hadware/Code/pyannote/pyannote-audio/venv/lib/python3.8/site-packages/scipy/stats/_morestats.py", line 127, in bayes_mvs
    m, v, s = mvsdist(data)
  File "/home/hadware/Code/pyannote/pyannote-audio/venv/lib/python3.8/site-packages/scipy/stats/_morestats.py", line 197, in mvsdist
    raise ValueError("Need at least 2 data-points.")
ValueError: Need at least 2 data-points.
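
The traceback shows that the error comes from `scipy.stats.bayes_mvs`, which refuses to run on fewer than two data points. One possible guard is sketched below. It is only an illustration under stated assumptions: `guarded_confidence_interval` is a hypothetical helper, not part of pyannote.metrics, and returning a degenerate interval for a single value is one design choice among several, not necessarily what the library does.

```python
import warnings
from typing import Sequence, Tuple


def guarded_confidence_interval(
    values: Sequence[float], alpha: float = 0.9
) -> Tuple[float, Tuple[float, float]]:
    """Return (center, (lower, upper)) for a list of per-file metric values.

    Hypothetical helper for illustration; not part of pyannote.metrics.
    """
    if len(values) == 0:
        raise ValueError("No values: evaluate at least one file first.")

    if len(values) == 1:
        # bayes_mvs needs at least 2 data points, so fall back to a
        # degenerate interval (assumed behavior) instead of raising.
        warnings.warn("Only one value: the confidence interval is degenerate.")
        only = float(values[0])
        return only, (only, only)

    # Imported here so the guards above work even when SciPy is missing.
    from scipy.stats import bayes_mvs

    mean, _, _ = bayes_mvs(values, alpha=alpha)
    lower, upper = mean.minmax
    return float(mean.statistic), (float(lower), float(upper))
```

With a single value such as `[0.25]`, the helper emits a warning and returns the value itself as center, lower bound, and upper bound. With two or more values it defers to `bayes_mvs`.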

Versions

pyannote.core==4.4
pyannote.database==4.1.3
pyannote.metrics==3.2
pyannote.pipeline==2.2

hbredin commented 2 years ago

Can you please double check that https://github.com/pyannote/pyannote-metrics/commit/2fc7349f156219b919b8bd4ce0b7f23714d61ba2 fixes the issue?

I will then make a new bugfix release.

hadware commented 2 years ago

Yup, it seems to be working fine :)

Closing the issue.

hbredin commented 2 years ago

I just released pyannote.metrics 3.2.1. Thanks!