pyannote / pyannote-pipeline

Tunable pipelines

ValueError: The number of observations cannot be determined on an empty distance matrix #20

Closed PaulLerner closed 4 years ago

PaulLerner commented 4 years ago

ValueError: The number of observations cannot be determined on an empty distance matrix

Hi, I'm getting this error from scipy; it relates to this line. It happens when using the SpeechTurnClustering pipeline with HierarchicalAgglomerativeClustering, which calls linkage, which in turn calls pdist and scipy.cluster.hierarchy.linkage. The error occurs because, when there is only one embedding in X (the input of linkage), the distance matrix distance = pdist(X, metric=metric) is empty (i.e. has shape (0,)).
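The failure can be reproduced outside the pipeline. The following is a minimal sketch, assuming NumPy and SciPy are installed; the single all-ones embedding is made-up data, and the call goes straight to scipy rather than through the pyannote wrapper shown in the traceback.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# One made-up 4-dimensional embedding: a single observation.
X = np.ones((1, 4))

# With one observation there are no pairs, so the condensed
# distance matrix has no entries.
distance = pdist(X, metric="euclidean")
print(distance.shape)  # (0,)

# linkage cannot infer the number of observations from an empty
# condensed matrix and raises ValueError.
error = None
try:
    linkage(X, method="average", metric="euclidean")
except ValueError as exc:
    error = exc
print(repr(error))
```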

Full output:

Best = 73.3921%: : 1334iteration [3:52:11, 18.23s/iteration]
[W 2020-01-24 17:20:09,784] Setting status of trial#14948 as TrialState.FAIL because of the following error: ValueError('The number of observations cannot be determined on an empty distance matrix.',)
Traceback (most recent call last):
  File "/people/lerner/anaconda_m107/envs/m107/lib/python3.6/site-packages/optuna/study.py", line 505, in _run_trial
    result = func(trial)
  File "/people/lerner/pyannote/pyannote-pipeline/pyannote/pipeline/optimizer.py", line 174, in objective
    output = pipeline(input)
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/pipeline/speaker_diarization.py", line 153, in __call__
    long_speech_turns)
  File "/people/lerner/pyannote/pyannote-audio/pyannote/audio/pipeline/speech_turn_clustering.py", line 127, in __call__
    clusters = self.clustering(np.vstack(X))
  File "/people/lerner/pyannote/pyannote-pipeline/pyannote/pipeline/blocks/clustering.py", line 112, in __call__
    Z = linkage(X, method=self.method, metric=self.metric)
  File "/people/lerner/anaconda_m107/envs/m107/lib/python3.6/site-packages/pyannote/core/utils/hierarchy.py", line 64, in linkage
    metric=metric)
  File "/people/lerner/anaconda_m107/envs/m107/lib/python3.6/site-packages/scipy/cluster/hierarchy.py", line 1064, in linkage
    n = int(distance.num_obs_y(y))
  File "/people/lerner/anaconda_m107/envs/m107/lib/python3.6/site-packages/scipy/spatial/distance.py", line 2403, in num_obs_y
    raise ValueError("The number of observations cannot be determined on "
ValueError: The number of observations cannot be determined on an empty distance matrix
hbredin commented 4 years ago

This is a bug in pyannote.pipeline.blocks.clustering (I transferred the issue).

Before calling linkage, __call__ should check whether there is strictly more than one element to cluster. It should also handle the corner case where there is exactly one element (i.e. return a trivial clustering result without calling linkage).
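The check described above could look roughly like the sketch below. This is not the actual pyannote code: cluster_embeddings is a hypothetical stand-in for __call__, it uses scipy's linkage and fcluster directly, and the method, metric and threshold defaults are illustrative assumptions. Handling an empty input the same way is also an assumption beyond what is requested here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_embeddings(X, threshold=1.0, method="average", metric="euclidean"):
    """Hypothetical stand-in for the clustering block's __call__.

    Returns one integer cluster label per row of X, starting at 1
    (the same convention as scipy's fcluster).
    """
    X = np.asarray(X, dtype=float)
    n_observations = X.shape[0]

    # Corner case: with zero or one element there is nothing to link,
    # so return the trivial clustering without calling linkage.
    if n_observations <= 1:
        return np.ones(n_observations, dtype=int)

    # Regular case: build the dendrogram, then cut it at the threshold.
    Z = linkage(X, method=method, metric=metric)
    return fcluster(Z, threshold, criterion="distance")


# A single embedding no longer raises; it forms its own cluster.
single = cluster_embeddings(np.ones((1, 4)))

# Two nearby points and one distant point (made-up data).
several = cluster_embeddings([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
```

With the made-up data above, the first two points end up in the same cluster and the third in a different one.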

I'd happily merge a pull request fixing both HierarchicalAgglomerativeClustering and AffinityPropagationClustering.