wq2012 / SpectralCluster

Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.
https://google.github.io/speaker-id/publications/LstmDiarization/
Apache License 2.0

UnboundLocalError: local variable 'best_p_percentile_index' referenced before assignment #38

Closed hbredin closed 2 years ago

hbredin commented 2 years ago

FYI, while running your pyannote.audio PR, I got the following error that I have yet to narrow down.

  File "../site-packages/spectralcluster/spectral_clusterer.py", line 181, in predict
    eigenvectors, n_clusters, _ = self.autotune.tune(p_percentile_to_ratio)
  File "../site-packages/spectralcluster/autotune.py", line 96, in tune
    start_index = max(0, best_p_percentile_index - local_search_dist)
UnboundLocalError: local variable 'best_p_percentile_index' referenced before assignment

https://github.com/wq2012/SpectralCluster/blob/06504bd15890d21380cab0ff9b80cd437d1402a1/spectralcluster/autotune.py#L96

This is probably because the pipeline tries to cluster very few embeddings (fewer than 2), but I will confirm once I know more.

In the meantime, maybe it is obvious to you why this might happen?
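For readers unfamiliar with this class of error: the sketch below reproduces an UnboundLocalError of the same shape. It is not the code in autotune.py. The function names and the idea that the best index is only assigned inside a conditional within the search loop are assumptions made for illustration. If the loop body never runs, or no candidate ever beats the initial value (for example because every ratio is NaN), the name is never bound and the later read fails.

```python
import math


def pick_best_index(ratios):
    """Hypothetical search loop (an assumption, not the library's code)."""
    best_ratio = math.inf
    for index, ratio in enumerate(ratios):
        # NaN < inf is False, so a NaN ratio never enters this branch.
        if ratio < best_ratio:
            best_ratio = ratio
            best_index = index  # only bound if some ratio wins
    # Raises UnboundLocalError when the branch above never executed.
    return best_index


def pick_best_index_safe(ratios):
    """Same loop, but the name is bound up front and the failure is explicit."""
    best_ratio = math.inf
    best_index = None
    for index, ratio in enumerate(ratios):
        if ratio < best_ratio:
            best_ratio = ratio
            best_index = index
    if best_index is None:
        raise ValueError("no candidate produced a usable ratio")
    return best_index
```

Calling `pick_best_index([math.nan])` or `pick_best_index([])` raises UnboundLocalError, while the safe variant raises a ValueError with a message that points at the actual cause.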

hbredin commented 2 years ago

I can confirm that this happens when trying to cluster

wq2012 commented 2 years ago

I see. This is because this spectral clustering implementation does not expect too few embeddings or min_clusters = 1.

In our internal implementation, we only use spectral clustering when the speaker turn detection model has detected a speaker turn, and there are at least X embeddings.

I can implement fallback logic here: if the number of embeddings is smaller than X, we use a different clustering algorithm (internally we used naive clustering; here we could use hierarchical clustering).
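A minimal sketch of the fallback idea described above, under stated assumptions. The names `cluster_with_fallback`, `greedy_threshold_clustering`, `min_embeddings`, and `threshold` are hypothetical and are not part of the spectralcluster API. The fallback here is a simple greedy cosine-threshold clusterer used as a stand-in. It is neither the internal naive clustering nor the hierarchical clustering mentioned in the comment. The value of X is left to the caller because the thread does not state it.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def greedy_threshold_clustering(embeddings, threshold=0.5):
    """Stand-in fallback: join the most similar cluster or open a new one."""
    sums = []    # per-cluster sum of embeddings (same direction as the mean)
    labels = []
    for emb in embeddings:
        sims = [cosine(emb, s) for s in sums]
        if sims and max(sims) >= threshold:
            k = sims.index(max(sims))
            sums[k] = [s + e for s, e in zip(sums[k], emb)]
        else:
            k = len(sums)
            sums.append(list(emb))
        labels.append(k)
    return labels


def cluster_with_fallback(embeddings, spectral_fn, min_embeddings,
                          fallback_fn=greedy_threshold_clustering):
    """Use spectral_fn only when there are at least min_embeddings inputs."""
    if len(embeddings) < min_embeddings:
        return fallback_fn(embeddings)
    return spectral_fn(embeddings)
```

With this shape, a single embedding never reaches the spectral code path: `cluster_with_fallback([[1.0, 0.0]], spectral_fn, min_embeddings=2)` returns one label from the fallback without calling `spectral_fn`.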

wq2012 commented 2 years ago

Fixed. Now you can pass in fallback_options as an additional argument.

hbredin commented 2 years ago

🎉 Thanks!