wq2012 / SpectralCluster

Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.
https://google.github.io/speaker-id/publications/LstmDiarization/
Apache License 2.0

LinAlgError: Array must not contain infs or NaNs #22

Closed: anon747 closed this issue 3 years ago.

anon747 commented 3 years ago

Hello!

I am trying to use the spectral clustering algorithm that this repository implements.

I performed the following steps:

  1. Loaded the audio file.
  2. Split it into 25 ms windows, with a 10 ms step between the starts of successive windows.
  3. Computed 40-dimensional log-mel filterbank energies for each window.
  4. At this stage, I had a numpy array of shape (n_samples, 40).
  5. Ran this through a 3-layer LSTM (as described in the paper), which gave an array of shape (n_samples, 256).
  6. L2-normalized each sample.
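Steps 2 and 6 above can be sketched in NumPy as follows. This is a minimal illustration, not the poster's code: it assumes a 16 kHz mono signal (the thread does not state a sample rate), skips the log-mel and LSTM stages, and the helper names `frame_signal` and `l2_normalize_rows` are hypothetical, not part of spectralcluster.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, win_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Assumption: 16 kHz audio by default; pass the real sample rate.
    """
    signal = np.asarray(signal)
    win = int(round(sample_rate * win_ms / 1000.0))  # 400 samples at 16 kHz
    hop = int(round(sample_rate * hop_ms / 1000.0))  # 160 samples at 16 kHz
    if len(signal) < win:
        return np.empty((0, win), dtype=signal.dtype)
    n_frames = 1 + (len(signal) - win) // hop
    # Index matrix: row i selects samples [i * hop, i * hop + win).
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def l2_normalize_rows(x, eps=1e-12):
    """L2-normalize each row; rows with (near-)zero norm stay all zeros."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Dividing by max(norm, eps) avoids 0/0 = NaN for all-zero rows.
    return x / np.maximum(norms, eps)
```

For one second of 16 kHz audio this yields 98 frames of 400 samples each. The `eps` guard only keeps all-zero rows at zero; it does not turn them into meaningful embeddings, and a clusterer that re-normalizes rows internally can still turn them into NaN.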

However, when I run the spectral clusterer on the final L2-normalized numpy array (shape (n_samples, 256)), I get the following error:

    LinAlgError                               Traceback (most recent call last)
    in ()
         10     gaussian_blur_sigma=1)
         11
    ---> 12 labels = clusterer.predict(X_l2)

    /usr/local/lib/python3.7/dist-packages/spectralcluster/spectral_clusterer.py in predict(self, X)
        117     # Perform eigen decomposion.
        118     (eigenvalues, eigenvectors) = utils.compute_sorted_eigenvectors(
    --> 119         affinity)
        120     # Get number of clusters.
        121     k = utils.compute_number_of_clusters(

    /usr/local/lib/python3.7/dist-packages/spectralcluster/utils.py in compute_sorted_eigenvectors(A)
         40   """
         41   # Eigen decomposition.
    ---> 42   eigenvalues, eigenvectors = np.linalg.eig(A)
         43   eigenvalues = eigenvalues.real
         44   eigenvectors = eigenvectors.real

    <__array_function__ internals> in eig(*args, **kwargs)

    /usr/local/lib/python3.7/dist-packages/numpy/linalg/linalg.py in eig(a)
       1316     """
       1317     Return the eigenvalues and eigenvectors of a complex Hermitian
    -> 1318     (conjugate symmetric) or a real symmetric matrix.
       1319
       1320     Returns two objects, a 1-D array containing the eigenvalues of `a`, and

    /usr/local/lib/python3.7/dist-packages/numpy/linalg/linalg.py in _assert_finite(*arrays)
        207                 'at least two-dimensional' % a.ndim)
        208
    --> 209 def _assert_stacked_square(*arrays):
        210     for a in arrays:
        211         m, n = a.shape[-2:]

    LinAlgError: Array must not contain infs or NaNs

**Please note that my input array 'X_l2' (shape (n_samples, 256)) does NOT contain any NaN or inf values.** It does, however, contain samples in which all values are 0 (for example, a sample whose 256 entries are all 0). Could this be the cause of my problem?

Any help would be greatly appreciated. Thank you! :)
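On the question of whether all-zero samples can cause this: a finite input can still produce a non-finite affinity matrix. The sketch below assumes the affinity is built the usual cosine-similarity way (row-normalize, take the Gram matrix, rescale to [0, 1]); that recipe is an assumption here, not a quote of the spectralcluster source, and `cosine_affinity` and `describe_input` are hypothetical helpers.

```python
import numpy as np

def cosine_affinity(x):
    """Cosine-similarity affinity in [0, 1].

    Assumed recipe (not copied from spectralcluster): row-normalize,
    take the Gram matrix, then map [-1, 1] to [0, 1].
    """
    norms = np.linalg.norm(x, axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        x_normalized = x / norms[:, None]  # an all-zero row becomes 0/0 = NaN
    cosine = x_normalized @ x_normalized.T
    return (cosine + 1.0) / 2.0

def describe_input(x):
    """Report finiteness and the indices of all-zero rows of an (n, dim) array."""
    return {
        "all_finite": bool(np.all(np.isfinite(x))),
        "zero_rows": np.flatnonzero(~np.any(x, axis=1)).tolist(),
    }

X = np.array([[0.6, 0.8], [1.0, 0.0], [0.0, 0.0]])
print(describe_input(X))  # {'all_finite': True, 'zero_rows': [2]}
A = cosine_affinity(X)
print(np.isfinite(A).all())  # False: NaNs fill row 2 and column 2
```

A single all-zero sample poisons one full row and column of the affinity matrix, which is enough for `np.linalg.eig` to reject it.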
anon747 commented 3 years ago

Update:

I tried removing all the all-zero samples from the spectral clusterer's input, but I still get the same error.
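When samples are removed before clustering, it helps to keep their original indices so the predicted labels can be mapped back to the full sequence. A minimal sketch, with hypothetical helpers `drop_degenerate_rows` and `scatter_labels`; the tolerance `tol` (rather than an exact zero test) is an assumption meant to also catch rows that are zero up to floating-point noise.

```python
import numpy as np

def drop_degenerate_rows(x, tol=1e-8):
    """Remove rows whose L2 norm is <= tol; return kept rows and their indices."""
    keep = np.linalg.norm(x, axis=1) > tol
    return x[keep], np.flatnonzero(keep)

def scatter_labels(kept_labels, kept_idx, n_total, fill=-1):
    """Place labels of kept rows back at their original positions.

    Dropped rows receive `fill` (hypothetically -1, meaning "no label").
    """
    labels = np.full(n_total, fill, dtype=int)
    labels[kept_idx] = kept_labels
    return labels
```

The kept rows would then go to `clusterer.predict(...)`, and its output to `scatter_labels`. Since removing the zero rows did not help here, the non-finite values probably appear later in the pipeline, which is where the maintainer's reply points.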

wq2012 commented 3 years ago

It fails at the eigen-decomposition step. You can check what your similarity matrix looks like after refinement: Is it symmetric? Is it a real matrix? What is its rank?
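The checks suggested above can be run on the refined similarity matrix with a small helper. This is a sketch: `diagnose_affinity` is hypothetical, and the thread does not show how to extract the refined matrix from the clusterer, so it assumes you already have it as a NumPy array (for example by reproducing the refinement steps yourself).

```python
import numpy as np

def diagnose_affinity(a, atol=1e-8):
    """Report whether a square matrix is real, finite, symmetric, and its rank."""
    a = np.asarray(a)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("expected a square 2-D matrix, got shape %r" % (a.shape,))
    finite_mask = np.isfinite(a)
    report = {
        "real": bool(np.isrealobj(a)),
        "finite": bool(finite_mask.all()),
        # Rows containing NaN/inf; np.linalg.eig rejects the matrix if any exist.
        "nonfinite_rows": np.flatnonzero(~finite_mask.all(axis=1)).tolist(),
        "symmetric": None,
        "rank": None,
    }
    # Symmetry and rank are only meaningful for a fully finite matrix.
    if report["finite"]:
        report["symmetric"] = bool(np.allclose(a, a.T, atol=atol))
        report["rank"] = int(np.linalg.matrix_rank(a))
    return report
```

If `finite` is False, `nonfinite_rows` identifies which samples (or which refinement step, if you check after each one) introduced the NaNs.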