scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

LinAlgError in training spectral clustering model: leading minor not positive definite #6489

Closed cinvro closed 4 years ago

cinvro commented 8 years ago

I am trying to fit a spectral clustering model on a 50 x 50 symmetric adjacency matrix:

from sklearn.cluster import SpectralClustering
labels = SpectralClustering(n_clusters=5,
                          affinity="precomputed",
                          assign_labels="kmeans",
                          random_state=1991,
                          eigen_solver="lobpcg").fit_predict(X)

And I got the following error:

 numpy.linalg.linalg.LinAlgError: the leading minor of order 15 of 'b' is not positive definite. 
 The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.

I followed #2924 and switched liblapack to the ATLAS implementation, but it did not help.

The matrix X is attached as aff.pkl in the following zip file for anyone attempting to reproduce this error:

aff.pkl.zip

or you can copy this verbose version directly:

import numpy as np
X=np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0, 0, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0, 0, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 0, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0, 0, 1, 1, 1],
   [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0, 0, 1, 1, 1],
   [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 0, 1, 1, 1],
   [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1]])
from sklearn.cluster import SpectralClustering
labels = SpectralClustering(n_clusters=5,
                      affinity="precomputed",
                      assign_labels="kmeans",
                      random_state=1991,
                      eigen_solver="lobpcg").fit_predict(X)

System info:

(system info attached as a screenshot)

giorgiop commented 8 years ago

Why did you set lobpcg as the solver? arpack gives you a solution:

>>> SpectralClustering(n_clusters=5,
                      affinity="precomputed",
                      assign_labels="kmeans",
                      random_state=1991,
                      eigen_solver="arpack").fit_predict(X)
array([0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3,
       1, 0, 0, 0], dtype=int32)

At a quick look, your matrix has very low rank:

>>> np.linalg.matrix_rank(X)
5

The graph is almost fully connected, with only a few edges missing. I did not investigate whether this is the issue with lobpcg. You may take a look at the scipy docs.
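For a 50 x 50 affinity you can also sidestep the iterative solvers entirely and eigendecompose the normalized Laplacian densely as a sanity check (a rough sketch using scipy's csgraph helper, not something SpectralClustering does for you):

import numpy as np
from scipy.sparse.csgraph import laplacian

# Dense sanity check: normalized graph Laplacian of the 50 x 50 affinity X above
L = laplacian(X.astype(float), normed=True)
w = np.linalg.eigvalsh(L)
print(w[:6])  # the six smallest eigenvalues, the ones the spectral embedding needs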

cinvro commented 8 years ago

I ran into the same problem when using arpack, so I switched to lobpcg. It seems that at some point arpack calls lobpcg, so I suspect switching back to arpack cannot completely address this problem. Here is another example from my experiments that uses arpack and produces the same error:

from sklearn.cluster import SpectralClustering
import pickle

with open('x.pkl', 'rb') as f:
    X = pickle.load(f).transpose()

labels = SpectralClustering(n_clusters=5,
                            assign_labels="kmeans",
                            random_state=1991).fit_predict(X)

Error message:

numpy.linalg.linalg.LinAlgError: the leading minor of order 18 of 'b' is not positive definite.      
The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.

data_matrix

This example tries to apply spectral clustering directly to an 80 x 100 matrix, without a precomputed affinity. Strangely, I cannot reproduce this error on my laptop, which runs OS X instead of the Ubuntu on my server. Maybe it is related to the version of liblobpcg?

@giorgiop Shouldn't the spectral clustering module still work on a low-rank matrix?

lobpcg commented 6 years ago

That may be a bug in the LOBPCG code. I'll try to investigate.

lobpcg commented 6 years ago

The bug is apparently in the spectral_embedding function: everywhere it calls lobpcg, it ignores the value of eigen_tol and instead uses the fixed, predefined value tol=1e-15. In the example above this tolerance cannot be reached by lobpcg, so rather than stopping after 2 iterations, as it should, lobpcg keeps running, soon starts producing nonsense at the level of round-off errors, and quickly crashes at iteration 4 or 5. The immediate simple solution is (1) to make sure that the eigen_tol value passed to SpectralClustering actually propagates through all the intermediary functions to lobpcg and is in fact used there, and (2) to change the default value eigen_tol=0, e.g., in spectral.py, and advise the user to supply a "reasonable" tolerance, or none. The default tolerance set inside lobpcg is n*sqrt(eps), roughly 1e-6 for the given example.
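For reference, this is roughly what a tolerance-aware call looks like at the scipy level, using the 50 x 50 matrix X from the report above (a sketch of the underlying solver call, not the sklearn code path):

import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import lobpcg

L = laplacian(X.astype(float), normed=True)
n = L.shape[0]
rng = np.random.RandomState(1991)
X0 = rng.rand(n, 6)                      # block of 6 starting vectors (n_clusters + 1)
tol = n * np.sqrt(np.finfo(float).eps)   # roughly 1e-6 for n = 50, as noted above
vals, vecs = lobpcg(L, X0, tol=tol, largest=False, maxiter=2000)
print(vals)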

massich commented 6 years ago

I can replicate the error, but I don't think there is any problem here; it rather looks like an ill-posed example.

lobpcg commented 6 years ago

One can set verbosityLevel = 1 in lobpcg to see the screen output after every iteration. The most important output to check is the residual norms: the lobpcg code stops when all residual norms are below the tolerance. In the example above the maximum residual norm drops below 1e-10 but then starts growing, and the code soon crashes. The fact that the residual norms cannot get smaller than 1e-10 is indeed an indication of a somewhat ill-posed example. However, that is still not a good enough excuse for the code to crash, and it would not crash, and would give good results, if the tolerance were simply changed to 1e-10 or larger. Here is a sample output with the built-in tol = 1e-15 and verbosityLevel = 1:

Solving generalized eigenvalue problem with preconditioning

matrix size 50
block size 6

No constraints

iteration 0
current block size: 5
eigenvalue: [4.10052890e-16 1.01537911e+00 1.01636536e+00 1.01904567e+00 1.02052286e+00 1.02154829e+00]
residual norms: [2.13133519e-16 1.41923167e-02 2.27834607e-02 1.25036013e-02 2.16503765e-03 1.46368966e-02]
iteration 1
current block size: 5
eigenvalue: [4.10052890e-16 9.76745287e-01 9.76803141e-01 9.77542806e-01 1.02040816e+00 1.02040816e+00]
residual norms: [2.25673583e-16 2.26269818e-04 1.65288398e-03 6.04664762e-03 2.03723156e-14 1.11776728e-14]
iteration 2
current block size: 5
eigenvalue: [4.10052890e-16 9.76744186e-01 9.76744186e-01 9.76744186e-01 1.02040816e+00 1.02040816e+00]
residual norms: [5.19212918e-15 1.83085796e-13 2.13367515e-14 6.43454989e-14 3.28480178e-11 3.32759204e-13]
iteration 3
current block size: 5
eigenvalue: [4.10052889e-16 9.76744186e-01 9.76744186e-01 9.76744186e-01 1.02040810e+00 1.02040816e+00]
residual norms: [5.98837770e-13 2.28776891e-11 1.98930887e-14 7.69244428e-12 1.72832858e-07 2.38152764e-11]
iteration 4
current block size: 5
eigenvalue: [4.10049115e-16 9.76744186e-01 9.76744186e-01 9.76744186e-01 1.01760814e+00 1.02040700e+00]
residual norms: [6.20183840e-11 1.93258324e-09 3.30695986e-10 5.05859877e-10 3.21429578e-03 6.46491870e-06]

Traceback (most recent call last):
  File "<pyshell#25>", line 5, in <module>
    eigen_solver="lobpcg").fit_predict(X)
  File "C:\Python27\lib\site-packages\sklearn\base.py", line 410, in fit_predict
    self.fit(X)
  File "C:\Python27\lib\site-packages\sklearn\cluster\spectral.py", line 472, in fit
    assign_labels=self.assign_labels)
  File "C:\Python27\lib\site-packages\sklearn\cluster\spectral.py", line 262, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "C:\Python27\lib\site-packages\sklearn\manifold\spectral_embedding_.py", line 317, in spectral_embedding
    largest=False, maxiter=2000)
  File "C:\Python27\lib\site-packages\scipy\sparse\linalg\eigen\lobpcg\lobpcg.py", line 452, in lobpcg
    activeBlockVectorBP, retInvR=True)
  File "C:\Python27\lib\site-packages\scipy\sparse\linalg\eigen\lobpcg\lobpcg.py", line 96, in _b_orthonormalize
    gramVBV = cholesky(gramVBV)
  File "C:\Python27\lib\site-packages\scipy\linalg\decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "C:\Python27\lib\site-packages\scipy\linalg\decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
LinAlgError: 5-th leading minor of the array is not positive definite [DEBUG ON]

lobpcg commented 6 years ago

https://github.com/scikit-learn/scikit-learn/pull/11968 apparently fixes the problem. Thanks!

lobpcg commented 6 years ago

I suppose that the issue can now be closed

massich commented 6 years ago

Not until the PR is merged. I'll get to it before next week.

lobpcg commented 5 years ago

This fix should also make AMG stable in http://scikit-learn.org/stable/auto_examples/cluster/plot_segmentation_toy.html, so that the code of the example does not need to run ARPACK.

dfilan commented 5 years ago

I am getting a similar problem in version 0.20.2.

This is the problematic code:


import sklearn.cluster
import scipy.sparse as sparse
import random

random.seed(1337)

rand_int = random.randrange(1000)
num_nodes = 1000
X = sparse.rand(num_nodes, num_nodes, density = 0.1, random_state = rand_int)
upper = sparse.triu(X) - sparse.diags(X.diagonal())
sym_matrix = upper + upper.T
clustering = sklearn.cluster.SpectralClustering(n_clusters = 3,
                                                eigen_solver = 'amg',
                                                random_state = rand_int,
                                                affinity = 'precomputed')
labels = clustering.fit(sym_matrix)

Both the lobpcg and amg solvers cause errors. With amg, the error is numpy.linalg.LinAlgError: 2-th leading minor of the array is not positive definite, while with lobpcg, the error is numpy.linalg.LinAlgError: the leading minor of order 9 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed. The chance of an error seems to increase with num_nodes.

System info:

System:
    python: 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)  [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
executable: /home/daniel/anaconda3/envs/nn_clustering/bin/python
   machine: Linux-4.15.0-45-generic-x86_64-with-debian-buster-sid

BLAS:
    macros: HAVE_CBLAS=None
  lib_dirs: /usr/lib/x86_64-linux-gnu
cblas_libs: openblas, openblas

Python deps:
       pip: 19.0.3
setuptools: 38.5.1
   sklearn: 0.20.2
     numpy: 1.16.1
     scipy: 1.2.1
    Cython: None
    pandas: None

Full traceback when amg is used (interestingly, it looks like the lobpcg algorithm is being called):

Traceback (most recent call last):
  File "clustering_bug.py", line 16, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 298, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 460, in lobpcg
    aux = _b_orthonormalize(B, activeBlockVectorR)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 99, in _b_orthonormalize
    gramVBV = cholesky(gramVBV)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
numpy.linalg.LinAlgError: 2-th leading minor of the array is not positive definite

Full traceback when lobpcg is used:

Traceback (most recent call last):
  File "clustering_bug.py", line 16, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 326, in spectral_embedding
    largest=False, maxiter=2000)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 508, in lobpcg
    _lambda, eigBlockVector = eigh(gramA, gramB, check_finite=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 493, in eigh
    " computed." % (info-b1.shape[0]))
numpy.linalg.LinAlgError: the leading minor of order 9 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.

lobpcg commented 5 years ago

@dfilan Thanks for reporting!

AMG is also LOBPCG, just with an extra option, so the name AMG is somewhat misleading, indeed...

lobpcg has gone through multiple fixes (https://github.com/scipy/scipy/pull/9650) in the current scipy master, which are slated for the 1.3.0 milestone.

Could you please give it a try and report if this solves your problem?

lobpcg commented 5 years ago

Maybe merging https://github.com/scikit-learn/scikit-learn/pull/11968 would also independently solve this problem, so that the issue could be closed?

dfilan commented 5 years ago

I have downloaded the current scipy master. After running the above code with 'lobpcg', I get this error:

Traceback (most recent call last):
  File "clustering_bug.py", line 16, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 326, in spectral_embedding
    largest=False, maxiter=2000)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 543, in lobpcg
    _lambda, eigBlockVector = eigh(gramA, gramB, check_finite=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 493, in eigh
    " computed." % (info-b1.shape[0]))
numpy.linalg.LinAlgError: the leading minor of order 9 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.

After running with 'amg', I get this error:

Traceback (most recent call last):
  File "clustering_bug.py", line 16, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 298, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 463, in lobpcg
    aux = _b_orthonormalize(B, activeBlockVectorR)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 93, in _b_orthonormalize
    gramVBV = cholesky(gramVBV)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
numpy.linalg.LinAlgError: 3-th leading minor of the array is not positive definite

Interestingly enough, when running scipy.test('full'), only one test fails, and it's on code that tests arpack. However, the above code succeeds when using arpack as the eigenvalue solver.

dfilan commented 5 years ago

I also downloaded the branch that #11968 would merge from and used it with the normal scipy. I get similar errors. [EDIT: No I didn't, but when I wrote this comment I mistakenly thought that I did]

With lobpcg:

Traceback (most recent call last):
  File "clustering_bug.py", line 16, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 324, in spectral_embedding
    largest=False, maxiter=2000)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 508, in lobpcg
    _lambda, eigBlockVector = eigh(gramA, gramB, check_finite=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 493, in eigh
    " computed." % (info-b1.shape[0]))
numpy.linalg.LinAlgError: the leading minor of order 9 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.

With amg:

Traceback (most recent call last):
  File "clustering_bug.py", line 16, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 296, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 460, in lobpcg
    aux = _b_orthonormalize(B, activeBlockVectorR)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 99, in _b_orthonormalize
    gramVBV = cholesky(gramVBV)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
numpy.linalg.LinAlgError: 3-th leading minor of the array is not positive definite

That being said, I'd appreciate somebody else checking - there's some chance that I messed up somewhere in the process of installing and uninstalling packages in virtual environments.

lobpcg commented 5 years ago

"leading minor of of 'b' is not positive definite" is a possible, but somewhat uncommon and typically random error in lobpcg. There may be something special with your matrix that breaks lobpcg.

Could you please force verbosityLevel = 1 in lobpcg (the default is 0) and post the screen output for both the lobpcg and amg tests, as I did above on Sept. 1, 2018?
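For example, something along these lines should route the calls through a verbose lobpcg without editing the installed scipy (a sketch; it assumes spectral_embedding_ imports lobpcg by name at module level, which the tracebacks above for scipy 1.2 / sklearn 0.20 suggest):

from functools import partial
from scipy.sparse.linalg import lobpcg
import sklearn.manifold.spectral_embedding_ as se

# Rebind the name used inside spectral_embedding_ so that both the lobpcg and
# amg code paths print per-iteration eigenvalues and residual norms.
se.lobpcg = partial(lobpcg, verbosityLevel=1)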

dfilan commented 5 years ago

First: it looks like when I wrote my last comment I hadn't actually downloaded the version of scikit-learn with the fixes in #11968, so sorry for adding confusion. I'm also having trouble forcing verbosityLevel = 1: when I go into anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py and set that as the default in the lobpcg function on line 124 (making that change, saving the file, and doing nothing else), it doesn't produce any output like in your comment; it basically just gives me the same error message I've been getting all along. Is there some other way I should set verbosityLevel, or is this a problem between me and anaconda?

dfilan commented 5 years ago

Looks like I changed the wrong file in the above comment; I have now successfully gotten verbose output to work. Linked is the 3330-line verbose output of the test with lobpcg, run with the current master branch of scipy and the usual version of scikit-learn. The output for amg is short enough to copy and paste here:

Solving standard eigenvalue problem with preconditioning

matrix size 1000
block size 4

No constraints

iteration 0
current block size: 3
eigenvalue: [-1.65773735e-17  9.98684128e-01  9.99969964e-01  1.00264676e+00]
residual norms: [3.28224244e-16 1.12360881e-01 1.10785199e-01 1.12051732e-01]
iteration 1
current block size: 3
eigenvalue: [-1.65773789e-17  8.71928453e-01  8.76284914e-01  8.80531337e-01]
residual norms: [2.28588577e-12 7.44726964e-02 7.14542346e-02 7.21940486e-02]
matrix gramA is not sufficiently Hermitian for a=3, b=-1:
condition: 2e-13 < 1.136868e-13
Traceback (most recent call last):
  File "clustering_bug.py", line 17, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 298, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 543, in lobpcg
    _lambda, eigBlockVector = eigh(gramA, gramB, check_finite=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 493, in eigh
    " computed." % (info-b1.shape[0]))
numpy.linalg.LinAlgError: the leading minor of order 5 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.

dfilan commented 5 years ago

I have run the tests with the normal version of scipy and the fixes to scikit-learn from #11968. Linked is the 3330-line output with lobpcg, and below is the output with amg:

/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Solving generalized eigenvalue problem with preconditioning

matrix size 1000
block size 4

No constraints

iteration 0
current block size: 3
eigenvalue: [-1.65773735e-17  9.98684128e-01  9.99969964e-01  1.00264676e+00]
residual norms: [3.28224244e-16 1.12360881e-01 1.10785199e-01 1.12051732e-01]
iteration 1
current block size: 3
eigenvalue: [-1.65773736e-17  8.71914993e-01  8.76284663e-01  8.80502757e-01]
residual norms: [1.69616187e-13 7.44772511e-02 7.14521788e-02 7.22038047e-02]
iteration 2
current block size: 3
eigenvalue: [-1.65774454e-17  8.28393569e-01  8.32759133e-01  8.35206292e-01]
residual norms: [8.35641785e-12 4.62964056e-02 4.86301534e-02 4.91496360e-02]
matrix gramA is not sufficiently Hermitian for a=3, b=-1:
condition: 1e-12 < 1.136868e-13
Traceback (most recent call last):
  File "clustering_bug.py", line 17, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 300, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 508, in lobpcg
    _lambda, eigBlockVector = eigh(gramA, gramB, check_finite=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 493, in eigh
    " computed." % (info-b1.shape[0]))
numpy.linalg.LinAlgError: the leading minor of order 6 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.

lobpcg commented 5 years ago

Thanks for your testing! Your LOBPCG trouble is evident from your output: an excessively tight tolerance. The https://github.com/scikit-learn/scikit-learn/pull/11968 fix is incomplete. As I write in https://github.com/scikit-learn/scikit-learn/pull/11968#discussion_r219569301, the tolerance 1e-15 used there is way too small for the default eigen_tol. For comparison, the default tolerance in https://github.com/scipy/scipy/blob/v1.1.0/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py (lines 307-308) is np.sqrt(1e-15) * n, where n is the size of the matrix. Just force eigen_tol = 1e-5 and LOBPCG should run fine in your case, I predict. You can try even more aggressive values and check how they affect your clustering.
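With a fix that actually propagates eigen_tol (e.g., #11968 completed), the call in your script would look roughly like this (a sketch; eigen_tol is an existing SpectralClustering parameter, but it only reaches lobpcg once that propagation is in place):

from sklearn.cluster import SpectralClustering

# sym_matrix and rand_int are from the script above; eigen_tol=1e-5 is an
# illustrative, relaxed tolerance rather than a tuned value.
clustering = SpectralClustering(n_clusters=3,
                                eigen_solver='lobpcg',
                                eigen_tol=1e-5,
                                random_state=rand_int,
                                affinity='precomputed')
labels = clustering.fit(sym_matrix)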

The AMG trouble is unclear from the output. Could you please try the latest LOBPCG from #12319 or scipy/scipy#9650 (they are identical)? The latter has been added to the 1.3.0 milestone, due by May 15, 2019. I hope that your AMG call runs into a bug already fixed in LOBPCG...

dfilan commented 5 years ago

matrix size 1000
block size 4

No constraints

iteration 0
current block size: 3
eigenvalue: [-1.65773735e-17 9.98684128e-01 9.99969964e-01 1.00264676e+00]
residual norms: [3.28224244e-16 1.12360881e-01 1.10785199e-01 1.12051732e-01]
iteration 1
current block size: 3
eigenvalue: [-1.65773745e-17 8.71938222e-01 8.76296433e-01 8.80553330e-01]
residual norms: [9.58530855e-13 7.44644279e-02 7.14520230e-02 7.21921377e-02]
matrix gramA is not sufficiently Hermitian for a=3, b=-1:
condition: 1e-12 < 1.136868e-13
iteration 2
current block size: 3
eigenvalue: [-1.66109682e-17 5.32731519e-02 1.80435953e-01 8.59707808e-01]
residual norms: [1.84890229e-10 2.19266428e-01 3.42078317e-01 5.25815352e-02]
matrix gramA is not sufficiently Hermitian for a=3, b=-1:
condition: 1e-10 < 1.136868e-13
Traceback (most recent call last):
  File "clustering_bug.py", line 17, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 298, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 509, in lobpcg
    _lambda, eigBlockVector = eigh(gramA, gramB, check_finite=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 493, in eigh
    " computed." % (info-b1.shape[0]))
numpy.linalg.LinAlgError: the leading minor of order 6 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.

Solving generalized eigenvalue problem with preconditioning

matrix size 1000
block size 4

No constraints

iteration 0
current block size: 3
eigenvalue: [-1.65773735e-17 9.98684128e-01 9.99969964e-01 1.00264676e+00]
residual norms: [3.28224244e-16 1.12360881e-01 1.10785199e-01 1.12051732e-01]
iteration 1
current block size: 3
eigenvalue: [-1.65773781e-17 8.71919770e-01 8.76288299e-01 8.80514334e-01]
residual norms: [2.11663851e-12 7.44755859e-02 7.14505277e-02 7.22013637e-02]
iteration 2
current block size: 3
eigenvalue: [-1.71702097e-17 6.60138805e-01 8.28351201e-01 8.67963903e-01]
residual norms: [7.80420032e-10 2.41131729e-01 5.37268611e-02 6.81385474e-02]
Traceback (most recent call last):
  File "clustering_bug.py", line 17, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 298, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 461, in lobpcg
    aux = _b_orthonormalize(B, activeBlockVectorR)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 99, in _b_orthonormalize
    gramVBV = cholesky(gramVBV)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
numpy.linalg.LinAlgError: 2-th leading minor of the array is not positive definite

- Using the latest LOBPCG from scipy/scipy#9650 and setting verbosityLevel = 1, but otherwise the standard scipy and scikit-learn (including the standard tolerance settings), running my code with AMG gives this output:

Solving standard eigenvalue problem with preconditioning

matrix size 1000
block size 4

No constraints

iteration 0
current block size: 3
eigenvalue: [-1.65773735e-17 9.98684128e-01 9.99969964e-01 1.00264676e+00]
residual norms: [3.28224244e-16 1.12360881e-01 1.10785199e-01 1.12051732e-01]
iteration 1
current block size: 3
eigenvalue: [-1.65773802e-17 8.71913944e-01 8.76284884e-01 8.80496749e-01]
residual norms: [2.55025341e-12 7.44795540e-02 7.14518128e-02 7.22038970e-02]
iteration 2
current block size: 3
eigenvalue: [-1.65905701e-17 8.24763984e-01 8.29216223e-01 8.42830283e-01]
residual norms: [1.13524393e-10 5.76361485e-02 4.62052197e-02 4.37898669e-02]
matrix gramA is not sufficiently Hermitian for a=3, b=-1:
condition: 4e-12 < 1.136868e-13
iteration 3
current block size: 3
eigenvalue: [-2.00786875e-17 4.99950860e-04 1.47902735e-03 5.48019837e-03]
residual norms: [4.25604349e-10 2.20414568e-02 3.76797281e-02 1.59178580e-02]
Traceback (most recent call last):
  File "clustering_bug.py", line 17, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 298, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 475, in lobpcg
    aux = _b_orthonormalize(B, activeBlockVectorP, retInvR=True)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 92, in _b_orthonormalize
    gramVBV = cholesky(gramVBV)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
numpy.linalg.LinAlgError: 3-th leading minor of the array is not positive definite

- Using the latest LOBPCG from scipy/scipy#9650 and setting residualTolerance = 1e-5 and verbosityLevel = 1, but otherwise the standard scipy and scikit-learn, running my code with AMG gives different output each time I run it; here are two samples:

Solving standard eigenvalue problem with preconditioning

matrix size 1000
block size 4

No constraints

iteration 0
current block size: 3
eigenvalue: [-1.65773735e-17 9.98684128e-01 9.99969964e-01 1.00264676e+00]
residual norms: [3.28224244e-16 1.12360881e-01 1.10785199e-01 1.12051732e-01]
iteration 1
current block size: 3
eigenvalue: [-1.65773762e-17 8.71917030e-01 8.76284997e-01 8.80506836e-01]
residual norms: [1.62258985e-12 7.44769131e-02 7.14521404e-02 7.22023149e-02]
matrix gramA is not sufficiently Hermitian for a=3, b=-1:
condition: 3e-13 < 1.136868e-13
iteration 2
current block size: 3
eigenvalue: [-1.65828462e-17 8.26712612e-01 8.33255667e-01 8.63310975e-01]
residual norms: [7.32788513e-11 5.19357834e-02 4.69287019e-02 5.55801974e-02]
Traceback (most recent call last):
  File "clustering_bug.py", line 17, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 298, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 463, in lobpcg
    aux = _b_orthonormalize(B, activeBlockVectorR)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 92, in _b_orthonormalize
    gramVBV = cholesky(gramVBV)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
    check_finite=check_finite)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
    "definite" % info)
numpy.linalg.LinAlgError: 3-th leading minor of the array is not positive definite

Solving standard eigenvalue problem with preconditioning

matrix size 1000
block size 4

No constraints

iteration 0
current block size: 3
eigenvalue: [-1.65773735e-17 9.98684128e-01 9.99969964e-01 1.00264676e+00]
residual norms: [3.28224244e-16 1.12360881e-01 1.10785199e-01 1.12051732e-01]
iteration 1
current block size: 3
eigenvalue: [-1.65773835e-17 8.71916629e-01 8.76276639e-01 8.80496140e-01]
residual norms: [3.12309732e-12 7.44813239e-02 7.14519123e-02 7.22037010e-02]
matrix gramA is not sufficiently Hermitian for a=3, b=-1:
condition: 5e-13 < 1.136868e-13
Traceback (most recent call last):
  File "clustering_bug.py", line 17, in <module>
    labels = clustering.fit(sym_matrix)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 493, in fit
    assign_labels=self.assign_labels)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/cluster/spectral.py", line 264, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/sklearn/manifold/spectral_embedding_.py", line 298, in spectral_embedding
    largest=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py", line 543, in lobpcg
    _lambda, eigBlockVector = eigh(gramA, gramB, check_finite=False)
  File "/home/daniel/anaconda3/envs/nn_clustering/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 493, in eigh
    " computed." % (info-b1.shape[0]))
numpy.linalg.LinAlgError: the leading minor of order 5 of 'b' is not positive definite. The factorization of 'b' could not be completed and no eigenvalues or eigenvectors were computed.


- I'm somewhat bothered by the non-determinism here, given that I'm setting the random seed and using my own random numbers both to generate the sparse matrix and in the clustering algorithm.

lobpcg commented 5 years ago

Yes, I actually meant setting residualTolerance. Thanks for figuring this out!

Your AMG bug is unrelated to residualTolerance. Could you please try the latest LOBPCG from #12319 or scipy/scipy#9650 (they are identical)? The latter has been added to the 1.3.0 milestone, due by May 15, 2019. I hope that your AMG call runs into a bug already fixed in LOBPCG... If not, it would require a separate investigation.

Maybe your sym_matrix just cannot be used directly as affinity = 'precomputed' with AMG. It looks like you put negative diagonal values in your sym_matrix, and AMG may be unhappy about that, although this is probably undocumented if true.

import sklearn.cluster
import scipy.sparse as sparse
import random

random.seed(1337)

rand_int = random.randrange(1000)
num_nodes = 1000
X = sparse.rand(num_nodes, num_nodes, density = 0.1, random_state = rand_int)
upper = sparse.triu(X) - sparse.diags(X.diagonal())
sym_matrix = upper + upper.T
clustering = sklearn.cluster.SpectralClustering(n_clusters = 3,
                                                eigen_solver = 'amg',
                                                random_state = rand_int,
                                                affinity = 'precomputed')
labels = clustering.fit(sym_matrix)
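A quick way to check both points on the matrix built above (a sketch):

# sym_matrix as constructed above; both prints should be 0 if the diagonal
# really is zeroed out and no affinities are negative.
print(sym_matrix.diagonal().min(), sym_matrix.diagonal().max())
print(sym_matrix.min())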

dfilan commented 5 years ago

My previous comment does use the latest LOBPCG in the 6th bullet point. Regarding sym_matrix, I had assumed that the code would ensure that its diagonal was zeroed out, which is what appears to happen when I manually run that segment on a 10x10 sparse matrix.

dfilan commented 5 years ago

Also, adding print(sym_matrix.diagonal()) before the call to the spectral clustering algorithm shows an array of all zeroes.

lobpcg commented 5 years ago

@dfilan My local install seems broken, so I cannot run the debugging myself and can only advise you.

1) Please make sure that you do run the latest LOBPCG from #12319 or scipy/scipy#9650 - they are identical.

2) Without AMG, lobpcg runs fine here with a reasonable residualTolerance. So this seems to be an issue with the AMG preconditioning, in lines 284-303 of sklearn/manifold/spectral_embedding_.py (I am looking at a recent developer master version of sklearn). Please check that your run goes there, e.g., by adding a print before the lobpcg call at line 297.

3) Line 293 sets the AMG preconditioning from the laplacian

    ml = smoothed_aggregation_solver(check_array(laplacian, 'csr'))

Change it to

  ml = smoothed_aggregation_solver(check_array(laplacian4AMG, 'csr'))

and add a line before it that creates the new matrix laplacian4AMG from the present laplacian by adding a small scalar value, e.g., 1e-6, to every diagonal entry of laplacian; a sketch of this change is shown right below. Let me know the outcome if you try this possible fix.
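In code, the suggested change would look roughly like this (a sketch; laplacian, check_array and smoothed_aggregation_solver are the names already in scope in spectral_embedding, and 1e-6 is only an illustrative shift):

import scipy.sparse as sparse
from pyamg import smoothed_aggregation_solver
from sklearn.utils import check_array

# Shift the diagonal of the (positive semi-definite) Laplacian slightly so the
# AMG setup sees a strictly positive definite matrix.
laplacian4AMG = laplacian + 1e-6 * sparse.eye(laplacian.shape[0])
ml = smoothed_aggregation_solver(check_array(laplacian4AMG, 'csr'))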

Let me copy @rc; maybe he can help.

dfilan commented 5 years ago

lobpcg commented 5 years ago

Perfect! @dfilan you are welcome.

lobpcg commented 4 years ago

To my recollection, this is already fixed and can be closed.