lil_matrix cause graph_laplacian to be too slow.

fannix commented 11 years ago

_graph_laplacian_sparse (in sklearn/utils/graph.py) use the following line to modify the sparse matrix:

lap = lap.tolil()

diagonal_holes = list(set(range(n_nodes)).difference(diag_idx))
lap[diagonal_holes, diagonal_holes] = 1

The last statement is extremely slow. I compare the lil_matrix with dok_matrix with the following script.

import scipy.sparse as sparse
import numpy as np

def test_lil(dense):
    lil = sparse.lil_matrix(dense)
    lil[lil.nonzero()] = 1

def test_dok(dense):
    dok = sparse.dok_matrix(dense)
    index0, index1 = dok.nonzero()
    for i in range(len(index0)):
        dok[index0[i], index1[i]] = 1

if __name__ == "__main__":
    dense = np.identity(1000)
    test_lil(dense)
    test_dok(dense)

And here is the result:

dok_matrix

IPython CPU timings (estimated): User : 0.08 s. System : 0.01 s. Wall time: 0.10 s.

lil_matrix IPython CPU timings (estimated): User : 16.42 s. System : 0.09 s. Wall time: 16.73 s.

larsmans commented 11 years ago

Unsurprising -- LIL is a really, really stupid format with quadratic time construction. It's so slow that I don't even use it for prototyping anymore.

The funny thing is that the tests fail when I just plug in todok -- dok_matrix.tocoo shuffles the data and col members.

fannix commented 11 years ago

Thanks!

On Wed, Feb 27, 2013 at 12:25 AM, Lars Buitinck notifications@github.comwrote:

Thishttps://github.com/scipy/scipy/commit/94d9d4926733106364222d4e76ff05f8d2329623seems related.

— Reply to this email directly or view it on GitHubhttps://github.com/scikit-learn/scikit-learn/issues/1698#issuecomment-14123773 .

Best Wishes

Meng Xinfan（蒙新泛） Institute of Computational Linguistics Department of Computer Science & Technology School of Electronic Engineering & Computer Science Peking University Beijing, 100871 China

larsmans commented 11 years ago

@fannix I imported a new version from Scipy, please try current master.

larsmans commented 11 years ago

Hmm, I hadn't expected Jake's picture to appear here -- I did a git commit --author with his name since he did most of the updates :)

jakevdp commented 11 years ago

Thanks Lars!

larsmans commented 11 years ago

Since LIL matrices are no longer used, I'm presuming this issue is fixed. Feel free to reopen if the problem persists.

scikit-learn / scikit-learn

lil_matrix cause graph_laplacian to be too slow. #1698

Best Wishes