snap-stanford / graphwave

When there are millions of nodes, computing the full eigendecomposition of a large matrix raises "MemoryError" #3

Open chenwgen opened 6 years ago

chenwgen commented 6 years ago

When there are millions of nodes, computing the full eigendecomposition of a large matrix raises a "MemoryError", so this method can't be used on very large graphs.

bkj commented 6 years ago

You can use the Chebyshev option instead of the full eigendecomposition, but it's still not going to be great. I've done a fairly major refactor of this code that scales better, but there are still memory and runtime limitations:

https://github.com/bkj/graphwave

The paper does say that the complexity of this algorithm scales linearly with the number of edges in the graph, but I'm not sure that's actually right.

~ Ben
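For concreteness, here is a minimal sketch of the Chebyshev route via pygsp (the graph signal processing library GraphWave builds on). The adjacency matrix `W`, the `tau` scales, and the polynomial `order` are illustrative placeholders, not the values GraphWave itself uses:

```python
import numpy as np
import scipy.sparse as sp
from pygsp import graphs, filters

# W: a scipy.sparse adjacency matrix (placeholder for your graph)
W = sp.random(1000, 1000, density=0.01, format='csr')
W = W + W.T  # symmetrize

G = graphs.Graph(W)
G.estimate_lmax()                    # needed for the Chebyshev rescaling

heat = filters.Heat(G, tau=[5, 10])  # heat kernels at two scales

delta = np.zeros(G.N)
delta[0] = 1.0                       # diffusion starting from node 0

# method='chebyshev' avoids the full eigendecomposition entirely:
# it only performs sparse matrix-vector products.
coeffs = heat.filter(delta, method='chebyshev', order=30)
```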

donnate commented 6 years ago

Hi,

Thank you for your interest in GraphWave! Indeed, the code has to be adapted to handle very large graphs (in particular, by using libraries suited to that scale, such as snap.py instead of networkx). The Chebyshev approximation gives you an approximation of a function of the eigendecomposition: it is just a polynomial in the adjacency matrix, so its cost is linear in the number of edges, which makes it suitable for large, sparse graphs. However, I did not write the code with gigantic graphs in mind, so there might be other bottlenecks elsewhere (in the libraries used and in the code itself). I plan to work on this in the next few months, and will keep you updated on the progress!

Best regards,

Claire
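To make the linear-in-edges claim concrete: a degree-$K$ Chebyshev expansion applies the operator through the three-term recurrence $T_{k+1}(\tilde{L})x = 2\tilde{L}\,T_k(\tilde{L})x - T_{k-1}(\tilde{L})x$, so each step is one sparse matrix-vector product costing $O(|E|)$, and the whole expansion costs $O(K\,|E|)$. A minimal sketch (the function name and coefficient convention here are illustrative, not GraphWave's actual implementation):

```python
import scipy.sparse as sp

def chebyshev_apply(L, x, coeffs, lmax):
    """Evaluate sum_k coeffs[k] * T_k(L_tilde) @ x with no eigendecomposition.

    Each loop iteration is one sparse mat-vec, i.e. O(|E|) work,
    so a degree-K expansion costs O(K * |E|) overall.
    """
    n = L.shape[0]
    # Rescale the Laplacian so its spectrum lies in [-1, 1],
    # the interval on which the Chebyshev polynomials T_k are defined.
    L_tilde = (2.0 / lmax) * L - sp.identity(n, format='csr')

    t_prev, t_cur = x, L_tilde @ x          # T_0 x and T_1 x
    out = coeffs[0] * t_prev + coeffs[1] * t_cur
    for c in coeffs[2:]:
        # three-term recurrence: T_{k+1} = 2 * L_tilde * T_k - T_{k-1}
        t_prev, t_cur = t_cur, 2.0 * (L_tilde @ t_cur) - t_prev
        out = out + c * t_cur
    return out
```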

Anon13122021 commented 2 years ago

I think line 83 in 'graphwave/graphwave/graphwave.py' causes the sparse matrix heat[i] to be densified (specifically the 'heat[i].A', which converts it to a dense array):

temp = thres(heat[i].A) # cleans up the small coefficients

It may be worth applying the threshold directly to the sparse data, as sketched below, to avoid densifying the matrix and reduce the memory footprint.
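One way to do that: scipy.sparse matrices expose their stored nonzeros via `.data`, so the small coefficients can be zeroed and pruned without ever building a dense array. A minimal sketch (`thres_sparse` and the `1e-5` cutoff are illustrative names/values, not the repo's):

```python
import numpy as np
import scipy.sparse as sp

def thres_sparse(m, eps=1e-5):
    """Zero out entries with |value| < eps, staying sparse throughout."""
    m = m.tocsr(copy=True)               # CSR gives direct access to .data
    m.data[np.abs(m.data) < eps] = 0.0   # operate on stored values only
    m.eliminate_zeros()                  # drop the newly-zeroed entries
    return m

# e.g. replacing the densifying call:
# temp = thres_sparse(heat[i])           # instead of thres(heat[i].A)
```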