Open moustaki opened 11 years ago
I can reproduce this in 0.1.9, thanks for the report y.
@alepulver this error doesn't happen in 0.1.8 -- any idea where the regression's coming from?
This seems to happen with any matrix equivalent to numpy.diag(a,b,b,a)
when a and b are lower than 20.
I believe svdLAS2SA
could be returning something different in these cases. What did it return in 0.1.8 where it worked?
Edit: found it. In these cases, srec.d
is 0 (the length of "s". as returned by SVDLIBC). What should we do with the result?
0.1.8 returns
(array([[ 0.32372233, 0.4780341 , 0.79283656, 0.19518564],
[-0.19518564, 0.79283656, -0.4780341 , 0.32372233]]),
array([ 1.30002747, 1.30002747]),
array([[ 0.36771077, 0.60121131, 0.36249528, 0.60986139],
[ 0.60986139, -0.36249528, 0.60121131, -0.36771077]]))
(same machine)
Could you please confirm that svdResult->d
is 2, as it should be?
Edit: don't worry, I don't think that's the case. I'm getting d=0 and S=[0,0] from SVDLIBC. Either I'm calling it incorrectly, or something very wrong is going on.
Were there any special compilation flags I removed when switching to Cython?
When enabling SVDVerbose in svdlib.c, I get the following output when it works:
In [2]: sparsesvd.sparsesvd(scipy.sparse.csc_matrix(np.diag([1,15,15.1,1])), 2)
SOLVING THE [A^TA] EIGENPROBLEM
NO. OF ROWS = 4
NO. OF COLUMNS = 4
NO. OF NON-ZERO VALUES = 4
MATRIX DENSITY = 25.00%
MAX. NO. OF LANCZOS STEPS = 4
MAX. NO. OF EIGENPAIRS = 2
LEFT END OF THE INTERVAL = -1.00E-30
RIGHT END OF THE INTERVAL = 1.00E-30
KAPPA = 1.00E-06
NUMBER OF LANCZOS STEPS = 4
RITZ VALUES STABILIZED = 4
SINGULAR VALUES FOUND = 2
And this when it doesn't.
In [3]: sparsesvd.sparsesvd(scipy.sparse.csc_matrix(np.diag([1,15,15,1])), 2)
SOLVING THE [A^TA] EIGENPROBLEM
NO. OF ROWS = 4
NO. OF COLUMNS = 4
NO. OF NON-ZERO VALUES = 4
MATRIX DENSITY = 25.00%
MAX. NO. OF LANCZOS STEPS = 4
MAX. NO. OF EIGENPAIRS = 2
LEFT END OF THE INTERVAL = -1.00E-30
RIGHT END OF THE INTERVAL = 1.00E-30
KAPPA = 1.00E-06
NUMBER OF LANCZOS STEPS = 4
RITZ VALUES STABILIZED = 0
SINGULAR VALUES FOUND = 0
Do you have any ideas, @piskvorky ?
Heh, might well be connected to the "fewer factors than requested" discussed elsewhere...
@alepulver Is the output deterministic for you (0.1.8 works, 0.1.9 doesn't)? Or does it fail intermittently even on the same release?
Sorry for the delay. I don't know if it's related, but at least 0.1.8 behaves the same in my machine. The only difference is that it doesn't throw an exception, but returns an array with invalid shape (so it's SVDLIBC returning no singular values).
This seems a SVDLIBC problem.
In [1]: import numpy as np, sparsesvd, scipy.sparse
In [2]: sparsesvd.sparsesvd(scipy.sparse.csc_matrix(np.diag([1,2,2,1])), 2)
Out[2]:
(array([], shape=(0, 4), dtype=float64),
array([], dtype=float64),
array([], shape=(0, 4), dtype=float64))
In [3]: sparsesvd.sparsesvd(scipy.sparse.csc_matrix(np.diag([1,2,2.1,1])), 2)
Out[3]:
(array([[ -2.58143374e-20, 8.98751972e-16, -1.00000000e+00,
-1.32169408e-17],
[ -6.77626358e-21, -1.00000000e+00, -8.16013923e-16,
-2.77555756e-17]]),
array([ 2.1, 2. ]),
array([[ -5.42101086e-20, 9.43689571e-16, -1.00000000e+00,
-2.77555756e-17],
[ -1.35525272e-20, -1.00000000e+00, -7.77156117e-16,
-5.55111512e-17]]))
Aha, ok, thanks. I'll have to investigate more thoroughly then :)
I'll check to see if vanilla SVDLIBC behaves the same way too (no Python), and then it's debugging time...
I debugged, and the problem seems to be somewhere deep inside LAS2. The error bounds (bnd
array coming out of lanso
) are rubbish in face of algebraic multiplicity, although the ritz
values are fine. The large bounds lead to ritz values being classified as "unstabilized" => errors later on.
This appears to be the root of all the other reported issues with sparsesvd
as well: #1, #3 .
I tried comparing SVDLIBC (=Doug Rohde's version) against SVDPACKC (the original Michael Berry's version), but didn't notice any relevant difference in the code. So this possibly happens with SVDPACKC as well, which is really strange.
I sent an email to Doug Rohde, hopefully he'll shed some light.
Doug's reply:
"I haven't worked on SVDLIBC in years. And it was a port of someone else's code, so I never fully understood the majority of it. Mostly I was fixing bugs and improving memory management. So I probably can't offer that much about the algorithm, other than trying to read and understand the code."
Thank you for spending so much time on this issue. I'm not really into numerical methods, so I may not be able to help. But maybe there is a paper about the algorithm, or newer ones. As for related Python modules I've found:
Divisi2 (which doesn't support Python 3.x) http://csc.media.mit.edu/docs/divisi2/
sklearn (which also includes recent references to papers) http://scikit-learn.org/stable/modules/decomposition.html
mlpy (it's not clear if it supports sparse matrices efficiently) http://mlpy.sourceforge.net/docs/3.5/dim_red.html#principal-component-analysis-pca
They mention PCA, but AFAIK it's related to the SVD of the normalized and rescaled matrix (or the covariance matrix), so I thought it might be useful.
Divisi2 uses SVDLIBC too, and therefore suffers from the same problems.
I already created and maintain a package for large-scale sparse SVD (gensim
), so this is not a big issue for me. But to tell the truth, I am not thrilled about maintaining sparsesvd
if it contains bugs that nobody understands and nobody can fix...
Would you like to become the principal maintainer of sparsesvd
, @alepulver ?
Or let's put it differently -- the person who can successfully resolve this issue will be crowned the king of sparsesvd
:)
I don't think I can debug it. Maybe an exception could be raised in those cases, or adding a tiny random noise to make it work (at least close to the real result). What do you think of this "hacky" solution?
I don't think that's a good idea @alepulver . Partly because we don't even know what "those cases" are.
I'll try to delve into the details and fix this, but I can't promise a date. For now (and the past 10 years?), it looks like the library will remain broken.
Is this behaviour related ? I'm getting the very same (weird) result (no eigenvalues, empty U) with vanilla svdlibc (now I'm looking around for patches):
import numpy, scipy.sparse from sparsesvd import sparsesvd smat = scipy.sparse.csc_matrix((2, 2)) smat[1,1]=1.0 u, s, vt = sparsesvd(smat, 2) Traceback (most recent call last): File "
", line 1, in File "sparsesvd.pyx", line 39, in sparsesvd.sparsesvd (sparsesvd.c:2239) File "stringsource", line 247, in View.MemoryView.array_cwrapper (sparsesvd.c:5951) File "stringsource", line 147, in View.MemoryView.array.cinit (sparsesvd.c:4887) ValueError: Invalid shape in axis 0: 0. u, s, vt = sparsesvd(smat, 1) Traceback (most recent call last): File " ", line 1, in File "sparsesvd.pyx", line 39, in sparsesvd.sparsesvd (sparsesvd.c:2239) File "stringsource", line 247, in View.MemoryView.array_cwrapper (sparsesvd.c:5951) File "stringsource", line 147, in View.MemoryView.array.cinit (sparsesvd.c:4887) ValueError: Invalid shape in axis 0: 0. u, s, vt = sparsesvd(smat, 0) Traceback (most recent call last): File " ", line 1, in File "sparsesvd.pyx", line 39, in sparsesvd.sparsesvd (sparsesvd.c:2239) File "stringsource", line 247, in View.MemoryView.array_cwrapper (sparsesvd.c:5951) File "stringsource", line 147, in View.MemoryView.array.cinit (sparsesvd.c:4887) ValueError: Invalid shape in axis 0: 0. smat[0,0]=1.0 u, s, vt = sparsesvd(smat, 2) u array([[ 3.12553483e-04, 9.99999951e-01]])
@pminervini almost certainly; let us know if you find something 8)
@piskvorky Maybe you can be intersted in GraphLab (for distributed environments) and GraphChi (for multicore machines): http://graphlab.org/graphchi/ they both include an highly efficient implementation of SVD (and other algorithms); AFAIK, GraphChi allows you to do the same things as SVDLIBC (possibly more efficiently), without the need to dive deep into ancient C code
@pminervini GraphiChi sounds great, thanks for the link! Also check out gensim for fast SVD (gensim targets topic modelling though, not collab filtering).
I had the "ValueError: Invalid shape in axis 0: 0." . It happens when I blend two matrixes and the one has vector weights as 0. I added a slight offset 0.0000001 and it works now.
Hello!
I was wondering about the following behaviour, which made one of my tests break in a recent sparsesvd update.
Not sure what is going on here - changing any value in this matrix seems to make it work.
Best, y