lisitsyn / tapkee

A flexible and efficient С++ template library for dimension reduction
http://tapkee.lisitsyn.me
BSD 3-Clause "New" or "Revised" License
232 stars 57 forks source link

arpack eigendecomposition and dense eigendecomposition gave different eigenvalues #32

Closed coderlevi closed 8 years ago

coderlevi commented 8 years ago

I am using the isomap.

I tried arpack eigendecomposition and it gave the two largest eigenvlues of a shortest distance matrix as -7.12559e+06 4.72011e+06

But when I tried the dense eigendecomposition, it gave the two largest eigenvlues of the same shortest distance matrix as 3.62429e+06 4.72011e+06

By the way, I have symmetrized the neighbors (see issue https://github.com/lisitsyn/tapkee/issues/30) and have checked the input matrix of eigendecomposition is symmetric.

arpack eigendecomposition and dense eigendecomposition agree on the second eigenvalue but not on the first one in this case.

In the embedIsomap() function, there is such code block

for (IndexType i=0; i < static_cast(p_target_dimension); i++) embedding.first.col(i).array() *= sqrt(embedding.second(i));

arpack eigendecomposition gave a negative eigenvalue and it did not work to call sqrt() here in this case

I tried to increase the ncv from current 20 to 100 in arpack_wrapper.hpp but it did not work and still gave the two eigenvalues -7.12559e+06 4.72011e+06

By the way, the input symmetric matrix is 5000 by 5000

Appreciate very much for any idea on the arpack eigendecomposition issue in my this case.

lisitsyn commented 8 years ago

@coderlevi thanks for this investigation :)

This big negative eigenvalue sounds weird and must be due to some issue with arpack or tapkee's interface to arpack. Could you please output the matrix somewhere and then check (with some other code, python, matlab etc) what are the eigenvalues?

coderlevi commented 8 years ago

There is no issue with the tapkee's interface to arpack. I output the matrix of 5000 by 5000 and use octave eigs() to solve the matrix. I got the same eigenvalues as using tapkee arpack eigendecomposition. octave got the two largest magnitude eigenvalues as -7.12559e+06 4.72011e+06 as well. The default option for eigs() is "LM" which means to find the largest magnitude eigenvalues.

This is good. Our tapkee interface to arpack is working pretty well.

The issue is that we may not use arpack eigendecomposition correctly in isomap.

Currently tapkee is setting ARPACK_CODE = "LM" when using arpack eigendecomposition. I think we should set ARPACK_CODE = "LA"

dense eigendecomposition will sort the eigenvalues in increasing order and when using dense eigendecomposition isomap just uses the largest two eigenvalues (which are largest algebraic not largest magnitude).

Also I found this paper on isomap : http://wearables.cc.gatech.edu/paper_of_week/isomap.pdf In the Table 1 of this paper, the step 3 says : Let \lambda_p be the p-th eigenvalue (in decreasing order) and v_p be the p-the eigenvector. Then set the p-th vector y equal to \squar_root(\lambda_p)v_p

From this we can see that isomap is not using the largest magnitude eigenvalues but using the largest algebraic eigenvalues.

So in the tapkee we should set ARPACK_CODE = "LA" instead of "LM"

When I set ARPACK_CODE = "LA", I got the same answer using arpack eigendecomposition as using dense eigendecomposition in my this case, which is awesome!

lisitsyn commented 8 years ago

@coderlevi awesome research, thanks for that! I'd need to check what implications it would have to set it to LA. IIRC it affects only Isomap and MDS.

lisitsyn commented 8 years ago

Closing as we now use @coderlevi solution to employ 'LA' option.