Error -9999 with `dnaupd` in ARPACK 3.6 for a matrix that worked with ARPACK 3.5

ntamas commented 6 years ago

The easiest way to test this issue is with GNU Octave 4.4.0 linked to ARPACK 3.6, but the same problem can be reproduced from C as well. The matrix is the PageRank matrix of an undirected star graph with 11 vertices and damping = 0.85:

octave:1> m = ones(11, 11) * 0.15 / 11;
octave:2> m(1, 2:11) += 0.85;
octave:3> m(2:11, 1) = (1 - m(1, 1)) / 10;

The matrix has three eigenvalues: 1, -0.85 and 0 (multiplicity = 9). Asking Octave to obtain the dominant eigenvalue with eigs (which in turn uses dnaupd) yields an error:

octave:4> [v, lambda] = eigs(m, 1)
error: __eigs__: eigs: error -9999 in dnaupd
error: called from
    eigs at line 285 column 18

I know that the multiplicity of the zero eigenvalue could be a problem here; however, note that this matrix worked just fine with ARPACK 3.5 and a simple power iteration also converges to the correct dominant eigenvector. I managed to narrow the source down to this single line of code, but if my understanding is correct, then this line is actually necessary for the referenced patch to work.

Is there anything I am missing here?

(For what it's worth, I have also managed to reproduce the problem by modifying the matVec function in TESTS/bug_1315_double.c to perform the multiplication with the matrix described above. Note that the test case currently does not check the value of info after running dnaupd so the error goes unnoticed by the test case, but I have verified that info is -9999 in the test case. Let me know if you need my modified plain C code to test this).

sylvestre commented 6 years ago

@caliarim rings a bell ? thanks

caliarim commented 6 years ago

I think I know what is happening. Can you modify your plain C file (not octave) and solve a generalized problem with the silly matrix M = identity? I think it will fail, both with 3.5.0 and 3.6.0.

ntamas commented 6 years ago

Yes, you are correct; modifying the C source to solve a generalized problem with M = identity makes the computation fail both in 3.5.0 and in 3.6.0.

Attached is my C code (which is a crude modification of bug_1315_double.c so please bear with the formatting and everything).

bug_142_double.c.gz

caliarim commented 6 years ago

I hoped I was wrong. It means there is a long standing bug in Arpack with rank deficient matrices. It cannot compute a Krylov basis of length ncv, since with my patch (or in the generalized case, even before my patch) the initial vector is in the range of the operator (of dimension < ncv). Arpack tries to change the initial vector some times, but it is always in the range of the operator. After some attempts, it failes with info = -9999. I see two possibilities: 1) take the initial vector in the range of the operator only the very first time: any further attempt should not do it or 2) if after some attempts it is not possible to build a basis of length ncv, guess that the matrix is rank deficient and stop to compute the basis, instead of failing. I will need some time to reproduce the problem in fortran77 and try these (or other) possible solutions. Unfortunately, I do not see a workaround in the meantime.

ntamas commented 6 years ago

Thanks for the quick response! I am not familiar with the internals of the algorithm implemented in ARPACK, but based on my limited understanding, implementing option 1 would take us back to where we was with ARPACK 3.5 if the matrix is rank-deficient, while keeping the benefits of the proposed change in all other cases. This would suit me but I was wondering whether this would affect the correctness of the result; if the result is likely to be incorrect with this workaround, I'd rather be on the safe side even if it means that I need to wait more.

caliarim commented 6 years ago

Can you try if the following change

 --- dgetv0_3.6.f   2018-07-27 17:04:22.170831356 +0200
 +++ dgetv0_142.f   2018-07-27 16:59:47.690826426 +0200
 @@ -242,12 +242,14 @@
 c        %----------------------------------------------------------%
 c
          call arscnd (t2)
-         nopx = nopx + 1
-         ipntr(1) = 1
-         ipntr(2) = n + 1
-         call dcopy (n, resid, 1, workd, 1)
-         ido = -1
+         if (itry .eq. 1) then
+            nopx = nopx + 1
+            ipntr(1) = 1
+            ipntr(2) = n + 1
+            call dcopy (n, resid, 1, workd, 1)
+            ido = -1
          go to 9000
+         end if
       end if
 c 
 c     %-----------------------------------------%
@@ -274,7 +276,7 @@
 c
       call arscnd (t2)
       first = .TRUE.
-      call dcopy (n, workd(n+1), 1, resid, 1)
+      if (itry .eq. 1) call dcopy (n, workd(n+1), 1, resid, 1)
       if (bmat .eq. 'G') then
          nbx = nbx + 1
          ipntr(1) = n + 1

1) fixes the problem 2) passes the tests

? Thanks

ntamas commented 6 years ago

I have tested the patch above - it fixes the problem and the failing test cases in igraph. Thanks a lot!

caliarim commented 6 years ago

All the previous tests in ARPACK pass, too. ntamas, I see in your last post here https://github.com/igraph/igraph/issues/1107 that there is still one failing test. Is it related to my previous commit, too?

ntamas commented 6 years ago

Nope, I need to fix that; that particular test case is probably unrelated. ARPACK returns eigenvectors that are -1 times the "old" eigenvectors that previous versions returned, and the test case in igraph is too strict and does not ignore the sign. I only need to update the test case to make everything pass.

caliarim commented 6 years ago

My previous patch has a problem in the generalized case. It should be

--- dgetv0_360.f    2018-07-30 13:23:25.218601879 +0200
+++ dgetv0.f    2018-07-30 13:21:34.558605942 +0200
@@ -242,12 +242,16 @@
 c        %----------------------------------------------------------%
 c
          call arscnd (t2)
-         nopx = nopx + 1
-         ipntr(1) = 1
-         ipntr(2) = n + 1
-         call dcopy (n, resid, 1, workd, 1)
-         ido = -1
-         go to 9000
+         if (itry .eq. 1) then
+            nopx = nopx + 1
+            ipntr(1) = 1
+            ipntr(2) = n + 1
+            call dcopy (n, resid, 1, workd, 1)
+            ido = -1
+            go to 9000
+         else if (itry .gt. 1 .and. bmat .eq. 'G') then
+            call dcopy (n, resid, 1, workd(n + 1), 1)
+         end if
       end if
 c 
 c     %-----------------------------------------%
@@ -274,7 +278,7 @@
 c
       call arscnd (t2)
       first = .TRUE.
-      call dcopy (n, workd(n+1), 1, resid, 1)
+      if (itry .eq. 1) call dcopy (n, workd(n+1), 1, resid, 1)
       if (bmat .eq. 'G') then
          nbx = nbx + 1
          ipntr(1) = n + 1

ntamas commented 6 years ago

Re-tested the failing test cases in igraph with the new version and it works equally well, FYI.

ntamas commented 6 years ago

Okay, I spoke too soon; I was checking that one single remaining test case failure in igraph, and it turns out that it also fails for ARPACK 3.6 but not for ARPACK 3.5, and the cause of the failure can also be narrowed down to the same commit that caused the original issue reported here.

The test case in igraph attempts to find the three eigenvectors with the largest imaginary part in the following 10x10 matrix (generated randomly with a pre-determined seed so it's the same with every test run):

octave:1> a = [-6 0 10 3 8 1 -4 10 -8 0
 -6 1 0 8 -4 4 -7 1 1 6
 7 -7 8 6 -4 -8 -1 -7 -3 -7
 6 8 -4 -1 10 3 7 7 -3 -8
 1 -7 -4 9 0 5 5 6 -8 10
 -9 10 -5 -9 5 3 -5 7 -7 10
 -3 0 8 -6 -2 -7 1 -3 -8 1
 2 0 9 -3 0 -9 -4 0 10 0
 -9 1 -6 -1 7 10 9 9 8 -2
 -7 1 9 -7 10 -1 -2 -5 7 6]
a =

   -6    0   10    3    8    1   -4   10   -8    0
   -6    1    0    8   -4    4   -7    1    1    6
    7   -7    8    6   -4   -8   -1   -7   -3   -7
    6    8   -4   -1   10    3    7    7   -3   -8
    1   -7   -4    9    0    5    5    6   -8   10
   -9   10   -5   -9    5    3   -5    7   -7   10
   -3    0    8   -6   -2   -7    1   -3   -8    1
    2    0    9   -3    0   -9   -4    0   10    0
   -9    1   -6   -1    7   10    9    9    8   -2
   -7    1    9   -7   10   -1   -2   -5    7    6

Using eig() in Octave (which uses LAPACK), I can verify that the three eigenvalues with the largest imaginary parts are: -3.00735 + 19.2957i, -3.00735 - 19.2957i and 12.1099 + 6.27293i (of course the complex conjugate pair of the latter is also in the set of eigenvalues). ARPACK correctly reports these in the igraph test case. It also correctly reports the eigenvectors for the first two eigenvalues. However, for the third eigenvalue, the eigenvector reported by ARPACK mixes up the real and the imaginary parts and then negates the real part.

The eigenvector reported by LAPACK in Octave is as follows:

octave:2> [vec, val] = eig(a);
octave:3> val = diag(val);
octave:4> val(7)
ans =  12.1099 +  6.2729i
octave:5> vec(:, 7)
ans =

   0.10197 + 0.24665i
   0.29260 - 0.22392i
  -0.24166 + 0.02660i
   0.20076 + 0.08521i
   0.07248 + 0.37699i
   0.24232 - 0.13719i
  -0.48469 + 0.00000i
  -0.04705 + 0.32278i
   0.02209 + 0.19020i
  -0.06435 + 0.28162i

ARPACK reports the following values in the 3rd and 4th columns of the eigenvector memory area (first and second columns are for the correct eigenpair, 3rd and 4th belong to the next eigenpair):

0.21912 0.152389
0.125122 -0.346547
-0.188931 0.153007
0.21496 -0.0368355
0.264759 0.277981
0.129704 -0.246402
-0.40777 0.262001
0.134903 0.296988
0.121394 0.148076
0.0980886 0.271714

This is parallel to the eigenvector reported by LAPACK in Octave only if you swap the two columns and then negate the first column (or, alternatively, if you take the complex conjugate and then swap the real part with the imaginary one).

The same test case in igraph also tests dnaupd() in modes LM and SR, and those are okay.

caliarim commented 6 years ago

no, the two eigenvectors are parallel, they differ by a complex multiplicative constant.

ntamas commented 6 years ago

Indeed; sorry for the noise!

caliarim commented 6 years ago

@ntamas: do you confirm that all your tests now pass? I'm going to make a pull request.

ntamas commented 6 years ago

Yes, all test cases pass now, thanks for your efforts!

caliarim commented 6 years ago

Am I write that this change, as others, is not ported to pARPACK?

sylvestre commented 6 years ago

It seems, would be great to do it too!

S

juanjosegarciaripoll commented 1 year ago

I am afraid to reopen this issue, but arpack fails once more with identity matrices. One just needs to change TESTS/icb_arpac_c.c to return the same vector and the test fails to run.

void dMatVec(const double* x, double* y) {
  int i;
  for (i = 0; i < 1000; ++i) y[i] = x[i]; /* ((double)(i + 1)) * x[i];*/
};

opencollab / arpack-ng

Error -9999 with `dnaupd` in ARPACK 3.6 for a matrix that worked with ARPACK 3.5 #142