weyns49570 / efficient-java-matrix-library

Automatically exported from code.google.com/p/efficient-java-matrix-library
0 stars 0 forks source link

SingularOps.descendingOrder attempts to set a value out of bounds #5

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. The attached serialized double[] is taken from DenseMatrix64F.getData(). The 
matrix is 832 x 10000. 
2. Load the matrix and then run the following code.

DenseMatrix64F m = ...;
SvdImplicitQrDecompose svd = new SvdImplicitQrDecompose(true);
svd.decompose(m);

final DenseMatrix64F u = svd.getU();
final DenseMatrix64F s = svd.getW(null);
final DenseMatrix64F v = svd.getV();

// This fails:  
SingularOps.descendingOrder(u, s, v);

3. SingularOps#descendingOrder will throw an exception:

java.lang.IllegalArgumentException: Specified element is out of bounds
    at org.ejml.data.DenseMatrix64F.set(DenseMatrix64F.java:224)
    at org.ejml.ops.SingularOps.descendingOrder(SingularOps.java:76)
    ...

What version of the product are you using? On what operating system?

Using release 0.12 from google code.

Other notes:

Great library. The only one I've used that can do an SVD of this size in < 4GB 
memory.

Original issue reported on code.google.com by xcolw...@gmail.com on 25 Jun 2010 at 5:25

Attachments:

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Thanks for reporting the bug.  It was an easy to fix bug.  Check out the latest 
SVN code to get the fix.  You also might want to consider using 
DecompositionFactory.svd(true,true,true) instead of directly creating an 
instance of class SvdImplicitQrDecompose.

Original comment by peter.ab...@gmail.com on 26 Jun 2010 at 12:58

GoogleCodeExporter commented 9 years ago
Thanks for the quick reply! I will verify the fix on my data set tomorrow and 
post back. 

On the topic of DecompositionFactory, my application does not need V to be 
computed, yet it looks like the "needsV" flag of DecompositionFactory#svd has 
no impact on the selected algorithm. More specifically my application is a 
simple PCA, where all I need is the U vector to project the data points (PCA = 
transpose(m) * u). Since not computing V would halve the memory requirement, 
this is very ideal ... do you have any plans to implement the "needsV" flag in 
the future?

Original comment by xcolw...@gmail.com on 27 Jun 2010 at 6:04

GoogleCodeExporter commented 9 years ago
You're welcome.  I am planning on adding that functionality.  Not exactly sure 
when, but since there is a need for it I'll see if I can do it this week.  Let 
me know how the run through your data goes.

Original comment by peter.ab...@gmail.com on 28 Jun 2010 at 9:31

GoogleCodeExporter commented 9 years ago
Hi Peter, I updated the code today and caught your fixes -- including new 
"needU"/"needV" support! Tried it out, looks good. Actually, surprisingly, 
EJML's SVD is running faster than GSL's SVD (which uses CBLAS). I haven't done 
a thorough comparison yet, but the results from EJML look as expected. Thanks.

Original comment by xcolw...@gmail.com on 30 Jun 2010 at 4:06

GoogleCodeExporter commented 9 years ago
Regarding GSL, in my application, EJML singular values (from the QR implicit 
solver) differ from GSL's from the modified Golub-Reinsch (described at 
http://www.gnu.org/software/gsl/manual/html_node/Singular-Value-Decomposition.ht
ml) by at most +-0.25 . 

Original comment by xcolw...@gmail.com on 30 Jun 2010 at 4:26

GoogleCodeExporter commented 9 years ago
Yep I ended up messing with this a bit longer last night than I planned.  The 
SVD algorithm is about 50% faster since I realized that something it was doing 
was redundant.  Then there is another significant speed up by not computing V.

Thanks for pointing out GSL.  I was not aware of that package.

For the difference between GSL and EJML.  You can verify the correctness of 
both SVD algorithms by; 1) computing the SVD of A, 2) recomputing A'=U S V^T, 
and 3) computing the norm2 of the difference |A-A'| which will hopefully be a 
small number, around 1e-15.

I did run EJML's SVD through a battery of tests to check its correctness, but 
there are some special cases not tested yet.  I suspect this might have 
something to do with nearly identical singular values.  Those are common in 
larger matrices which I haven't tested extensively yet.

If it looks like a problem in EJML then could you create a new issue and attach 
an example matrix?

Original comment by peter.ab...@gmail.com on 30 Jun 2010 at 4:48

GoogleCodeExporter commented 9 years ago
Actually, some bad news, I hit the problem again, with a new matrix. The 
serialized double array (matrix.getData()) is attached.

Exception in thread "main" java.lang.IllegalArgumentException: Specified 
element is out of bounds
    at org.ejml.data.DenseMatrix64F.set(DenseMatrix64F.java:227)
    at org.ejml.ops.SingularOps.descendingOrder(SingularOps.java:76)
...

Original comment by xcolw...@gmail.com on 1 Jul 2010 at 2:53

Attachments:

GoogleCodeExporter commented 9 years ago
Btw, please let me know if there's anything I can contribute. I'm not a expert 
in linear algebra by any means, though. 

Original comment by xcolw...@gmail.com on 1 Jul 2010 at 2:54

GoogleCodeExporter commented 9 years ago
Not sure if I did it right, but when I deserialized that file all I got was a 
very large double[].  Since I don't know the number of rows or columns it 
couldn't use it.  But I think I know what happened anyways.  

I'm guessing that you passed true to getU() or getV().  DescendingOrder() used 
to assume that both matrices are not transposed.  If you update to the latest 
SVN that problem should be fixed.

-----

Thanks for the helping out offer.  There is plenty to do.  Finding bugs is 
always useful.  Feedback on the documentation and the API.  Is the 
documentation useful?  Is the API easy to use and reasonably intuitive?

Original comment by peter.ab...@gmail.com on 1 Jul 2010 at 10:50

GoogleCodeExporter commented 9 years ago
The matrix is 9025 x 2336 .

Here's the code I use to do a PCA:

SingularValueDecomposition svd = DecompositionFactory.svd(false, true, true);
svd.decompose(mt);

final DenseMatrix64F s = svd.getW(null);
final DenseMatrix64F v = svd.getV(false);

// Problem:
SingularOps.descendingOrder(null, s, v);

// Resize v 
// ...
// CommonOps.mult(mt, vsub, pca);

Original comment by xcolw...@gmail.com on 1 Jul 2010 at 3:32

GoogleCodeExporter commented 9 years ago
Hi Peter, just a quick note, I updated to the latest code, #checkSvdMatrixSize 
may not account for a null U/V matrix. 

Original comment by xcolw...@gmail.com on 2 Jul 2010 at 4:30

GoogleCodeExporter commented 9 years ago
The problem is actually inside of the SVD algorithm. If it gets two numbers 
that are very small in a particular part they become zero when multiplied 
together.  This eventually causes a divided by zero error and NaN for singular 
values.  That intern hosed descendingOrder().

There is also a bug in descendingOrder() that's in SVN right now where it 
doesn't check to see of a matrix is null before using it.

I almost got a fix for all of these issues ready to go, but I just found a 
matrix that causes some problems.

Original comment by peter.ab...@gmail.com on 2 Jul 2010 at 4:46

GoogleCodeExporter commented 9 years ago
Just committed a candidate fix to all of these problems.  Also created a new 
issue for the SVD divided by zero problem.  The current code can process the 
matrix you provided.

Can you run this through your data and tell me if you encounter any problems?

Original comment by peter.ab...@gmail.com on 2 Jul 2010 at 5:10

GoogleCodeExporter commented 9 years ago
Just an update, I've been running with the candidate code for the past week. 
Everything checks out so far -- no problems. 

Original comment by xcolw...@gmail.com on 12 Jul 2010 at 4:26

GoogleCodeExporter commented 9 years ago
Thanks for the update.  The latest SVN should be a little bit more accurate 
than what you are running and descendingOrder() can handle inputs with NaN in 
it.  If everything still seems to be going well i'll close the other issue.

http://code.google.com/p/efficient-java-matrix-library/issues/detail?id=6

Original comment by peter.ab...@gmail.com on 12 Jul 2010 at 9:53