weyns49570 / efficient-java-matrix-library

Automatically exported from code.google.com/p/efficient-java-matrix-library

Support for huge matrices #23

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I would be interested in adding support for huge matrices backed by memory-mapped 
files.  This could support large, persisted matrices with a minimal heap 
footprint.  On my blog I have an example of a "sparse" 1,000,000 x 1,000,000 
matrix which is rather slow, but a 100,000 x 100,000 matrix might be 
practical.

Original issue reported on code.google.com by peter.la...@gmail.com on 20 Jan 2012 at 9:20
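A minimal sketch of what a memory-mapped backing store could look like (hypothetical class name, not EJML API; note that a single MappedByteBuffer is capped at Integer.MAX_VALUE bytes, so a real huge-matrix store would have to stripe the data across several mappings):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: matrix elements stored in a memory-mapped file rather
// than on the heap. The OS pages data in and out, keeping heap usage minimal.
class MappedMatrix {
    private final MappedByteBuffer buf;
    private final int cols;

    MappedMatrix(Path file, int rows, int cols) throws IOException {
        this.cols = cols;
        long bytes = (long) rows * cols * Double.BYTES;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping past the current file size extends the file on disk.
            buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
        }
    }

    double get(int row, int col) {
        return buf.getDouble((row * cols + col) * Double.BYTES);
    }

    void set(int row, int col, double v) {
        buf.putDouble((row * cols + col) * Double.BYTES, v);
    }
}
```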

GoogleCodeExporter commented 9 years ago
This sounds interesting.  Adding support for memory-mapped files is not 
something that can be simply plugged in. EJML gets much of its performance by 
working with raw arrays and has little abstraction inside of its low-level 
functions.  It shouldn't be too difficult, though, to copy and rewrite some of 
these algorithms to use accessors instead.  

The hard part is figuring out how to make it fast.  I suspect processing these 
matrices by blocks would be much faster than in row/column-major order.  Not 
all operators support block matrices yet.

Before I say too much more: by "interested in adding support for", do you mean 
you personally would like to write the code, or that it would be great if I 
wrote the code?  

Original comment by peter.ab...@gmail.com on 20 Jan 2012 at 4:45
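To illustrate what rewriting the algorithms against accessors might look like (all names here are hypothetical, not EJML API), a low-level kernel could be written once against an element-access interface, with one implementation per backing store:

```java
// Hypothetical sketch: an accessor abstraction over the element store.
// A raw-array implementation is shown; a memory-mapped one would be another.
interface ElementAccess {
    double get(long index);
    void set(long index, double value);
}

final class ArrayAccess implements ElementAccess {
    private final double[] data;
    ArrayAccess(double[] data) { this.data = data; }
    public double get(long i) { return data[(int) i]; }
    public void set(long i, double v) { data[(int) i] = v; }
}

// A kernel rewritten to touch elements only through the accessor.
final class Kernels {
    static double dot(ElementAccess a, ElementAccess b, long n) {
        double sum = 0;
        for (long i = 0; i < n; i++) {
            sum += a.get(i) * b.get(i);
        }
        return sum;
    }
}
```

Whether the VM reliably inlines such calls is exactly the performance question discussed below.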

GoogleCodeExporter commented 9 years ago
I mean I would be interested in writing the code.  Using accessor methods 
doesn't impact performance, provided they are simple enough to be inlined.

Using Unsafe can bypass some of the bounds checks and give better 
performance than using ByteBuffer, but it can limit portability (another 
reason to abstract the use of this class).  It also has the advantage that it 
can allocate memory of any size, not limited by Integer.MAX_VALUE.

Original comment by peter.la...@gmail.com on 20 Jan 2012 at 4:59
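A sketch of off-heap access through sun.misc.Unsafe (an unsupported internal API, obtained here via reflection; the class and method names below are illustrative). The key point is that allocateMemory takes a long, so a single allocation is not capped at Integer.MAX_VALUE bytes the way a direct ByteBuffer is:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: off-heap storage via sun.misc.Unsafe.
// No bounds checks are performed, which is where the speedup comes from,
// and also why it should be hidden behind an abstraction.
class OffHeap {
    static final Unsafe UNSAFE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static double roundTrip(double value) {
        // allocateMemory takes a long: allocations larger than 2 GB are possible.
        long addr = UNSAFE.allocateMemory(Double.BYTES);
        try {
            UNSAFE.putDouble(addr, value);  // raw write, no safety checks
            return UNSAFE.getDouble(addr);  // raw read
        } finally {
            UNSAFE.freeMemory(addr);        // manual memory management
        }
    }
}
```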

GoogleCodeExporter commented 9 years ago
Are you referring to sun.misc.Unsafe?  I just learned about that undocumented 
functionality.  I have also read that OpenJDK supports unsafe array access, 
but that other JDKs do not.

A good feasibility experiment would probably be to implement matrix 
multiplication for row-major and block matrices, then see which one is faster 
and how long multiplying large matrices would take.  How familiar are you with 
the computational side of linear algebra?
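The experiment could be sketched as follows (hypothetical code with an arbitrary block size; a real benchmark would need JIT warm-up and much larger matrices). The blocked version reuses sub-blocks of the operands while they are still in cache:

```java
// Sketch: naive row-major multiply vs. a cache-blocked multiply of n x n
// matrices stored as flat row-major arrays. Both compute C = A * B.
class MultCompare {
    static double[] multNaive(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double sum = 0;
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        return c;
    }

    static double[] multBlocked(double[] a, double[] b, int n, int blk) {
        double[] c = new double[n * n];
        // Iterate over blk x blk tiles so operand tiles stay cache-resident.
        for (int i0 = 0; i0 < n; i0 += blk)
            for (int k0 = 0; k0 < n; k0 += blk)
                for (int j0 = 0; j0 < n; j0 += blk)
                    for (int i = i0; i < Math.min(i0 + blk, n); i++)
                        for (int k = k0; k < Math.min(k0 + blk, n); k++) {
                            double aik = a[i * n + k];
                            for (int j = j0; j < Math.min(j0 + blk, n); j++)
                                c[i * n + j] += aik * b[k * n + j];
                        }
        return c;
    }
}
```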

----

Before you read what I say below, note that I'm NOT against accessors in 
general.  Their use is just unlikely to find its way into 
DenseMatrix64F-related code...

The decision not to use accessors came after a lot of benchmarks and 
experiments.  The general conclusion was that in some situations the VM was 
smart enough to inline, but not always.  At one point I even started to switch 
part of the code base over to use accessors, but performance suddenly dropped 
in ways not predicted by micro benchmarks.  The VM seems to be much better at 
optimizing code which lacks abstraction and has been written so that it's 
simple enough for it to understand.

Original comment by peter.ab...@gmail.com on 20 Jan 2012 at 5:24

GoogleCodeExporter commented 9 years ago
I am familiar with multi-threaded matrix multiplication using transposition to 
optimise cache access, but I haven't done anything more complicated with 
matrices since Uni. ;)

Another optimisation I would consider is making it implicitly multi-threaded 
when it improves performance.

Original comment by peter.la...@gmail.com on 20 Jan 2012 at 5:38

GoogleCodeExporter commented 9 years ago
On a 4-core machine with hyper-threading I have seen a 3-5x performance 
improvement for some heavily mathematical operations.  This would be more 
valuable for massive matrices, as you need a decent amount of work to make 
using more than one thread worthwhile.

Original comment by peter.la...@gmail.com on 20 Jan 2012 at 5:45
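A minimal sketch of implicit multi-threading for multiplication (hypothetical code, using parallel streams): each thread computes disjoint rows of the result, so no synchronization is needed, and a size threshold decides when the parallelism is worth its overhead.

```java
import java.util.stream.IntStream;

// Sketch: row-parallel multiply of n x n row-major matrices.
// Falls back to sequential execution below an arbitrary size threshold.
class ParallelMult {
    static final int PARALLEL_THRESHOLD = 128; // illustrative cutoff

    static double[] mult(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        IntStream rows = IntStream.range(0, n);
        if (n >= PARALLEL_THRESHOLD) rows = rows.parallel();
        rows.forEach(i -> {
            // Each task owns row i of c; no two tasks touch the same element.
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++)
                    c[i * n + j] += aik * b[k * n + j];
            }
        });
        return c;
    }
}
```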

GoogleCodeExporter commented 9 years ago
Direct ByteBuffers are portable; they even work on Android.  I usually test 
using ByteBuffers first and then swap in Unsafe, as there is a one-to-one 
mapping between the two but without the safety checks.

Original comment by peter.la...@gmail.com on 20 Jan 2012 at 5:46
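The ByteBuffer-first baseline might look like this sketch (hypothetical class): its get/set calls map one-to-one onto Unsafe's getDouble/putDouble, which is what makes the later swap mechanical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: the portable element store, backed by a direct
// ByteBuffer (off-heap, bounds-checked). An Unsafe-backed variant would
// expose the same get/set signatures.
class BufferStore {
    private final ByteBuffer buf;
    private final int cols;

    BufferStore(int rows, int cols) {
        this.cols = cols;
        // allocateDirect takes an int, so this store is capped at 2 GB.
        this.buf = ByteBuffer.allocateDirect(rows * cols * Double.BYTES);
    }

    double get(int row, int col) {
        return buf.getDouble((row * cols + col) * Double.BYTES);
    }

    void set(int row, int col, double v) {
        buf.putDouble((row * cols + col) * Double.BYTES, v);
    }
}
```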

GoogleCodeExporter commented 9 years ago
A bigger problem is that Java matrices can only support 2^30 elements. I just 
ran into this problem, while trying to run a PCA over 119 samples with 165498 
dimensions each. Here is the stack trace:

Thread [main] (Suspended (exception ArrayIndexOutOfBoundsException))    
    DenseMatrix64F(D1Matrix64F).set(int, double) line: 96   
    CommonOps.setIdentity(RowD1Matrix64F) line: 788 
    BidiagonalDecompositionRow.getU(DenseMatrix64F, boolean, boolean) line: 184 
    BidiagonalDecompositionRow.getU(Matrix64F, boolean, boolean) line: 1    
    SvdImplicitQrDecompose.computeUWV() line: 212   
    SvdImplicitQrDecompose.decompose(DenseMatrix64F) line: 162  
    SvdImplicitQrDecompose.decompose(Matrix64F) line: 1 
    PrincipalComponentAnalysis.computeBasis(int) line: 127  

CommonOps.setIdentity is as follows: 

    public static void setIdentity( RowD1Matrix64F mat )
    {
        int width = mat.numRows < mat.numCols ? mat.numRows : mat.numCols;

        int length = mat.getNumElements();

        for( int i = 0; i < length; i++ ) {
            mat.set(i , 0 );
        }

        int index = 0;
        for( int i = 0; i < width; i++ , index += mat.numCols + 1) {
            mat.set( index , 1 );
        }
    }

Here, numCols == numRows == 165498, giving a total of 27,389,588,004 array 
elements. The value of length = mat.getNumElements() is 1,619,784,228. The 
value of index when it crashes is 1,619,904,212. The value of 2^30 is 
1,073,741,824.
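The numbers in the trace are consistent with 32-bit overflow: computing numRows * numCols in int arithmetic wraps 27,389,588,004 around to exactly the reported 1,619,784,228, as a small demonstration shows:

```java
// Demonstration of the overflow behind the reported getNumElements() value.
class OverflowDemo {
    static int intProduct(int n) {
        return n * n;            // int arithmetic wraps modulo 2^32
    }
    static long longProduct(int n) {
        return (long) n * n;     // widening before the multiply avoids the wrap
    }
}
```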

It seems like there should be a way to generate U that doesn't rely upon 
generating such a large matrix? Ultimately the eigen decomposition matrix can't 
be larger than 119 x 119 anyway, since there are only 119 data samples.

Anyway, even representing a 1M x 1M matrix in Java is impossible until 
long-typed array indices are supported in Java (which probably won't happen 
because it breaks some fundamental things in the language).

As for how to efficiently implement matrix algorithms where the entire data 
structure doesn't fit in memory, see the comments on loop unrolling and 
blocking in the paper here (the comments on cache apply equally to swapping in 
and out from disk): 
http://homepages.cae.wisc.edu/~ece539/software/matrix/mbpv11.ps 

Original comment by luke.hutch on 30 Mar 2012 at 4:50

GoogleCodeExporter commented 9 years ago
ByteBuffers have the same problem: they are limited to Integer.MAX_VALUE even 
though the underlying C libraries are not.

The solution is to have multiple arrays or ByteBuffers. This allows you to 
have arrays up to the maximum heap size, or ByteBuffers up to the size of your 
disk space.

Original comment by peter.la...@gmail.com on 30 Mar 2012 at 9:51
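A sketch of the multiple-array workaround (hypothetical class): a flat store addressed by a long index, backed by fixed-size double[] chunks, so total capacity is bounded by the heap rather than by a single array's int index.

```java
// Hypothetical sketch: a long-indexed element store built from several
// ordinary arrays. The same idea works with a list of ByteBuffers.
class ChunkedArray {
    private static final int CHUNK = 1 << 20; // elements per chunk (tunable)
    private final double[][] chunks;

    ChunkedArray(long length) {
        int n = (int) ((length + CHUNK - 1) / CHUNK); // ceil(length / CHUNK)
        chunks = new double[n][CHUNK];
    }

    double get(long i) {
        return chunks[(int) (i / CHUNK)][(int) (i % CHUNK)];
    }

    void set(long i, double v) {
        chunks[(int) (i / CHUNK)][(int) (i % CHUNK)] = v;
    }
}
```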

GoogleCodeExporter commented 9 years ago
Yep, that is a way to get around it, but it is still annoying.  The most 
natural way to do that would be to have block matrices be composed of an array 
of arrays (double[][]) instead of a single array.  Unfortunately I decided to 
use a single array in EJML since it seemed to be slightly faster and easier to 
work with.  In the future that might have to change.

However, this is probably a non-issue, since working with any matrix that 
large is likely to be way too slow.

Speaking of which Peter, do you think I should close this issue?

Original comment by peter.ab...@gmail.com on 2 Apr 2012 at 6:41

GoogleCodeExporter commented 9 years ago
I would close it. I assume we can open another issue later if we feel we need 
it.

Original comment by peter.la...@gmail.com on 2 Apr 2012 at 7:26

GoogleCodeExporter commented 9 years ago

Original comment by peter.ab...@gmail.com on 2 Apr 2012 at 9:34