This sounds interesting. Adding support for memory-mapped files is not something that can simply be plugged in. EJML gets much of its performance by working with raw arrays and has little abstraction inside its low-level functions. It shouldn't be too difficult to copy and rewrite some of these algorithms to use accessors instead.
The hard part is figuring out how to make it fast. I suspect processing these matrices by blocks would be much faster than processing them in row/column-major order. Not all operators support block matrices yet.
Before I say too much more: by "interested in adding support for" do you mean that you personally would like to write the code, or that it would be great if I wrote the code?
Original comment by peter.ab...@gmail.com
on 20 Jan 2012 at 4:45
I mean I would be interested in writing the code. Using accessor methods doesn't impact performance provided they are simple enough to be inlined.
Using Unsafe can bypass some of the bounds checks and give better performance than ByteBuffer, but it can limit portability (another reason to abstract the use of this class). It also has the advantage that it can allocate memory of any size, not limited by Integer.MAX_VALUE.
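A minimal sketch of the idea, for the record. The reflective lookup of the "theUnsafe" field is the usual unsupported idiom, and the 4 GiB figure is just an illustrative assumption:

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapDemo {
    public static void main( String[] args ) throws Exception {
        // grab the singleton via reflection; Unsafe has no public constructor
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        // 4 GiB: more bytes than any single array or ByteBuffer can address
        long bytes = 4L << 30;
        long address = unsafe.allocateMemory(bytes);

        // long byte offsets, no bounds checks
        long offset = address + 3000000000L;
        unsafe.putDouble(offset, 42.0);
        System.out.println(unsafe.getDouble(offset));

        unsafe.freeMemory(address);
    }
}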
Original comment by peter.la...@gmail.com
on 20 Jan 2012 at 4:59
You're referring to sun.misc.Unsafe? I just learned about that undocumented functionality right now. I have also read that OpenJDK supports unsafe array access, but that other JDKs do not.
A good feasibility experiment would probably be to implement matrix multiplication for row-major and block matrices, then see which one is faster and how long it takes to multiply matrices. How familiar are you with the computational side of linear algebra?
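To make the experiment concrete, here is a rough sketch of the two kernels being compared. Both assume n x n matrices stored row-major in a double[] of length n*n with c zeroed on entry, and the block size of 64 is an arbitrary starting point for tuning:

public static void multRowMajor( double[] a, double[] b, double[] c, int n ) {
    // naive C += A*B, with the loop order that streams through B's rows
    for( int i = 0; i < n; i++ ) {
        for( int k = 0; k < n; k++ ) {
            double aik = a[i*n + k];
            for( int j = 0; j < n; j++ ) {
                c[i*n + j] += aik * b[k*n + j];
            }
        }
    }
}

public static void multBlocked( double[] a, double[] b, double[] c, int n ) {
    // identical arithmetic, iterated block by block so each sub-matrix
    // stays in cache while it is being reused
    final int B = 64;
    for( int i0 = 0; i0 < n; i0 += B ) {
        for( int k0 = 0; k0 < n; k0 += B ) {
            for( int j0 = 0; j0 < n; j0 += B ) {
                int iMax = Math.min(i0+B, n);
                int kMax = Math.min(k0+B, n);
                int jMax = Math.min(j0+B, n);
                for( int i = i0; i < iMax; i++ ) {
                    for( int k = k0; k < kMax; k++ ) {
                        double aik = a[i*n + k];
                        for( int j = j0; j < jMax; j++ ) {
                            c[i*n + j] += aik * b[k*n + j];
                        }
                    }
                }
            }
        }
    }
}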
----
Before you read what I say below, note that I'm NOT against accessors in general. Their use is just unlikely to find its way into DenseMatrix64F-related code...
The decision not to use accessors came after a lot of benchmarks and experiments. The general conclusion was that in some situations the VM was smart enough to inline, but not always. At one point I even started to switch part of the code base over to accessors, but performance suddenly dropped in ways the micro-benchmarks had not predicted. The VM seems to be much better at optimizing code that lacks abstraction and has been written simply enough for it to understand.
Original comment by peter.ab...@gmail.com
on 20 Jan 2012 at 5:24
I am familiar with multi-threaded matrix multiplication using transposition to optimise cache access, but I haven't done anything more complicated with matrices since Uni. ;)
Another optimisation I would consider is making it implicitly multi-threaded when doing so improves performance.
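For concreteness, the transpose trick looks something like this. It assumes the same row-major double[] layout as above, and the outer i loop is the natural place to split work across threads:

public static void multTransposeB( double[] a, double[] b, double[] c, int n ) {
    // copy B into the row-major layout of its transpose so the inner
    // loop below reads both operands sequentially
    double[] bt = new double[n*n];
    for( int i = 0; i < n; i++ ) {
        for( int j = 0; j < n; j++ ) {
            bt[j*n + i] = b[i*n + j];
        }
    }
    for( int i = 0; i < n; i++ ) {            // split this loop across threads
        for( int j = 0; j < n; j++ ) {
            double sum = 0;
            for( int k = 0; k < n; k++ ) {
                sum += a[i*n + k] * bt[j*n + k];
            }
            c[i*n + j] = sum;
        }
    }
}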
Original comment by peter.la...@gmail.com
on 20 Jan 2012 at 5:38
On 4 cores + hyper-threading I have seen a 3-5x improvement in performance for some heavily mathematical operations. This would be most valuable for massive matrices, as you need a decent amount of work to make using more than one thread worthwhile.
Original comment by peter.la...@gmail.com
on 20 Jan 2012 at 5:45
Direct ByteBuffers are portable; they even work on Android. I usually test using ByteBuffers first and then swap in Unsafe, as there is a one-to-one mapping between the two but without the safety checks.
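For reference, mapping a file as the backing store looks something like this (a minimal sketch; the file name is made up, and each individual mapping is still limited to Integer.MAX_VALUE bytes):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedDemo {
    public static void main( String[] args ) throws Exception {
        RandomAccessFile raf = new RandomAccessFile("matrix.bin", "rw");
        FileChannel ch = raf.getChannel();

        // map 1 GiB of the file; larger matrices need several mappings
        MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1L << 30);

        buf.putDouble(0, 3.14);               // element (0,0) of a row-major matrix
        System.out.println(buf.getDouble(0));

        ch.close();
        raf.close();
    }
}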
Original comment by peter.la...@gmail.com
on 20 Jan 2012 at 5:46
A bigger problem is that Java matrices backed by a single array can only support about 2^31 elements, since arrays are indexed by int. I just ran into this problem while trying to run a PCA over 119 samples with 165498 dimensions each. Here is the stack trace:
Thread [main] (Suspended (exception ArrayIndexOutOfBoundsException))
DenseMatrix64F(D1Matrix64F).set(int, double) line: 96
CommonOps.setIdentity(RowD1Matrix64F) line: 788
BidiagonalDecompositionRow.getU(DenseMatrix64F, boolean, boolean) line: 184
BidiagonalDecompositionRow.getU(Matrix64F, boolean, boolean) line: 1
SvdImplicitQrDecompose.computeUWV() line: 212
SvdImplicitQrDecompose.decompose(DenseMatrix64F) line: 162
SvdImplicitQrDecompose.decompose(Matrix64F) line: 1
PrincipalComponentAnalysis.computeBasis(int) line: 127
CommonOps.setIdentity is as follows:

public static void setIdentity( RowD1Matrix64F mat )
{
    int width = mat.numRows < mat.numCols ? mat.numRows : mat.numCols;
    int length = mat.getNumElements();

    for( int i = 0; i < length; i++ ) {
        mat.set( i, 0 );
    }

    int index = 0;
    for( int i = 0; i < width; i++, index += mat.numCols + 1 ) {
        mat.set( index, 1 );
    }
}
Here, numCols == numRows == 165498, giving a total of 27,389,588,004 array elements. The value of length = mat.getNumElements() is 1,619,784,228, because numRows * numCols is computed in 32-bit int arithmetic and 27,389,588,004 mod 2^32 = 1,619,784,228. The value of index when it crashes is 1,619,904,212, just past the end of the (overflowed) backing array.
It seems like there should be a way to generate U that doesn't rely on materializing such a large matrix? Ultimately the eigen decomposition can't be larger than 119 x 119 anyway, since there are only 119 data samples.
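One standard workaround when there are far fewer samples than dimensions (sometimes called the snapshot method) is to decompose the small Gram matrix X * X^T instead. A rough sketch against EJML's API of the time; package locations and exact signatures may differ between versions, so treat this as pseudocode:

import org.ejml.data.DenseMatrix64F;
import org.ejml.factory.DecompositionFactory;
import org.ejml.factory.EigenDecomposition;
import org.ejml.ops.CommonOps;

public static void snapshotPca( DenseMatrix64F X ) {
    // X is 119 x 165498, one (mean-subtracted) sample per row
    DenseMatrix64F G = new DenseMatrix64F(X.numRows, X.numRows);
    CommonOps.multTransB(X, X, G);        // G = X * X^T, only 119 x 119

    EigenDecomposition<DenseMatrix64F> eig = DecompositionFactory.eig(G.numRows, true);
    eig.decompose(G);
    // each principal direction in the original space is v_i = X^T * u_i
    // (normalized), where u_i = eig.getEigenVector(i)
}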
Anyway, even representing a 1M x 1M matrix in Java is impossible until
long-typed array indices are supported in Java (which probably won't happen
because it breaks some fundamental things in the language).
As for how to efficiently implement matrix algorithms where the entire data
structure doesn't fit in memory, see the comments on loop unrolling and
blocking in the paper here (the comments on cache apply equally to swapping in
and out from disk):
http://homepages.cae.wisc.edu/~ece539/software/matrix/mbpv11.ps
Original comment by luke.hutch
on 30 Mar 2012 at 4:50
ByteBuffers have the same problem: they are limited to Integer.MAX_VALUE, even though the underlying C libraries are not.
The solution is to have multiple arrays or ByteBuffers. This allows you to have arrays up to the maximum heap size, or ByteBuffers up to the size of your disk.
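A minimal sketch of the multi-array approach: a long-indexed array of doubles built from fixed-size chunks, so total capacity is bounded by the heap rather than by Integer.MAX_VALUE (the 2^27-doubles-per-chunk figure, i.e. 1 GiB, is an arbitrary assumption):

public class LongDoubleArray {
    private static final int CHUNK_SHIFT = 27;             // 2^27 doubles = 1 GiB per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_SHIFT;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final double[][] chunks;

    public LongDoubleArray( long length ) {
        int numChunks = (int)((length + CHUNK_SIZE - 1) >>> CHUNK_SHIFT);
        chunks = new double[numChunks][];
        long remaining = length;
        for( int i = 0; i < numChunks; i++ ) {
            chunks[i] = new double[ (int)Math.min(remaining, CHUNK_SIZE) ];
            remaining -= chunks[i].length;
        }
    }

    public double get( long index ) {
        return chunks[(int)(index >>> CHUNK_SHIFT)][(int)(index & CHUNK_MASK)];
    }

    public void set( long index, double value ) {
        chunks[(int)(index >>> CHUNK_SHIFT)][(int)(index & CHUNK_MASK)] = value;
    }
}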
Original comment by peter.la...@gmail.com
on 30 Mar 2012 at 9:51
Yep, that is a way to get around it, but it is still annoying. The most natural way to do that would be to have block matrices be composed of an array of arrays (double[][]) instead of a single array. Unfortunately I decided to use a single array in EJML since it seemed to be slightly faster and easier to work with. In the future that might have to change.
However, this is probably a non-issue, since working with any matrix that large is likely to be way too slow.
Speaking of which Peter, do you think I should close this issue?
Original comment by peter.ab...@gmail.com
on 2 Apr 2012 at 6:41
I would close it. I assume we can open another issue later if we feel we need it.
Original comment by peter.la...@gmail.com
on 2 Apr 2012 at 7:26
Original comment by peter.ab...@gmail.com
on 2 Apr 2012 at 9:34
Original issue reported on code.google.com by
peter.la...@gmail.com
on 20 Jan 2012 at 9:20