lessthanoptimal / ejml

A fast and easy to use linear algebra library written in Java for dense, sparse, real, and complex matrices.
https://ejml.org
561 stars 117 forks source link

MatrixIO.loadMatrixMarketD crashes on non-coordinate matrices #165

Closed alugowski closed 1 year ago

alugowski commented 1 year ago

The following is a valid MatrixMarket file representing a 3x1 column vector:

%%MatrixMarket matrix array real general
%
3 1
1
2
3

Trying to load this with MatrixIO.loadMatrixMarketD:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Unexpected number of words: 2
    at org.ejml.ops.MatrixIO.loadMatrixMarketD(MatrixIO.java:181)

The problem appears to be that the loader assumes a coordinate format, but notice the header says array.

alugowski commented 1 year ago

The way Eigen tackles this is they have two methods. One reads coordinate files as a sparse matrix and another array files as a dense matrix.

See https://eigen.tuxfamily.org/dox/unsupported/MarketIO_8h_source.html

I think it's reasonable to read array files into a DMatrixRMaj.

lessthanoptimal commented 1 year ago

On the same branch it now supports vectors. Hmm in the sparse case only. I'll add a function for the dense case.

lessthanoptimal commented 1 year ago

There's now loadMatrixMarketD() and loadMatrixMarketDenseD(). Coordinate and dense matrices are supported for both MM file types. Which approach do you like better? Eigen or this?

alugowski commented 1 year ago

I like this. If I want a sparse matrix, I can load any MM file via loadMatrixMarketD() and get a sparse matrix. If I want a dense matrix, I can load any MM file via loadMatrixMarketDenseD() and get a dense matrix. Sounds simple to me. I think it's the best that can be done in a statically typed language.

Maybe one optimization might be for loadMatrixMarketD() to ignore zeros from an array MM file.

lessthanoptimal commented 1 year ago

Just did another tweak. loadMatrixMarket format is now used, where is the abreviated matrix type. DDRM for DMatrixRMajor or DSTR for DMatrixSparseTriplet.

I'll add the ignore zeros in dense right now

lessthanoptimal commented 1 year ago

In coordinate format, would it be bad if zeros were ignored also? I suspect that I shouldn't do that since they might be encoding something with the structure... just easier to code that way.

alugowski commented 1 year ago

In coordinate format, would it be bad if zeros were ignored also? I suspect that I shouldn't do that since they might be encoding something with the structure... just easier to code that way.

I would follow your general policy on explicit zeros. If you allow them then the loader shouldn't drop them. If it's a question of code reuse, maybe add a flag that controls the dense loader's drop behavior too?

lessthanoptimal commented 1 year ago

I'm happy with the current code. I'll close this ticket soon unless you have some other thoughts.

alugowski commented 1 year ago

saveMatrixMarket(FMatrixRMaj) and DMatrixRMaj appear to specify coordinate when it should be array.

I find SciPy has a good MatrixMarket IO module. Make sure you can load what they emit and vice versa.

lessthanoptimal commented 1 year ago

Just fixed that. Maybe that header shouldn't be completely ignored.. The format type is determined by inspecting the file.

lessthanoptimal commented 1 year ago

now reads the header and unit tests would catch this that header issue in the future.

alugowski commented 1 year ago

Just fixed that. Maybe that header shouldn't be completely ignored.. The format type is determined by inspecting the file.

Yes definitely don't ignore the header. Even if it's only a fail-fast mechanism. Also check for field=complex and say you don't support it.

lessthanoptimal commented 1 year ago

yep did both of those in yesterdays push.