sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.43k stars 478 forks source link

Module to load matrix from file #13601

Open fda5796b-398e-41cd-914c-35e067c67f51 opened 12 years ago

fda5796b-398e-41cd-914c-35e067c67f51 commented 12 years ago

Numpy and Scipy has functions to load matrix save in text files (Numpy: http://docs.scipy.org/doc/numpy/reference/routines.io.html and Scipy: http://docs.scipy.org/doc/scipy/reference/io.html).

Will be nice to have a Sage function to call some of this function as backend and to handle with more file formats.

In attachment trac_13601-matrix_io.patch was suggest the follow formats:

Load Matlab and IDL format was implemented as a backend of Scipy and MPS was implemented from scratch.

Load others suggest formats wasn't implemented. Neither the functions to write.

To improve this module to handle with any of the not implemented formats or any new format, please open other ticket.

Will be nice to implemented the suggest of vbraun.

CC: @sagetrac-Bouillaguet

Component: linear algebra

Issue created by migration from https://trac.sagemath.org/ticket/13601

fda5796b-398e-41cd-914c-35e067c67f51 commented 12 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,23 @@
 Numpy and Scipy has functions to load matrix save in text files (Numpy: http://docs.scipy.org/doc/numpy/reference/routines.io.html and Scipy: http://docs.scipy.org/doc/scipy/reference/io.html).

 Will be nice to have a Sage function to call some of this function as backend and to handle with more file formats.
+
+In attachment trac_13601-matrix_io.patch was suggest the follow formats:
+* A single matrix in a text file without header or any other information.
+* Coordinate Text File.
+* Octave's binary data format.
+* Octave's binary data format.
+* hdf5 format.
+* The Matrix Market (MM) exchange formats.
+* The Harwell-Boeing format.
+* Matlab format.
+* IDL save file.
+* Mathematical Programming System (mps).
+
+Load Matlab and IDL format was implemented as a backend of Scipy and MPS was implemented from scratch.
+
+Load others suggest formats wasn't implemented. Neither the functions to write.
+
+**To improve this module, please open other ticket.**
+
+
fda5796b-398e-41cd-914c-35e067c67f51 commented 12 years ago
comment:3

In the version 0.2 of the attachment: trac_13601-matrix_io.patch I make some improves in the definition of the functions and the documentation and write tests for all the functions.

IMHO, will be better to freeze this ticket and open others when write the not implemented functions.

vbraun commented 12 years ago
comment:4

Instead of adding new global symbols load_matrix / write_matrix, how about making them accessible through the matrix constructor? That is, if the first argument to matrix() is a string then call load_matrix(filename, **kwds). The write_matrix functionality should be wrapped into the save() method of matrices, this can then be discovered by the user with tab-completion.

A minor English nitpick, its either load/save or read/write.

fda5796b-398e-41cd-914c-35e067c67f51 commented 12 years ago
comment:5

Replying to @vbraun:

Instead of adding new global symbols load_matrix / write_matrix, how about making them accessible through the matrix constructor? That is, if the first argument to matrix() is a string then call load_matrix(filename, **kwds). The write_matrix functionality should be wrapped into the save() method of matrices, this can then be discovered by the user with tab-completion.

I like the suggestion.

fda5796b-398e-41cd-914c-35e067c67f51 commented 12 years ago

Description changed:

--- 
+++ 
@@ -18,6 +18,10 @@

 Load others suggest formats wasn't implemented. Neither the functions to write.

-**To improve this module, please open other ticket.**
+**To improve this module to handle with any of the not implemented formats or any new format, please open other ticket.**
+
+Will be nice to implemented the suggest of [vbraun](https://github.com/sagemath/sage/issues/13601#comment:4).

+
+
fda5796b-398e-41cd-914c-35e067c67f51 commented 12 years ago

Attachment: trac_13601-matrix_io.patch.gz

Version 0.2

robertwb commented 12 years ago
comment:7

I would rather see matrix(str, ...) try to parse the string, so one could do

matrix("""
[1, 2]
[3, 4]
""")

And use matrix(open("/path/to/file", ...) for loading from a file.

vbraun commented 12 years ago
comment:8

Another possibility would be to turn matrix into a class with matrix.__call__ doing what we are doing now. Then one can add methods matrix.load, matrix.identity, ...

fda5796b-398e-41cd-914c-35e067c67f51 commented 12 years ago
comment:9

Replying to @robertwb:

I like the idea of see matrix(str, ...) try to parse the string. But I prefer to use matrix("/path/to/file", ...) rather than matrix(open("/path/to/file", ...), ...).

robertwb commented 12 years ago
comment:10

Replying to @sagetrac-r-gaia-cs:

Replying to @robertwb:

I like the idea of see matrix(str, ...) try to parse the string. But I prefer to use matrix("/path/to/file", ...) rather than matrix(open("/path/to/file", ...), ...).

But how would you disambiguate the two? It does seem odd to not be able to construct matrices from strings, as one can do ZZ("123") and SR("b^2 - 4a c") and ZZ['x']("x^3 - 3x + 1").

A huge +1 to Volker's idea. #13678

fda5796b-398e-41cd-914c-35e067c67f51 commented 12 years ago
comment:11

Replying to @robertwb:

Replying to @sagetrac-r-gaia-cs:

Replying to @robertwb:

I like the idea of see matrix(str, ...) try to parse the string. But I prefer to use matrix("/path/to/file", ...) rather than matrix(open("/path/to/file", ...), ...).

But how would you disambiguate the two?

IMHO, for the string to be parse it must be equal to the nice string representation of:

Once in all the four cases above the 0-position of the string is [ or { it can be used to disambiguate between the two.

Will be good too that one could do

matrix("""
1, 2
3, 4
""")

For this special case we can use the new line character to disambiguate.

A huge +1 to Volker's idea. #13678

A huge +2 to Volker's idea. #13678

robertwb commented 12 years ago
comment:12

I think it would be odd if ZZ(s) tried to open the file s if it didin't represent a valid integer (and/or refused to if it did). Same issue applies to matrices. IMHO, better to raise an error than have very different semantics here.

You could rebase this atop the #13678 (using matrix.load) to sidestep the whole issue.

vbraun commented 12 years ago
comment:13

To parse a string-matrix I think we should just match numbers and strip everything else out, as in

sage: number = re.compile('[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?')
sage: [ match.group() for match in number.finditer('[ 1e3, 0 | 3 6, 1.3 .123)') ]
['1e3', '0', '3', '6', '1.3', '.123']

Then put the numbers from each line into the rows of a matrix. This should grok all ascii-art matrix representations.

If the input string is already valid Python, say, then the user can just eval it directly. So we shouldn't worry too much about that case.

+1 to use matrix.load() to load the matrix.

89f39f15-88e8-4e79-9bc0-0739a7fc497c commented 11 years ago
comment:14

+1 to use matrix.load() to load the matrix !

This ticket should really be rebased on top of #13678 !

Minor problems I spotted : there are a lot of english mistakes in the patch, copyright notice should not say 2005 W.stein, but 2012 R.Silva.

Lastly, I find it weird to write a whole lot of boilerplate, especially in the (user-visible) documentation, about things that are not implemented. I think it could confuse users, and I personally think that it would be better if we took the pain of actually doing the work before merging the ticket. Or at least it could be rephrased "TODO : other formats that we could possibly handle in the future include...."

About the MatrixMarket format : as Raniere mentions, SciPy has functions to read/write this format. These functions could probably be used as-is for floating-point matrices, but as far as I understand what they do, they do not support integer matrices. And even if they would, they would be limited to 32/64 bits integers because they rely on NumPy. Therefore I tend to believe that an integer version of the MatrixMarket code would have to be re-implemented. We cannot patch SciPy to work with arbitrarily large integers... So I think that we should take SciPy's MatrixMarket code and improve it. I am willing to do this, but I'd rather be sure that we agree before doing it.

Oh, a last detail : the MatrixMarket format comes in a dense and in a sparse version, real, complex, integer or binary, thus it would seem natural that the loader itself sets the sparsity flag and the ring. In the patch they are given to the loader.