tbeu / matio

MATLAB MAT File I/O Library
https://matio.sourceforge.io
BSD 2-Clause "Simplified" License

Adds support for reading Matlab Objects from matfile V5 #64

Closed hernot closed 5 years ago

hernot commented 7 years ago

The official version is not able to read MAT files in the V5 format that contain MATLAB variables exposing a variable type of MAT_C_OBJECT. Fields, arrays and cells of this type are reported to be empty.

According to the documentation of the MAT-file V5 format, variables, fields, arrays and cells of type MAT_C_OBJECT differ from their MAT_C_STRUCT siblings only by an additional field in the variable entry header stating the name of the corresponding MATLAB class. This allows support for reading MAT_C_OBJECT variables to be implemented with the following strategy:

1) Wherever possible, extend type checks such that an operation to be executed on MAT_C_OBJECT variables is handled the same way as for MAT_C_STRUCT.

2) Keep MAT_C_OBJECT-specific code as small as possible and resort to the MAT_C_STRUCT path as soon as there is no difference between handling either.
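The strategy above can be sketched as follows. This is only an illustration in Python, not the matio C code: the subelement order is taken from the published MAT-file V5 format description, the numeric values follow matio's class enum, and `subelement_order` is a hypothetical helper name.

```python
# Class ids as used by matio's class enum (MAT_C_STRUCT = 2, MAT_C_OBJECT = 3).
MAT_C_STRUCT = 2
MAT_C_OBJECT = 3


def subelement_order(class_type):
    """Return the subelements of a struct/object variable in file order.

    Hypothetical helper: both classes share one code path, and the object
    case adds exactly one extra step (the class name after the array name).
    """
    if class_type not in (MAT_C_STRUCT, MAT_C_OBJECT):
        raise ValueError("not a struct-like class: %d" % class_type)

    order = ["array_flags", "dimensions", "array_name"]
    if class_type == MAT_C_OBJECT:
        # The only object-specific part of the header.
        order.append("class_name")
    # From here on the struct path is reused unchanged.
    order += ["field_name_length", "field_names", "fields"]
    return order


struct_order = subelement_order(MAT_C_STRUCT)
object_order = subelement_order(MAT_C_OBJECT)
extra = [name for name in object_order if name not in struct_order]
```

Here `extra` holds the single subelement that distinguishes the two paths, which is why the object-specific code can stay small.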

The changes were originally implemented by this author for an initial release of matio 1.5.3, used by a data conversion tool of g.tec (www.gtec.at) to allow reading data generated by the commercial MATLAB- and Simulink-based tools.

The implementation is based on a pull from git://git.code.sf.net/p/matio/matio on Sat May 27 13:21:32 2017.

tbeu commented 7 years ago

Thanks for this PR. I merged your branch on top of recent master (and hopefully got the merge conflicts right).

hernot commented 7 years ago

Regarding the class_name field of matvar_t moving to matvar_internal: that is OK for me, as long as you advise me what an appropriate function to access and print the name of the MATLAB class definition should look like once it is stored in matvar_internal.

Regarding the TODOs: I put them in because I need your review of whether the if conditions are sufficient or whether any extra code is necessary in the code that follows. For now it seems that the code as is would be OK. But I currently do not have access to the data and the facilities to test it thoroughly and to verify that these two reminder TODOs can be resolved by deleting them, which would be the best case.

tbeu commented 7 years ago

OK, I will move class_name member and check the TODOs in more detail.

tbeu commented 7 years ago

This is what I tried:

  1. MATLAB commands

    inline_object = inline('t^2');
    memmap_object = memmapfile('c:\temp\mem.map');
    save object_uncompressed -v6 inline_object memmap_object
  2. MAT-File mat.zip

  3. Matdump command

    matdump -d -v object_uncompressed.mat
  4. matdump output https://gist.github.com/tbeu/f0b077df1c278a4687b8aa4b9640d967

Thus, it seems to work on the inline object, but not on the memmapfile object. What kind of objects did you build?

hernot commented 7 years ago

Hi

I created an m-file describing the class object (old style @dir or new style with classdef should not matter):

------ test_class.m

    classdef test_class
        properties (Access = public)
            firstprop;
            secondprop;
        end
        methods
            function obj = test_class()
                obj.firstprop = 12;
                obj.secondprop = 10;
            end
        end
    end

and then

    obj = test_class();
    save('c:\temp\mem.map','obj')

I haven't worked with memmapfile in MATLAB yet, so I have no idea how memmap would influence the file content.

Best Xristoph

tbeu commented 7 years ago

I now took your test_class example (mat.zip), however I cannot see that matdump dumps the class object. Instead, when debugging it, I see that class_type is MAT_C_UINT8 for class_object or memmap_object (and MAT_C_OBJECT for inline_object as before). I believe this issue is related to #47 (MCOS). Can you please debug and check the expected behavior. Thanks.
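For anyone following along in the debugger: as far as the published MAT-file V5 format goes, the class is stored in the low byte of the array flags word, with the logical, global and complex bits in the next byte. A minimal Python sketch of that decoding, assuming the class numbers of matio's enum (`decode_array_flags` and `CLASS_NAMES` are hypothetical names, not matio API):

```python
# Subset of class ids, numbered as in matio's class enum.
CLASS_NAMES = {
    2: "MAT_C_STRUCT",
    3: "MAT_C_OBJECT",
    9: "MAT_C_UINT8",
    17: "MAT_C_OPAQUE",
}


def decode_array_flags(word):
    """Split a 32-bit array flags word into class id and flag bits."""
    class_id = word & 0xFF
    return {
        "class_id": class_id,
        "class_name": CLASS_NAMES.get(class_id, "other"),
        "logical": bool(word & 0x0200),
        "global": bool(word & 0x0400),
        "complex": bool(word & 0x0800),
    }


inline_like = decode_array_flags(0x0003)   # what an old-style object would carry
blob_like = decode_array_flags(0x0009)     # what was observed for class_object
```

With this, an old-style object and a uint8 blob can be told apart from the flags word alone, without looking at the payload.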

hernot commented 7 years ago

Hi

AFAIK it should be MAT_C_OBJECT; the MAT_C_UINT8 I can check, even though it may take some time until I get to it. Do you have some test data on GitHub already? Then I would update my fork before checking, so that I'm in sync with your corresponding branch.

Best Xristoph

tbeu commented 7 years ago

You can take the MAT file from mat.zip or mat.zip as test data. I checked it on your plain commit, thus you could debug it with your local branch (no need to sync right now).

hernot commented 7 years ago

OK, checking the MATLAB documentation of memmapfile, I fear you cannot memmap objects.

According to the documentation of the memmapfile Format parameter, you can only specify numeric types: no cells, no char, no struct and no object.

https://de.mathworks.com/help/matlab/ref/memmapfile.html

Thus, could you please check (I do not have MATLAB here (*)) whether matdump reports the proper type for the following examples:

1)

    char_object = 'Hello world'
    memmap_object = memmapfile('c:\temp\mem_struct.map');
    save object_uncompressed -v6 char_object memmap_object

2)

    struct_object = struct('Hello',{42},'World',{8*6});
    memmap_object = memmapfile('c:\temp\mem_struct.map');
    save object_uncompressed -v6 struct_object memmap_object

3) ------ TestClass.m ------

    classdef TestClass
        properties (Access = public)
            a = 1;
            b = 'Hello World';
            c
        end
        methods (Access = public)
            function obj = TestClass(surprise)
                obj.c = surprise;
            end
        end
    end

Commands:

    object_object = TestClass('So long. Goodby and thanks for all the fish');
    memmap_object = memmapfile('c:\temp\mem_object.map');
    save object_uncompressed -v6 object_object memmap_object

(*) Or can you please at least prepare the saved files for me, so that I can use them for debugging?

What does matdump report in both cases? If the explicit format specifier does not allow anything other than numeric types (no char, no cell, no struct, no object, no complex), then I suspect that char, cell, struct and object are not supported either when no format is specified.

What does MATLAB say when you try to reopen the above examples with memmapfile for reading, and what when you use a normal load? Do you get the char, struct and object back?

I fear that is why they only show examples of numeric arrays and matrices being mapped and no other data type.

Or MATLAB does some trick to store the object as a huge MAT_C_UINT8 blob, which it converts back later on. For classes it would necessarily need the classdef on the MATLAB path. To check and verify this hypothesis, I would need at least the resulting saved files (memmap and normal).

tbeu commented 6 years ago

This is what I did again:

  1. Copy BasicClass.m
  2. MATLAB (R2015b x64 on Win7) commands
    class_object = BasicClass(pi/3)
    save object_uncompressed -v6 class_object
    save object_compressed -v7 class_object
    save object_hdf5 -v7.3 class_object % I know, not needed
  3. MAT files: mat.zip
  4. matdump command: matdump -d -v object_uncompressed.mat
  5. matdump output
    Empty
    0 1 73 77 0 0 0 0 14 0 0 0 152 2 0 0 6 0 0 0 8 0 0 0 2 0 0 0 0 0 0 0 5 0 0 0 8 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 5 0 4 0 5 0 0 0 1 0 0 0 5 0 0 0 77 67 79 83 0 0 0 0 14 0 0 0 80 2 0 0 6 0 0 0 8 0 0 0 17 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 4 0 77 67 79 83 1 0 0 0 13 0 0 0 70 105 108 101 87 114 97 112 112 101 114 95 95 0 0 0 14 0 0 0 16 2 0 0 6 0 0 0 8 0 0 0 1 0 0 0 0 0 0 0 5 0 0 0 8 0 0 0 4 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 14 0 0 0 232 0 0 0 6 0 0 0 8 0 0 0 9 0 0 0 0 0 0 0 5 0 0 0 8 0 0 0 184 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 184 0 0 0 2 0 0 0 2 0 0 0 64 0 0 0 96 0 0 0 96 0 0 0 144 0 0 0 168 0 0 0 184 0 0 0 0 0 0 0 0 0 0 0 86 97 108 117 101 0 66 97 115 105 99 67 108 97 115 115 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 14 0 0 0 56 0 0 0 6 0 0 0 8 0 0 0 6 0 0 0 0 0 0 0 5 0 0 0 8 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 9 0 0 0 8 0 0 0 101 115 45 56 82 193 240 63 14 0 0 0 168 0 0 0 6 0 0 0 8 0 0 0 1 0 0 0 0 0 0 0 5 0 0 0 8 0 0 0 2 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 14 0 0 0 56 0 0 0 6 0 0 0 8 0 0 0 2 0 0 0 0 0 0 0 5 0 0 0 8 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 0 4 0 1 0 0 0 1 0 0 0 0 0 0 0 14 0 0 0 56 0 0 0 6 0 0 0 8 0 0 0 2 0 0 0 0 0 0 0 5 0 0 0 8 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 5 0 4 0 1 0 0 0 1 0 0 0 0 0 0 0 14 0 0 0 136 0 0 0 6 0 0 0 8 0 0 0 9 0 0 0 0 0 0 0 5 0 0 0 8 0 0 0 1 0 0 0 88 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 88 0 0 0 0 1 73 77 0 0 0 0 14 0 0 0 72 0 0 0 6 0 0 0 8 0 0 0 2 0 0 0 0 0 0 0 5 0 0 0 8 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 5 0 4 0 5 0 0 0 1 0 0 0 5 0 0 0 77 67 79 83 0 0 0 0 14 0 0 0 0 0 0 0 
  6. matdump command: matdump -d -f whos object_uncompressed.mat
  7. matdump output
    
    Name                       Size           Bytes          Class             

(null) 0 (null)
1x824 824 mxUINT8_CLASS



Observations: object_compressed.mat gives the same output as object_uncompressed.mat. There are two variables in the MAT file: the "empty" object and the MCOS. (You can read MCOS if you open object_uncompressed.mat in a hex editor.) This issue is clearly related to #40 and #47. @hernot Please have a look at both issues.

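The MCOS observation can also be checked by decoding the start of the byte dump by hand. The following Python sketch walks the first 144 bytes of the matdump output above, assuming the little-endian tag layout of the published MAT-file V5 format (8-byte tags, small data elements packed into a single 8-byte slot, payloads padded to 8 bytes). `read_tag` and `walk` are hypothetical helpers, and treating the opaque element as a flat run of tags is an assumption that merely happens to fit these bytes; its layout is not officially documented.

```python
import struct

MI_INT8, MI_INT32, MI_UINT32, MI_MATRIX = 1, 5, 6, 14

# First 144 bytes of the dump, copied verbatim.
HEAD = bytes([
    0, 1, 73, 77, 0, 0, 0, 0,        # version 0x0100, "IM", four zero bytes
    14, 0, 0, 0, 152, 2, 0, 0,       # miMATRIX tag, 664 bytes
    6, 0, 0, 0, 8, 0, 0, 0,          # array flags tag
    2, 0, 0, 0, 0, 0, 0, 0,          # class id 2
    5, 0, 0, 0, 8, 0, 0, 0,          # dimensions tag
    1, 0, 0, 0, 1, 0, 0, 0,          # 1 x 1
    1, 0, 0, 0, 0, 0, 0, 0,          # empty array name
    5, 0, 4, 0, 5, 0, 0, 0,          # small element: field name length 5
    1, 0, 0, 0, 5, 0, 0, 0,          # field names tag, 5 bytes
    77, 67, 79, 83, 0, 0, 0, 0,      # "MCOS" + terminator + padding
    14, 0, 0, 0, 80, 2, 0, 0,        # nested miMATRIX tag, 592 bytes
    6, 0, 0, 0, 8, 0, 0, 0,          # array flags tag
    17, 0, 0, 0, 0, 0, 0, 0,         # class id 17
    1, 0, 0, 0, 0, 0, 0, 0,          # empty array name
    1, 0, 4, 0, 77, 67, 79, 83,      # small element: "MCOS"
    1, 0, 0, 0, 13, 0, 0, 0,         # miINT8 tag, 13 bytes
    70, 105, 108, 101, 87, 114, 97, 112,
    112, 101, 114, 95, 95, 0, 0, 0,  # "FileWrapper__" + padding
])


def read_tag(buf, off):
    """Return (data_type, nbytes, data_off, next_off) for the tag at off."""
    word, = struct.unpack_from("<I", buf, off)
    if word >> 16:
        # Small data element: size in the upper 16 bits, payload in the same slot.
        return word & 0xFFFF, word >> 16, off + 4, off + 8
    nbytes, = struct.unpack_from("<I", buf, off + 4)
    return word, nbytes, off + 8, off + 8 + ((nbytes + 7) & ~7)


def walk(buf, off=8):
    """Yield (offset, data_type, payload), descending into miMATRIX elements."""
    while off + 8 <= len(buf):
        dtype, nbytes, data_off, next_off = read_tag(buf, off)
        if dtype == MI_MATRIX:
            yield off, dtype, b""
            off = data_off  # step into the matrix instead of skipping it
        else:
            yield off, dtype, bytes(buf[data_off:data_off + nbytes])
            off = next_off


elements = list(walk(HEAD))
class_ids = [p[0] for _, t, p in elements if t == MI_UINT32]
strings = [p.rstrip(b"\0") for _, t, p in elements if t == MI_INT8 and p]
```

On these bytes, `class_ids` comes out as 2 followed by 17 (MAT_C_STRUCT and MAT_C_OPAQUE in matio's numbering), and `strings` contains the field name MCOS, the type system name MCOS and the class name FileWrapper__. That is a struct with one opaque field, not a MAT_C_OBJECT.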
coveralls commented 6 years ago

Coverage Status

Coverage decreased (-1.9%) to 78.648% when pulling 20f0cfd1b643678afd36c3a4a71b1239cffa5ff9 on hernot:read-matlab5-object into 72dc1d7dc4e4412f6b880009c201c718b76cb7f5 on tbeu:master.

hernot commented 6 years ago

Hi

Thank you very much, I will have a look at it ASAP.

Best Xristoph

tbeu commented 6 years ago

No rush. I believe it is more complicated, see http://nbviewer.jupyter.org/gist/mbauman/9121961.

hernot commented 6 years ago

Hm, fuck.

Which MATLAB version are you using: older than 2014a/2016a, 2016b, or newer?

Is there any chance you could run the tests in 2014a or 2016a and check whether it still writes a strange opaque class?

Best Xristoph

tbeu commented 6 years ago

I told you: R2015b. I also have R14SP3 (from 2005).

hernot commented 6 years ago

Hi

And when trying to recreate this in an older version, can you also try to load the files you created in your standard MATLAB installation in the older version, to see whether older MATLAB versions choke on the file too, even if they already support the V5/V6 file format?

Best Xristoph

hernot commented 6 years ago

R14 may be too old to know objects at all, and between 2015a and 2016a they already changed a lot; in the company we decided to skip 17a, which would be comparable. And I developed the whole thing on 2014a, so it could be that they introduced the changes with 2015b, which would explain why we had problems loading data generated by the 2016 release.

Any chance to find someone who could save the object on 2013a or 2014a?

Because if so, they broke their own save for versions V7 and V6, as they are not obeying their own documentation https://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf, where they still document the format as I have implemented it. Or was there a -V5 save option in the past?

Further, can you create a second class and save both in one file? Maybe this gives some further hint on the pattern of how classes are stored in V6 and V7 compared to V5.

Best Xristoph

tbeu commented 6 years ago

Any chance to find someone who could save the object on 2013a or 2014a?

I think I can manage, but not as of today.

tbeu commented 6 years ago

Because if so, they broke their own save for versions V7 and V6, as they are not obeying their own documentation https://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf

I doubt The MathWorks broke anything.

matfile_format_R11.pdf, matfile_format_R14SP3.pdf, matfile_format_R2007a.pdf, matfile_format_R2007b.pdf, matfile_format_R2008a.pdf, matfile_format_R2008b.pdf, matfile_format_R2009a.pdf, matfile_format_R2009b.pdf, matfile_format_R2010a.pdf, matfile_format_R2010b.pdf, matfile_format_R2011a.pdf, matfile_format_R2011b.pdf, matfile_format_R2012a.pdf, matfile_format_R2012b.pdf, matfile_format_R2013a.pdf, matfile_format_R2013b.pdf, matfile_format_R2014a.pdf, matfile_format_R2014b.pdf, matfile_format_R2015a.pdf, matfile_format_R2016a.pdf, matfile_format_R2017b.pdf

tbeu commented 6 years ago

http://mathforum.org/kb/message.jspa?messageID=9828124 reports on a new graphics system introduced in R2014b (which also affected saving). Maybe it is related to your development for R2014a and my observations with R2015b.

hernot commented 6 years ago

Hi

Yes, that is my fear too ;-) In my former company it was a huge amount of work to keep things running.

I think I have some ideas (but no time currently) on how to handle opaque objects. And the source you sent is wrong. It looks rather like a hard-coded fixed-size struct/object for which the array size and some other tag fields are not explicitly recorded. The last field seems to embed a byte stream created by getByteStreamFromArray(anyData) and readable by getArrayFromByteStream(byteStream), the serialization functions. That field starts with 12 bytes looking like bytes 500 to 511 of the MAT-file header, with the last two bytes reading IM or MI, looking like the endianness indicator, which in the header is located at bytes 511 and 512. The 8 bytes preceding the four that contain these two exactly encode the remaining number of bytes in the MAT file. I still have to implement and test this, but just from the hex editor it looks plausible.
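The endianness check is easy to sketch. For reference, the published MAT-file V5 layout has a 128-byte header with the version in bytes 125 and 126 and the endian indicator in bytes 127 and 128, so the offsets quoted from memory above may be off. The Python sketch below only assumes that an embedded stream starts with the same four bytes (version, then indicator), as the dump prefix 0 1 73 77 suggests; `parse_prefix` is a hypothetical name.

```python
import struct


def parse_prefix(prefix):
    """Return (byte_order, version) for a 4-byte version + endian indicator.

    The indicator is the 16-bit value "MI"; stored little-endian it shows
    up in the file as the bytes "IM".
    """
    marker = bytes(prefix[2:4])
    if marker == b"IM":
        order = "<"   # little-endian data
    elif marker == b"MI":
        order = ">"   # big-endian data
    else:
        raise ValueError("no endian indicator found")
    version, = struct.unpack(order + "H", bytes(prefix[:2]))
    return order, version


# The first four bytes of the dump posted earlier in this thread.
order, version = parse_prefix(bytes([0, 1, 73, 77]))
```

For the dump prefix this yields little-endian data and version 0x0100, the same values a regular V5 file header carries.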

Xristoph  

tbeu commented 6 years ago

I get the same results as https://github.com/tbeu/matio/pull/64#issuecomment-338461338 if I export the MAT-files in R2014a instead of R2015b.

MAT files: mat.zip

hernot commented 6 years ago

Hi

Thank you very much.

It seems as if MATLAB stores old-style classes/objects defined via class directories (@classname) differently than new-style classes defined via classdef. The ones we used in the old company were old-style objects. Anyway.

Can you upload all your results, including those from MATLAB 2014a? And can you do me the favour of serializing a simple matrix using getByteStreamFromArray(anyData) as follows:

------------------ snip -------------------

    % best would be if anymatrix referenced the data inside our test
    % class(es) without the class decoration instead of just the randi
    % matrix below
    anymatrix = randi(5,5)

    % if the b for force binary is not recognized any more, remove the b
    fid = fopen('serializeddata.bin','wb')

    if fid >= 0
        fwrite(fid,getByteStreamFromArray(anymatrix))
        fclose(fid)
    end

And send me the result. That would help and simplify reading and interpreting the opaque class structure, allowing new-style class data to be extracted from MAT V5 files. Even though it is not officially documented, I bet it is not much different, except for removed tags/byte lengths, which seem to be hardcoded and thus implicitly known by MATLAB ;-)

Best Xristoph

tbeu commented 6 years ago

It seems as if MATLAB stores old-style classes/objects defined via class directories (@classname) differently than new-style classes defined via classdef. The ones we used in the old company were old-style objects. Anyway.

Should I rather try to save an old-style object?

hernot commented 6 years ago

Hi

Hm, not sure whether they have changed their storage too, since, as far as I remember, there was an error about invalid parameters when loading old-style objects in, AFAIK, MATLAB 2014a or 2015a; I'm not sure which version. And it is likely they now also store them inside the opaque object, as in 2016a we did not observe this error any more.

So I suggest waiting until somebody complains about not being able to load them from old data (< 2014a), and for now trying to figure out whether consistent loading of objects from the opaque object is possible.

Best Xristoph

tbeu commented 6 years ago

Is this still being worked on? If not, I propose to close it w/o merge?

hernot commented 6 years ago

Hi 

I had to postpone it over the last months, and the opaque object, which I suspect is a hardcoded matrix/struct carrying a MATLAB serialization string as payload (as posted above), is still pending. I would need some files to continue on this, as stated in my earlier posts. But I have no idea when I will have time to continue; it is not the highest priority for now. So if you could generate the files for me and send them, I would be OK with closing afterwards and sending a new PR once I have finally solved the object riddle.

Best regards, Xristoph

tbeu commented 6 years ago

One year later: can we close it?