tbeu / matio

MATLAB MAT File I/O Library
https://matio.sourceforge.io
BSD 2-Clause "Simplified" License

Problem Reading Char array back from Mat File in matio-1.5.12 #88

Closed: Zwulf87 closed this issue 6 years ago

Zwulf87 commented 6 years ago

Hello,

Currently, I'm creating a matvar like this (simplified example):

std::string structName      ("Structure");
std::string structFieldName ("CharArray");
std::string testString      ("test");

size_t dims[2] = {1, 1};

matvar_t * mvMainStruct = nullptr;
matvar_t * mvCharArray = nullptr;

dims[0] = testString.size();
mvCharArray = Mat_VarCreate(nullptr, MAT_C_CHAR, MAT_T_UTF8, 2, dims, testString.data(), 0);

dims[0] = structName.size();
mvMainStruct = Mat_VarCreateStruct(structName.data(), 2, dims, nullptr, 0);

Mat_VarAddStructField(mvMainStruct, structFieldName.data());
Mat_VarSetStructFieldByName(mvMainStruct, structFieldName.data(), 0, mvCharArray);

This leads to a structure with one field holding a char array. The structure is written to a MAT file and later read back from it, and that is where the problem arises:

With 1.5.11, the data_type of CharArray after reading it back was MAT_T_UTF8, as expected. Now, with 1.5.12, it is detected as MAT_T_UTF16.

Opening the MAT file in MATLAB gives:

>> Structure.CharArray

ans =
        4x1 char array 
            't'
            'e'
            's'
            't'

Debugging shows that, after reading from the MAT file, the CharArray content is held in RAM as ['t', '\0', 'e', '\0', 's', '\0', 't', '\0'].
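The empty entries between the letters are zero bytes: this is what little-endian 16-bit code units look like when the buffer is viewed one byte at a time. A minimal, self-contained sketch (no matio involved; the helper `toUtf16LeBytes` is hypothetical and assumes ASCII input) that reproduces the observed layout:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of matio): encode an ASCII string as
// UTF-16LE bytes, i.e. the byte layout of an H5T_STD_U16LE dataset.
// Each character becomes one 16-bit code unit stored low byte first.
std::vector<std::uint8_t> toUtf16LeBytes(const std::string& ascii)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(ascii.size() * 2);
    for (unsigned char c : ascii) {
        assert(c < 0x80 && "sketch assumes ASCII input");
        const std::uint16_t unit = c;  // ASCII maps 1:1 onto a UTF-16 code unit
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFFu));         // low byte
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFFu));  // high byte (0 for ASCII)
    }
    return bytes;
}
```

For "test" this yields the 8 bytes 't', 0, 'e', 0, 's', 0, 't', 0, which is exactly what a debugger shows when the 16-bit buffer is inspected as 8-bit `char`.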

So I'm wondering whether (MAT_C_CHAR, MAT_T_UTF8) is always stored as a 16-bit wide type, so that reading it back this way is now correct, or whether this is a bug in matio. (I would expect that if MAT_T_UTF8 is written, MAT_T_UTF8 also comes back.)

tbeu commented 6 years ago

This is related to #79. One piece of information is missing here: the MAT file version you write (and whether compression is set).

But the behavior of v1.5.12 should be correct.

Zwulf87 commented 6 years ago

The MAT file format used is 7.3. The behavior described is the same for compressed and uncompressed MAT variables.

tbeu commented 6 years ago

I ran your code to create test.mat with both v1.5.11 and v1.5.12. Dumping the MAT files with h5dump shows that the content is identical and that the datatype of /#refs#/0 is H5T_STD_U16LE. Dumping them with matdump shows that matdump v1.5.11 prints 8-bit, unsigned integer as the datatype of CharArray, whereas matdump v1.5.12 prints 16-bit, unsigned integer, which matches the H5T_STD_U16LE datatype.
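Since the file itself stores 16-bit code units and only the reported data type changed between versions, reader code that checks the element width instead of assuming one byte per character is less sensitive to such changes. A minimal sketch under stated assumptions: the helper `decodeCharData` is hypothetical, the buffer is assumed to be in host byte order, and every code unit is assumed to be ASCII (anything else becomes '?', no UTF-8 conversion is attempted). In real code the width would presumably come from the variable's data type, e.g. via matio's `Mat_SizeOf(matvar->data_type)`, and the element count from the product of `matvar->dims`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not part of matio): turn the raw data buffer of a
// MAT_C_CHAR variable into a std::string, given the size in bytes of one
// element (1 for 8-bit data, 2 for 16-bit data) and the number of elements.
// Assumes host byte order; code units outside ASCII are replaced by '?'.
std::string decodeCharData(const void* data, std::size_t elementSize, std::size_t count)
{
    assert((elementSize == 1 || elementSize == 2) && "only 8- and 16-bit data handled");
    const auto* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        if (elementSize == 1) {
            out.push_back(static_cast<char>(bytes[i]));
        } else {
            std::uint16_t unit = 0;
            std::memcpy(&unit, bytes + 2 * i, sizeof unit);  // avoids unaligned reads
            out.push_back(unit < 0x80 ? static_cast<char>(unit) : '?');
        }
    }
    return out;
}
```

The same call then works whether the reported type is 8-bit or 16-bit; only the `elementSize` argument differs.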

I accept that commit fa95add9eeef8d3215f1de210cd4f0e490b677d4 breaks your expectations, but I still consider it a fix.