tbeu / matio

MATLAB MAT File I/O Library
https://matio.sourceforge.io
BSD 2-Clause "Simplified" License

UTF-8 String in MAT_FT_MAT73 #201

Closed by divyesh19399 8 months ago

divyesh19399 commented 1 year ago

I think something is still going wrong with Unicode characters when writing them to .mat files using MAT_FT_MAT73.

For example, when I try to write the Omega symbol to a .mat file, the symbol does not appear in the file.

Creating a file:

mat_t *pmat2=Mat_CreateVer("test_73.mat",NULL,MAT_FT_MAT73);

Creating a matvar_t variable:

static const QString qstr = QString(QChar(0x03A9));
std::string dataString= ("Gen Imp Drop ("+ qstr +")").toStdString();
size_t sz[2] = { 1, dataString.size() };
matvar_t *matString=Mat_VarCreate("charArray",MAT_C_CHAR,MAT_T_UTF8,2,sz,(void *)dataString.c_str(),0);

And the output when I print this variable using Mat_VarPrint() is as follows:

Name: charArray
Rank: 2
Dimensions: 1 x 17
Class Type: Character Array
Data Type: 16-bit, unsigned integer
{
Gen Imp Drop ()
}

When I write the same string with the file version set to MAT_FT_MAT5, I get the correct output. Can someone please help me debug this? Is there something I am missing here? @tbeu Thanks in advance.

Originally posted by @divyesh19399 in https://github.com/tbeu/matio/issues/189#issuecomment-1499240181
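
One detail in the printed output can be checked without matio: the dimension 1 x 17 matches the byte length of the UTF-8 string, because std::string::size() counts bytes and the Omega occupies two of them. Whether this is related to the missing symbol is not established in this thread. The sketch below only illustrates the byte count versus code point count; utf8_code_points and make_label are hypothetical helper names, not matio functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of matio): counts Unicode code points in a
// UTF-8 string by skipping continuation bytes, which have the form 10xxxxxx.
// Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string &s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Hypothetical helper: builds the label from the report without Qt.
// "\xCE\xA9" is the UTF-8 encoding of U+03A9 (Greek capital letter Omega).
std::string make_label() {
    return std::string("Gen Imp Drop (") + "\xCE\xA9" + ")";
}
```

Calling make_label().size() gives the byte count that ends up in sz[1] in the report, while utf8_code_points(make_label()) gives the number of characters a reader would count.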

tbeu commented 1 year ago

Can you please create a minimal compilable example without the QString/QChar dependency? Thanks.
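
For readers who want to drop the QChar(0x03A9) call, one possible way to get the same bytes with only the standard library is to encode the code point by hand. This is a minimal sketch; encode_utf8 is a hypothetical helper, not part of matio or Qt, and it returns an empty string for surrogates and out-of-range values.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a single Unicode code point as UTF-8.
// Returns an empty string for surrogates (U+D800..U+DFFF) and for values
// above U+10FFFF, which have no valid UTF-8 encoding.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out; // surrogate, not encodable
        }
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this helper, "Gen Imp Drop (" + encode_utf8(0x03A9) + ")" yields a std::string holding the same UTF-8 bytes that the Qt expression is intended to produce.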

tbeu commented 11 months ago

@divyesh19399 Friendly reminder.

tbeu commented 8 months ago

The following code works for me as expected:

#include <string>
#include "matio.h"

int main() {
    mat_t *pmat1 = Mat_CreateVer("test_5.mat", NULL, MAT_FT_MAT5);
    mat_t *pmat2 = Mat_CreateVer("test_73.mat", NULL, MAT_FT_MAT73);

    const char qstr[] = "\xCE\xA9"; // UTF-8 representation of the Greek letter Omega (0x03A9)
    std::string dataString = "Gen Imp Drop (" + std::string(qstr) + ")";

    size_t sz[2] = { 1, dataString.size() };
    matvar_t *matString = Mat_VarCreate("charArray", MAT_C_CHAR, MAT_T_UTF8, 2, sz, (void *)dataString.c_str(), 0);

    Mat_VarPrint(matString, 1);
    Mat_VarWrite(pmat1, matString, MAT_COMPRESSION_NONE);
    Mat_VarWrite(pmat2, matString, MAT_COMPRESSION_NONE);

    Mat_VarFree(matString);
    Mat_Close(pmat1);
    Mat_Close(pmat2);

    return 0;
}

and generates the two files test_5.mat and test_73.mat.