Closed S-Dafarra closed 3 years ago
It also causes segfaults in case the input data pointer is nullptr
. This is acceptable for every other data type, as long there is a zero dimension. With MAT_C_CHAR
and MAT_T_UTF8
instead, it will try to access data
anyway.
The use of
strlen
needs the data to have a\0
char at the end. This is ok when the input data is a C string, but it causes segfaults when the input type is a char array.
That's why I set the trailing zero also for char array here: https://github.com/tbeu/matio/blob/84c8f3d67ef7493b3541ffe16fdfe007a45f3165/test/test_mat.c#L1302-L1303
Do you see any other option how to get the string length (which can be different from actual dimension)?
It also causes segfaults in case the input data pointer is nullptr.
Sure, this needs to be avoided.
It also causes segfaults in case the input data pointer is nullptr.
Addressed by fa770b2.
Addressed by fa770b2.
Thank you for the quick fix!
Do you see any other option how to get the string length (which can be different from actual dimension)?
I have been thinking about it a bit. If I understood correctly the issue reported in https://github.com/tbeu/matio/issues/164, the problem is related to strings (and also arrays) containing UTF8 chars. Since the UTF8 chars require additional bytes to be stored, it is not true that the number of bytes is equal to the size of the string. In this case, the use of strlen
fixed the issue since it counts the number of bytes correctly.
Ultimately, the problem seems to be on the correct counting of bytes then, both in the case of strings and arrays, but strlen
requires the \0
to be present.
From https://github.com/tbeu/matio/issues/164 it also seems that the dimensions need to be set to the actual number of chars, not to the number of bytes. So, both in the case of strings and arrays, the user is supposed to provide the number of elements he/she wants to store.
I wonder if information this can be exploited in this case, i.e. mimicking how strlen
works. In practice, it would be necessary to iterate over the number of elements, detect if the element is a plain char or a special UTF character, and count the bytes accordingly.
I was thinking maybe at something similar to https://stackoverflow.com/a/1031773, but with a different while loop condition (since while(*bytes)
would require again the terminal \0
) :thinking:
Fixed by 4e6cc2e. I guess I need to a release a new version soon.
Awesome, thanks!
In https://github.com/tbeu/matio/commit/67a922f83467d694fa6e9759ac9a30b6ab82aec4 the following two lines have been added https://github.com/tbeu/matio/blob/67a922f83467d694fa6e9759ac9a30b6ab82aec4/src/mat.c#L1116-L1118
The use of
strlen
needs the data to have a\0
char at the end. This is ok when the input data is a C string, but it causes segfaults when the input type is a char array.