Closed rlpowell closed 4 years ago
The fix_utf8 function at https://gist.github.com/w-vi/67fe49106c62421992a2, given a buffer and its length, returns the longest length that consists solely of complete UTF-8 characters, excluding any partial character at the end of the buffer. That should solve the problem.
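For anyone who wants to see the idea without opening the gist, here is a minimal sketch of that kind of boundary check. It is not the gist's code: `utf8_safe_len` is a hypothetical name, and the sketch assumes the buffer is otherwise valid UTF-8 and only inspects the last few bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not the gist's fix_utf8): returns the largest n <= len
 * such that buf[0..n) does not end in the middle of a UTF-8 sequence.
 * Assumes the data is otherwise valid UTF-8; only the tail is examined. */
size_t utf8_safe_len(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;
    size_t need, have;
    unsigned char lead;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return 0; /* empty, or nothing but continuation bytes */

    /* buf[i - 1] should now be the lead byte of the last character. */
    lead = buf[i - 1];
    if (lead < 0x80)
        need = 1;                 /* ASCII */
    else if ((lead & 0xE0) == 0xC0)
        need = 2;                 /* 110xxxxx */
    else if ((lead & 0xF0) == 0xE0)
        need = 3;                 /* 1110xxxx */
    else if ((lead & 0xF8) == 0xF0)
        need = 4;                 /* 11110xxx */
    else
        return len;               /* malformed tail: leave it untouched */

    have = back + 1;              /* bytes present from the lead byte to the end */
    return (have >= need) ? len : i - 1;
}
```

With a 128-byte chunk whose last byte is the first half of a two-byte character, this returns 127, so the caller can process 127 bytes and carry the remaining byte over to the next chunk.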
Turns out not to be related to the %-25s issue at all; "%-25s" will only lengthen a string, not trim it. The actual issue is:
int main (int argc, char **argv) {/*{{{*/
char buffer[128];
char *start[256], **pstart;
I do not have any interest in the effort required to fix this properly, so I'm just pushing a bunch of bigger char arrays.
Specifically: the problem is that for the UTF-8 string in question, breaking it into 128-byte chunks doesn't split on a UTF-8 character boundary.
Behold the following bizarre mess:
However, if you trim the string by one character, in either direction, it's fine:
How the hell does that work, I hear you cry?
Once I realized that it was tied to length, it occurred to me that vlatai probably doesn't output the entire string, and in morf.c we have:
, and another similar %-25s line just below it.
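As a quick way to check what a `%-25s` conversion does with strings of different lengths, here is a small sketch. `pad25` and `clip25` are hypothetical wrappers written for this illustration and are not taken from morf.c. In standard C the field width is only a minimum, and both the width and the precision count bytes, not characters.

```c
#include <stdio.h>

/* Hypothetical wrapper: formats s with "%-25s" into out and returns the
 * number of bytes snprintf would have written (excluding the NUL).
 * The width 25 is a minimum: short strings are padded, long ones pass through. */
int pad25(char *out, size_t outsz, const char *s)
{
    return snprintf(out, outsz, "%-25s", s);
}

/* Hypothetical wrapper: same, but with a precision of 25, which caps the
 * output at 25 bytes. This variant can cut a multi-byte character in half. */
int clip25(char *out, size_t outsz, const char *s)
{
    return snprintf(out, outsz, "%-25.25s", s);
}
```

Given a 30-byte ASCII string, `pad25` produces all 30 bytes while `clip25` produces 25; given "abc", both produce "abc" followed by 22 spaces. Only a conversion with a precision can truncate, so a plain `%-25s` cannot be what cuts a string short.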
Unfortunately, I haven't the slightest idea how to make this safe in C, besides just not trimming the input at all.