Open ec1oud opened 8 years ago
Also the API change broke my existing code, because this is C: the compiler is perfectly happy to pass an int (the decoded character) as a char *, which then becomes an invalid pointer. I was using a version from a year or so ago, tried to upgrade, and boom.
What do you mean when you say lcd.c is wrong?
The examples which go through a char * UTF-8 string byte-by-byte (like lcd.c) are wrong, and the API is not conducive to fixing it. If you start with a UTF-8 string, you need to iterate by Unicode codepoints, not by bytes, but there is no function which tells you where the next one is.
The examples themselves are correct, as they don't contain Unicode codepoints above U+007F; therefore there is no need to use special UTF-8 functions. If you need the length of a single UTF-8 codepoint in bytes, you can use utf8_surrogate_len, declared in utf8-utils.h.
So, replacing wchar with char * wasn't a great move […]
wchar_t isn't able to hold Unicode codepoints beyond the BMP on some targeted platforms, so it is unsuited to the task of passing the codepoint identity. Even uint32_t is unsuited if you take codepoint variants into account. So passing a UTF-8 encoded char pointer is the only suitable way of passing enough information into the function while keeping the API interface simple and not breaking the API again in the future.
this function's name tells you that its job is to get one glyph, not to decode the prefix of a UTF-8 string.
Please be precise: what function are you talking about?
And an application which wants to step through the Unicode characters will decode it before calling this function, so it doesn't make sense for the function to turn around and decode it again. It should take a pre-decoded Unicode character.
Why should decoding be required in the first place? Unicode processing can be done without decoding the codepoints at all.
I was using a version from a year or so ago, tried to upgrade, and boom.
I'm sorry that the code change caused trouble for you.
As you can see, this project doesn't have any versioning or changelog going on. The reasons for that are:
So introducing some versioning scheme in the current state of the code would force us to guarantee an API or even ABI stability we can't provide yet.
Please be precise: what function are you talking about?
Scratch that. I ignored the title. Well, texture_font_get_glyph does exactly what it advertises: it returns the glyph for a Unicode codepoint (encoded as UTF-8), nothing more. It just happens that you can conveniently pass a pointer to the start of a UTF-8 encoded Unicode codepoint without doing any conversion beforehand (i.e. what you call decoding the prefix of a UTF-8 string).
I think 32 bits ought to be enough for a "character" type, at least most of the time, but OK... edge cases are endless. Lemme guess, 64 bits isn't enough either?
How about a way to find out how many bytes it consumed in the process of rendering the next glyph, then? E.g. take a char * argument and modify it to point to the first byte past the end of the glyph that it found, or take an int * and return the count of bytes consumed. That way a simple loop could get all the glyphs in a string of unknown length, deal with control characters, and prepare the vertex buffer all at the same time.
lcd.c has only an English string so far, but bytewise iteration is not a good example of how to actually use the library.
I think 32 bits ought to be enough for a "character" type, at least most of the time, but OK... edge cases are endless. Lemme guess, 64 bits isn't enough either?
The problem with using a uint32_t parameter is the missing context: you can't read the following codepoint, which may modify the representation of the current character. Using a pointer to a uint32_t, however, would work again because it provides context. But a uint32_t parameter makes it unnecessarily hard to use string literals.
How about a way to find out how many bytes it consumed in the process of rendering the next glyph then?
You seem to misunderstand. texture_font_get_glyph doesn't consume anything.
The following code snippet probably does what you're trying to achieve.
// Create a texture_font_t instance `the_font` before this.
const char* the_string = "私はガラスを食べられます。 それは私を傷つけません";
for( size_t i = 0; the_string[i] != '\0'; i += utf8_surrogate_len( the_string + i ) )
{
    texture_glyph_t* the_glyph = texture_font_get_glyph( the_font, the_string + i );
    // further processing here
}
My reason is that I want to look for formatting characters like tabs and newlines as I go. But that's not the only reason.