sheredom / utf8.h

📚 single header utf8 string functions for C and C++
The Unlicense

utf8upr/lwr size issues? #106

Open ghost opened 1 year ago

ghost commented 1 year ago

Hi, I was looking at the docs for utf8upr/lwr, and they don't seem to indicate what happens if the string passed to them doesn't have enough space for the new codepoints. I understand that letters may have different byte sizes in their upper/lowercase variants, so I was wondering whether utf8upr/lwr will allocate extra memory as required.

Looking at the code, though, it seems like they just call utf8catcodepoint, which AFAIK doesn't allocate additional memory. In fact, the size argument in that call is set to the size of the new codepoint, rather than the size of the buffer as it should be. Is this correct?

sheredom commented 1 year ago

So utf8upr and utf8lwr rely on the fact that the only codepoints we currently support for them are all symmetrically sized: their replacements are the same size as the originals. If that ever changed we'd be scunnered!
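
To make that assumption concrete, here is a minimal sketch of the property an in-place case conversion depends on. It is not code from utf8.h: `encoded_size`, `same_encoded_size`, and `check_examples` are hypothetical helpers, and the example pairs come from the Unicode case mappings rather than from the library's own tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to encode cp as UTF-8, or 0 if cp is
   outside the Unicode range. */
static size_t encoded_size(uint32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    if (cp < 0x110000) return 4;
    return 0;
}

/* Hypothetical helper: nonzero when `to` can overwrite `from` in place
   without changing the byte length of the string. */
static int same_encoded_size(uint32_t from, uint32_t to) {
    return encoded_size(from) != 0 && encoded_size(from) == encoded_size(to);
}

/* Usage example with a few Unicode case pairs. */
static void check_examples(void) {
    assert(same_encoded_size(0x61, 0x41));     /* a -> A: 1 byte each      */
    assert(same_encoded_size(0xE9, 0xC9));     /* U+00E9 -> U+00C9: 2 each */
    assert(!same_encoded_size(0x23A, 0x2C65)); /* U+023A -> U+2C65: 2 vs 3 */
    assert(!same_encoded_size(0x131, 0x49));   /* U+0131 -> I: 2 vs 1      */
}
```

The last two pairs are the kind of mapping that would break an in-place conversion: the replacement either overruns the original bytes or leaves a gap behind it.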

ghost commented 1 year ago

@sheredom thanks for the response. Is this documented anywhere? If not, it definitely should be.

Also, what happens with the size argument to utf8catcodepoint? Is it correct that we pass the size of the new codepoint instead of the buffer's?

sheredom commented 1 year ago

It isn't documented, so I'll do a PR. I think the size is fine only because, for all our replacements, the size is the same between the original and the new!
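
For comparison, a rough sketch of what passing the remaining buffer size (rather than the new codepoint's size) could look like. This is an illustration under stated assumptions, not utf8.h's `utf8catcodepoint`: `put_codepoint` and `check_put_codepoint` are hypothetical names, and the return convention (bytes written, 0 on failure) is chosen just for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode cp as UTF-8 at dst, writing at most
   `remaining` bytes. Returns the number of bytes written, or 0 if cp is
   not encodable or does not fit (dst is left untouched in that case). */
static size_t put_codepoint(unsigned char *dst, size_t remaining, uint32_t cp) {
    size_t need;

    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not valid */
    if (cp < 0x80) need = 1;
    else if (cp < 0x800) need = 2;
    else if (cp < 0x10000) need = 3;
    else if (cp < 0x110000) need = 4;
    else return 0;

    if (need > remaining) return 0; /* the check that needs the real buffer size */

    switch (need) {
    case 1:
        dst[0] = (unsigned char)cp;
        break;
    case 2:
        dst[0] = (unsigned char)(0xC0 | (cp >> 6));
        dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        dst[0] = (unsigned char)(0xE0 | (cp >> 12));
        dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        dst[0] = (unsigned char)(0xF0 | (cp >> 18));
        dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
    return need;
}

/* Usage example: a 2-byte slot accepts U+00C9 but rejects U+2C65. */
static void check_put_codepoint(void) {
    unsigned char slot[2] = {0, 0};
    assert(put_codepoint(slot, sizeof slot, 0xC9) == 2);
    assert(slot[0] == 0xC3 && slot[1] == 0x89);
    assert(put_codepoint(slot, sizeof slot, 0x2C65) == 0); /* needs 3 bytes */
}
```

With a bound like this, a replacement that is larger than the space available is reported to the caller instead of relying on every case pair having the same encoded size.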