Closed curoles closed 11 months ago
Happy to accept a PR for this if you are willing!
Happy to create a PR: https://github.com/sheredom/utf8.h/pull/110
This is a scary bug - is it going to be merged in?
The linked PR fails CI tests and @curoles said they would look at the results. There hasn't been any follow-up from there.
Sorry, I got distracted and forgot to leave a comment. See https://github.com/sheredom/utf8.h/pull/110. The test UTEST(utf8ncpy, truncated_copy_null_terminated) fails. Maybe someone with a better understanding of what this test expects should have a look.
I believe I fixed one more issue, where the destination string is too small to store a codepoint plus the null terminator. All tests are passing now; nevertheless, please check whether my last fix makes sense.
Good good! Thanks for the fix, I always appreciate it when people chip in and make the project better!
In the code:
For code points above 0x7F, 0xc0 is the valid mask for the first byte; for the rest it is 0x80 (https://en.wikipedia.org/wiki/UTF-8).
Consider the string °¯\_(ツ)_/¯° and let's add a printout:
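A minimal sketch of what such a printout could look like. The helper names utf8_byte_kind and utf8_dump are hypothetical and not part of utf8.h. It dumps each byte in hex and marks it as ASCII (A), lead byte (L) or continuation byte (C), using the 0xc0 / 0x80 masks mentioned above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from utf8.h): classify one byte of a UTF-8 string.
   'A' = ASCII (0xxxxxxx), 'C' = continuation byte (10xxxxxx),
   'L' = lead byte of a multi-byte sequence (11xxxxxx). */
static char utf8_byte_kind(unsigned char b) {
    if ((b & 0x80) == 0x00) return 'A';
    if ((b & 0xc0) == 0x80) return 'C';
    return 'L';
}

/* Hypothetical helper: print every byte as hex together with its kind.
   Returns the number of bytes printed. */
static size_t utf8_dump(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++) {
        unsigned char b = (unsigned char)s[i];
        printf("%2zu: %02x %c\n", i, b, utf8_byte_kind(b));
    }
    return n;
}
```

Calling utf8_dump on the string above should show that the final ° is a two-byte sequence (c2 b0), which is the code point that gets chopped.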
The code that follows it will then chop the last valid code point:
FIX
Fix that worked for me:
(index - check_index) < utf8codepointcalcsize(&d[check_index]))
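To illustrate the idea behind that comparison, here is a rough standalone sketch, not the actual utf8ncpy code. seq_len_from_lead and trim_incomplete_tail are hypothetical names. It assumes the buffer was copied from valid UTF-8 and only the tail may have been cut. The last sequence is zeroed only when fewer bytes are present than its lead byte announces:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a size calculation based on the lead byte. */
static size_t seq_len_from_lead(unsigned char lead) {
    if ((lead & 0xf8) == 0xf0) return 4; /* 11110xxx */
    if ((lead & 0xf0) == 0xe0) return 3; /* 1110xxxx */
    if ((lead & 0xe0) == 0xc0) return 2; /* 110xxxxx */
    return 1;                            /* ASCII or unexpected byte */
}

/* Hypothetical helper: d holds n bytes copied from a valid UTF-8 string.
   If the last sequence was cut short, overwrite its bytes with '\0'.
   Returns the number of bytes that remain valid. */
static size_t trim_incomplete_tail(char *d, size_t n) {
    assert(d != NULL);
    if (n == 0) return 0;
    size_t check_index = n - 1;
    /* Walk back over continuation bytes (10xxxxxx) to the lead byte. */
    while (check_index > 0 &&
           ((unsigned char)d[check_index] & 0xc0) == 0x80) {
        check_index--;
    }
    size_t have = n - check_index; /* bytes actually present */
    size_t need = seq_len_from_lead((unsigned char)d[check_index]);
    if (have < need) {
        /* Truncated sequence: drop it. */
        for (size_t i = check_index; i < n; i++) d[i] = '\0';
        return check_index;
    }
    return n; /* Complete sequence: keep it. */
}
```

With this check, a buffer ending in the complete pair c2 b0 is left alone, while a buffer ending in a lone c2 has that byte cleared.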
The problem with using utf8codepointsize is that c2 is sign-extended and becomes ffffffc2, so none of the 0xffffxxxx & chr checks evaluates to 0.
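A small sketch of that effect, assuming the byte type is signed (written here as signed char; plain char is signed on many, but not all, platforms). The functions are hypothetical stand-ins modeled on the masks described above, not copies of the library code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size-from-value check: expects a decoded code point,
   not a raw byte taken from the string. */
static int size_from_codepoint(int32_t chr) {
    uint32_t u = (uint32_t)chr;
    if ((u & 0xffffff80u) == 0) return 1;
    if ((u & 0xfffff800u) == 0) return 2;
    if ((u & 0xffff0000u) == 0) return 3;
    return 4;
}

/* Hypothetical size-from-lead-byte check: inspects the raw first byte. */
static int size_from_lead_byte(const signed char *s) {
    assert(s != 0);
    unsigned char lead = (unsigned char)s[0];
    if ((lead & 0xf8) == 0xf0) return 4;
    if ((lead & 0xf0) == 0xe0) return 3;
    if ((lead & 0xe0) == 0xc0) return 2;
    return 1;
}

/* Widen a signed byte the way an implicit conversion to a 32-bit int would. */
static int32_t widen_signed(signed char c) {
    return (int32_t)c;
}
```

Passing the raw byte c2 through widen_signed yields ffffffc2, so the value-based check falls through every mask and reports 4, while the lead-byte check reports 2 for the same sequence.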