sheredom / utf8.h

📚 single header utf8 string functions for C and C++
The Unlicense
1.71k stars 122 forks source link

exit in `utf8ncmp` if over limit before reading #57

Closed bitonic closed 5 years ago

bitonic commented 5 years ago

this makes the function safe on strings that are not null-terminated.

sheredom commented 5 years ago

Hey there - thanks for the contribution! Could you please add a test case that would have failed without this change? Just so that it never regresses again in the future.

bitonic commented 5 years ago

@sheredom it is not trivial to set up a reliable test for this, and the failure mode would be a segfault or similar.

however, if you're running valgrind or something like that on CI in the tests it is easy to make valgrind fail with the previous function, so let me know if this is the case.

sheredom commented 5 years ago

Hm interesting! Could you just post a code snippet on this pull request showing the failure then? That way when I get round to adding clang sanitizers or valgrind I should have a test case ready to use 😄

bitonic commented 5 years ago

@sheredom (not tested, but should convey the idea):

  unsigned char *chrs1 = (unsigned char *) malloc(1);
  chrs1[0] = 'a';
  unsigned char *chrs2 = (unsigned char *) malloc(1);
  chrs2[0] = 'a';
  utf8ncmp((void *) chrs1, (void *) chrs2, 1);

before this PR, utf8ncmp will try to read beyond the first char, and in the case above that'd be unsafe.

sheredom commented 5 years ago

@bitonic thanks for the fix - and for the example! Always happy to have new contributors 😄

bitonic commented 5 years ago

thanks for the very useful code! I'm also thinking of changing the if branches in many of the functions to go through the single byte case first since it should be a lot more common -- right now it goes through the 4-byte version first. I first wanted to check if you had some benchmarks first though 🙃.

On Sun, 20 Jan 2019 at 10:24, Neil Henning notifications@github.com wrote:

@bitonic https://github.com/bitonic thanks for the fix - and for the example! Always happy to have new contributors 😄

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/sheredom/utf8.h/pull/57#issuecomment-455850289, or mute the thread https://github.com/notifications/unsubscribe-auth/AAh8OoiA3poXHww8y7i64c4ZrtO6bGNlks5vFDW3gaJpZM4aJifQ .

sheredom commented 5 years ago

I do not have any benchmarks for that - but if you find it is faster to switch the branches around feel free to submit that PR!