sheredom / utf8.h

📚 single header utf8 string functions for C and C++
The Unlicense

Invalid pointer returned when calling the utf8codepoint function on an empty string #71

Closed ForrestFeng closed 4 years ago

ForrestFeng commented 4 years ago

Sample code to reproduce the issue

const char *emptystr = u8"";
utf8_int32_t c; // receives the decoded code point
void *ret = utf8codepoint((void *)emptystr, &c);

It is expected to return (void *)emptystr, but it returns (void *)(emptystr + 1). ret is now a bad pointer: it points to the address just past the null terminator!

I suggest adding a check for the null terminator at the beginning of the function, see below.


void *utf8codepoint(const void *utf8_restrict str,  utf8_int32_t *utf8_restrict out_codepoint) {
  const char *s = (const char *)str;

  // make sure an empty string always returns a fixed result: the pointer to str itself
  // without this check the function could return an invalid position (s + x), which can cause memory issues
  if ('\0' == *s) {
    return (void *)s;
  }

...

  return (void *)s;
}
sheredom commented 4 years ago

Thanks for this! I'll hopefully get to it soon.

sheredom commented 4 years ago

You know what - I've thought about this more, and I don't care that utf8codepoint returns an invalid pointer. You should check that the out_codepoint is not 0 (nul terminator) before accessing the return from utf8codepoint instead.
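
For anyone landing here later, a minimal sketch of the caller-side check described above. To keep it self-contained it does not include utf8.h: `decode_one` is a hypothetical stand-in (not the library's code) that assumes valid UTF-8 and, like the behavior reported in this issue, steps past the nul terminator. `count_codepoints` is likewise just an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for utf8codepoint, NOT the library's implementation.
   Decodes one code point from s into *out and returns a pointer just past the
   bytes it consumed. Assumes well-formed UTF-8. On a nul terminator it stores
   0 and still returns s + 1, mirroring the behavior reported in this issue. */
static const char *decode_one(const char *s, int32_t *out) {
  const unsigned char *p = (const unsigned char *)s;
  if ((p[0] & 0xF8) == 0xF0) { /* 4-byte sequence */
    *out = ((p[0] & 0x07) << 18) | ((p[1] & 0x3F) << 12) |
           ((p[2] & 0x3F) << 6) | (p[3] & 0x3F);
    return s + 4;
  }
  if ((p[0] & 0xF0) == 0xE0) { /* 3-byte sequence */
    *out = ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    return s + 3;
  }
  if ((p[0] & 0xE0) == 0xC0) { /* 2-byte sequence */
    *out = ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    return s + 2;
  }
  *out = p[0]; /* single byte, including the nul terminator */
  return s + 1;
}

/* Counts code points in s. The decoded value is checked before the returned
   pointer is used, so the pointer past the terminator is never dereferenced. */
static size_t count_codepoints(const char *s) {
  size_t n = 0;
  int32_t cp;
  const char *next = decode_one(s, &cp);
  while (cp != 0) {
    n++;
    next = decode_one(next, &cp);
  }
  return n;
}
```

With this pattern an empty string is handled without any special case inside the decoder: the first decoded value is 0, the loop body never runs, and the returned pointer is simply ignored.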