sheredom / utf8.h

📚 single header utf8 string functions for C and C++
The Unlicense

Add utf8ndup #42

Closed warmwaffles closed 6 years ago

warmwaffles commented 6 years ago

Some questions still remain for this.

I had a test like this

UTEST(utf8ndup, data) {
  void *const dup = utf8ndup(data, 20);
  ASSERT_TRUE(dup);
  ASSERT_EQ(20, utf8len(dup));
  free(dup);
}

The fun fact is that this test would be incorrect, since utf8ndup copies 20 bytes, not 20 characters.
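To illustrate why the assertion can fail, here is a minimal standalone sketch of how byte count and codepoint count diverge. `count_codepoints` is a hypothetical helper written for this example, not a function from utf8.h, and it assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of utf8.h: counts codepoints in the first
   nbytes of s by counting every byte that is not a continuation byte
   (bit pattern 10xxxxxx). Assumes s is valid UTF-8. */
static size_t count_codepoints(const char *s, size_t nbytes) {
  size_t count = 0;
  assert(s != NULL);
  for (size_t i = 0; i < nbytes; i++) {
    if (((unsigned char)s[i] & 0xC0) != 0x80) {
      count++;
    }
  }
  return count;
}
```

For the string "héllo" (where é is the two bytes 0xC3 0xA9), strlen reports 6 bytes while this helper reports 5 codepoints, so a copy limited to n bytes can hold fewer than n characters.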

Should this copy n bytes or n characters? My vote is for the latter: most people using strndup assume n means characters, because bytes and characters line up in ASCII.

Right now this is implemented as n bytes. I think I could change it to n characters; it just involves a little more work.
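For reference, a rough sketch of what the n-characters variant might involve. Both `bytes_for_codepoints` and `dup_n_codepoints` are hypothetical names made up for this example, not the PR's implementation or part of utf8.h, and the sketch assumes valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: byte length of the first n codepoints of s,
   or of the whole string if it has fewer than n codepoints.
   Assumes s is valid, NUL-terminated UTF-8. */
static size_t bytes_for_codepoints(const char *s, size_t n) {
  size_t i = 0;
  assert(s != NULL);
  while (s[i] != '\0' && n > 0) {
    i++; /* consume the lead byte */
    while (((unsigned char)s[i] & 0xC0) == 0x80) {
      i++; /* consume any continuation bytes (10xxxxxx) */
    }
    n--;
  }
  return i;
}

/* Hypothetical character-counting duplicate: copies at most n codepoints
   and NUL-terminates. Returns NULL on allocation failure; caller frees. */
static char *dup_n_codepoints(const char *s, size_t n) {
  size_t len = bytes_for_codepoints(s, n);
  char *out = (char *)malloc(len + 1);
  if (out != NULL) {
    memcpy(out, s, len);
    out[len] = '\0';
  }
  return out;
}
```

The extra work is the scan that turns a codepoint count into a byte length. After that, the copy is the same as in the byte-based version.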

Thoughts, @sheredom?

closes #41

sheredom commented 6 years ago

So I think N should be bytes - mostly because the other utf8n* functions I've added have that:

// Append the utf8 string src onto the utf8 string dst,
// writing at most n+1 bytes. Can produce an invalid utf8
// string if n falls partway through a utf8 codepoint.
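As an illustration of the caveat in that comment, here is a small sketch that checks whether a byte limit lands inside a codepoint. `cut_splits_codepoint` is a hypothetical helper for this example only, not something utf8.h provides, and it assumes valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if truncating s to its first n bytes
   would cut a multi-byte codepoint in half, 0 otherwise. The cut is
   unsafe exactly when the first dropped byte is a continuation byte
   (10xxxxxx). Assumes s is valid, NUL-terminated UTF-8. */
static int cut_splits_codepoint(const char *s, size_t n) {
  assert(s != NULL);
  if (n >= strlen(s)) {
    return 0; /* nothing is dropped, so nothing is split */
  }
  return ((unsigned char)s[n] & 0xC0) == 0x80;
}
```

With "héllo" (é encoded as 0xC3 0xA9), a limit of 2 bytes splits the é and a limit of 3 does not. A caller who needs valid output from a byte-limited function could use a check like this to pick a safe n beforehand.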

I think this was the requested behaviour when I started the library (can't remember from whom) so we should stick with it for the sake of backwards compatibility!

warmwaffles commented 6 years ago

@sheredom are there any changes you would like to see made? I could just as easily add some more tests as well, since that is probably desirable.