sheredom / utf8.h

📚 single header utf8 string functions for C and C++
The Unlicense

[feature] Add utf8ndup #41

Closed: warmwaffles closed this issue 6 years ago.

warmwaffles commented 6 years ago

This is my current implementation. I'm using it to replace all of my strndup calls.

#include <stdlib.h>
#include <utf8.h>

void*
utf8ndup(const void* src, size_t n)
{
    const char* s = (const char*)src;
    char* c       = 0;

    // figure out how many bytes (including the terminator) we need to copy first
    size_t bytes = utf8size(src);

    // n bytes of content plus one byte for the null terminator
    c = (char*)malloc(n + 1);

    if (0 == c) {
        // out of memory so we bail
        return 0;
    }

    bytes = 0;
    size_t i = 0;

    // copy at most n bytes of src into our new utf8 string,
    // checking the bound before reading s[bytes]
    while (i < n && '\0' != s[bytes]) {
        c[bytes] = s[bytes];
        bytes++;
        i++;
    }

    // append null terminating byte (c has room for n + 1 bytes)
    c[bytes] = '\0';
    return c;
}

I'm not sure this is desirable. I'm half tempted to just calloc and memcpy the result.
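A minimal sketch of the calloc-and-memcpy idea, using only the C standard library. The name `ndup_calloc` is hypothetical (not part of utf8.h), and, like the code above, it treats `n` as a byte count rather than a codepoint count. Because calloc zero-fills the buffer, the terminator comes for free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name: duplicate at most n bytes of src (n is a byte count). */
void *ndup_calloc(const void *src, size_t n) {
  const char *s = (const char *)src;
  size_t len = 0;

  /* Bounded scan: stop at the terminator or after n bytes,
     whichever comes first, so we never read past either limit. */
  while (len < n && s[len] != '\0') {
    len++;
  }

  /* len + 1 zeroed bytes, so c[len] is already '\0'. */
  char *c = (char *)calloc(len + 1, 1);
  if (c == NULL) {
    return NULL; /* out of memory */
  }

  memcpy(c, s, len);
  return c;
}
```

Zero-filling touches every byte once more than necessary; malloc followed by an explicit `c[len] = '\0'` avoids that at the cost of one extra line.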

f2404 commented 6 years ago

What's the point of

size_t bytes = utf8size(src);
bytes = 0;

when bytes is reset before the utf8size result is ever used?

warmwaffles commented 6 years ago

I think I originally intended to check whether the new string would be smaller than the requested size.

But this is literally the utf8dup code with a size_t n parameter tacked on.

f2404 commented 6 years ago

Also, you don't need two iterators (bytes and i). One would be enough.

warmwaffles commented 6 years ago

void*
utf8ndup(const void* src, size_t n)
{
    const char* s = (const char*)src;
    char* c       = 0;

    // figure out how many bytes (including the terminator) we need to copy first
    size_t bytes = utf8size(src);

    if (n < bytes) {
        // truncating: n bytes of content plus the terminator
        // (n counts bytes, so the cut may split a multi-byte sequence)
        c = (char*)malloc(n + 1);
    } else {
        // src fits: bytes already includes the terminator
        c = (char*)malloc(bytes);
        n = bytes;
    }

    if (!c) {
        // out of memory so we bail
        return 0;
    }

    bytes = 0;

    // copy src byte-by-byte into our new utf8 string
    while (bytes < n && '\0' != s[bytes]) {
        c[bytes] = s[bytes];
        bytes++;
    }

    // append null terminating byte
    c[bytes] = '\0';
    return c;
}

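Both versions treat `n` as a byte count, so a cut can land in the middle of a multi-byte sequence (for example, after the first byte of "é", which UTF-8 encodes as the two bytes C3 A9). The thread does not settle this. One possible policy, sketched here as an assumption, is to back off to the start of the partial sequence so it is dropped entirely. The name `ndup_utf8_boundary` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy at most n bytes of src, but if the cut would
   split a multi-byte UTF-8 sequence, drop that partial sequence. */
void *ndup_utf8_boundary(const void *src, size_t n) {
  const unsigned char *s = (const unsigned char *)src;
  size_t len = 0;

  /* Bounded scan: at most n bytes, stopping early at '\0'. */
  while (len < n && s[len] != '\0') {
    len++;
  }

  /* If the byte just past the cut is a continuation byte (10xxxxxx),
     the cut is inside a sequence: back up to that sequence's lead byte
     so it is excluded. s[len] is always readable here: it is either the
     terminator or part of the (longer) string. Malformed input with stray
     continuation bytes is backed over as well. */
  while (len > 0 && (s[len] & 0xC0) == 0x80) {
    len--;
  }

  char *c = (char *)malloc(len + 1);
  if (c == NULL) {
    return NULL; /* out of memory */
  }
  memcpy(c, s, len);
  c[len] = '\0';
  return c;
}
```

With this policy the result may hold fewer than `n` bytes, but it is always valid UTF-8 when the input is.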
warmwaffles commented 6 years ago

Anyway, this could probably be improved, and it could probably share the utf8dup code path when the string is shorter than the requested n.
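One way to share the code, sketched with hypothetical names and without claiming this is how utf8.h is structured: make the bounded copy the single core, and express the full duplicate as the bounded one with `n = SIZE_MAX`, so the scan stops only at the terminator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Shared core (hypothetical name): copy at most n bytes of src,
   stopping early at the terminator, and always null-terminate. */
static void *dup_at_most(const void *src, size_t n) {
  const char *s = (const char *)src;
  size_t len = 0;
  while (len < n && s[len] != '\0') {
    len++;
  }
  char *c = (char *)malloc(len + 1);
  if (c == NULL) {
    return NULL; /* out of memory */
  }
  memcpy(c, s, len);
  c[len] = '\0';
  return c;
}

/* Full duplicate: no practical bound, so only the terminator stops it. */
void *example_dup(const void *src) {
  return dup_at_most(src, SIZE_MAX);
}

/* Bounded duplicate: n is a byte count, as in the code above. */
void *example_ndup(const void *src, size_t n) {
  return dup_at_most(src, n);
}
```

This also removes the need for a separate utf8size call: a single bounded pass decides both how much to allocate and how much to copy.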

sheredom commented 6 years ago

Thanks for looking at this!

Two options:

1. I do the work myself, which I'm happy to do.
2. You submit a PR, since some people would rather have their name on the commit if they did the work!

warmwaffles commented 6 years ago

@sheredom I would be more than happy to submit a PR for this. Just wanted to test the waters here first.