sheredom / utf8.h

📚 single header utf8 string functions for C and C++
The Unlicense

Couple of thoughts #14

Closed by mlabbe 8 years ago

mlabbe commented 8 years ago

Hey, nice library! I am looking for utf-8 C string parsing and this fits the bill. I had a couple of thoughts after reading the code.

For instance, a safer utf8ncpy function that guarantees a null terminator (possibly truncating the last character) and returns a boolean indicating whether the string was truncated could be helpful, even though it would not conform to anything in string.h. It should also write only a single null character at the end, because zero-filling the rest of the buffer after termination is a waste of cycles. I have been using such a workhorse function for years.

Here is a snippet that lets you portably apply the restrict keyword if you're interested:

#if defined(__GNUC__) || defined(__clang__)
    #ifdef __cplusplus
        #define utf8_restrict __restrict
    #elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
        #define utf8_restrict restrict
    #else
        #define utf8_restrict /* pre-C99 C has no restrict keyword */
    #endif
#elif defined(_MSC_VER) && (_MSC_VER >= 1400) /* vs2005 */
    #define utf8_restrict __restrict
#else
    #define utf8_restrict
#endif
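
As a rough illustration of how a macro like this would be applied, the sketch below puts it on the pointer parameters of a small copy routine. `example_copy` is a hypothetical name and not part of utf8.h. The macro block is repeated here, with a fallback for pre-C99 compilers, only so that the example compiles on its own.

```c
#include <stddef.h>

/* Portable restrict qualifier; expands to nothing where unsupported. */
#if defined(__GNUC__) || defined(__clang__)
    #ifdef __cplusplus
        #define utf8_restrict __restrict
    #elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
        #define utf8_restrict restrict
    #else
        #define utf8_restrict
    #endif
#elif defined(_MSC_VER) && (_MSC_VER >= 1400)
    #define utf8_restrict __restrict
#else
    #define utf8_restrict
#endif

/* Hypothetical helper (not from utf8.h): copies n bytes from src to dst.
   The qualifier tells the compiler that dst and src never overlap, so the
   caller must not pass overlapping buffers. */
static void *example_copy(void *utf8_restrict dst,
                          const void *utf8_restrict src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (n-- > 0) {
        *d++ = *s++;
    }
    return dst;
}
```

Because the macro expands to nothing on compilers without the keyword, the signature stays valid everywhere. The only cost is a lost optimization hint.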

mlabbe commented 8 years ago

Here is my safer strncpy, verbatim paste from my core lib. If you want to adapt or use any part of this, I release it under the public domain.

The FTG_ATTRIBUTES() thing is a macro that generates a compiler warning if the truncation bit is ignored. Since it is a potential security risk if a string is truncated, I force a check. This is appropriate for my code. (I wish I could conditionally avoid this warning if *src was a literal).
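
For readers unfamiliar with the technique, here is a rough sketch of how a warn-if-ignored macro could be built. `EXAMPLE_WARN_UNUSED` and `example_copy_checked` are hypothetical names. The actual FTG_ATTRIBUTES definition is not shown in this thread and may well differ. The sketch assumes GCC or Clang, which provide the `warn_unused_result` attribute. On MSVC it falls back to the SAL annotation `_Check_return_`, which is only diagnosed when code analysis (/analyze) is enabled.

```c
#include <stddef.h>

/* Hypothetical stand-in for FTG_ATTRIBUTES(FTG_EXT_warn_unused_result). */
#if defined(__GNUC__) || defined(__clang__)
    #define EXAMPLE_WARN_UNUSED __attribute__((warn_unused_result))
#elif defined(_MSC_VER) && (_MSC_VER >= 1700)
    #include <sal.h>
    #define EXAMPLE_WARN_UNUSED _Check_return_
#else
    #define EXAMPLE_WARN_UNUSED /* no equivalent: silently ignored */
#endif

/* Hypothetical helper: copies src into dst (capacity cap bytes, including
   the terminator). Returns 1 if src did not fit, 0 otherwise. */
EXAMPLE_WARN_UNUSED int
example_copy_checked(char *dst, const char *src, size_t cap)
{
    size_t i = 0;

    if (cap == 0)
        return 1;

    /* Leave room for the terminator: copy at most cap-1 bytes. */
    while (i + 1 < cap && src[i] != '\0') {
        dst[i] = src[i];
        ++i;
    }
    dst[i] = '\0';

    /* If src still has bytes left, the copy was truncated. */
    return src[i] != '\0';
}

/* Intended use:
       if (example_copy_checked(buf, name, sizeof buf)) { handle truncation }
   Calling it as a bare statement and discarding the result should produce
   a warning on GCC and Clang. */
```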

/* Fill up to max_copy characters in dst, including the null terminator.  Unlike
   strncpy(), a null terminator is guaranteed to be appended, EVEN if it
   overwrites the last character in the string.

   Only appends a single null character instead of null-filling the string.  The
   trailing bytes are left uninitialized.

   No bytes are written if max_copy is 0, and FTG_ASSERT fires.

   \return 1 on truncation or max_copy==0, zero otherwise. */
FTG_ATTRIBUTES(FTG_EXT_warn_unused_result) int
ftg_strncpy(char *ftg_restrict dst, const char *ftg_restrict src, size_t max_copy)
{
    size_t n;
    char *d;

    FTG_ASSERT(dst);
    FTG_ASSERT(src);
    FTG_ASSERT(max_copy > 0);

    if (max_copy == 0)
        return 1;

    n = max_copy;
    d = dst;
    while ( n > 0 && *src != '\0' )    
    {
        *d++ = *src++;
        --n;
    }

    /* Truncation case -
       terminate string and return true */
    if ( n == 0 )
    {
        dst[max_copy-1] = '\0';
        return 1;
    }

    /* No truncation.  Append a single null terminator and return. */
    *d = '\0';
    return 0;
}
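
Since the function above counts bytes, a truncated UTF-8 string can end in the middle of a multi-byte sequence. A possible UTF-8-aware variant is sketched below. `example_utf8_copy` is a hypothetical name, not something from utf8.h or the library quoted above, and the sketch assumes src is valid UTF-8. When the copy has to be cut, it steps back to the start of the code point that does not fit and drops that code point whole.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy src into dst (capacity cap bytes, including the
   terminator) without splitting a UTF-8 code point.
   Returns 1 on truncation or cap == 0, 0 otherwise. */
static int example_utf8_copy(char *dst, const char *src, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return 1; /* nothing can be written, not even a terminator */

    /* Count how many bytes of src fit while leaving room for '\0'. */
    while (len + 1 < cap && src[len] != '\0')
        ++len;

    if (src[len] != '\0') {
        /* Truncation: src[len] is the first byte that does not fit.  If it
           is a continuation byte (10xxxxxx), the cut falls inside a
           multi-byte sequence, so back up to that sequence's lead byte. */
        while (len > 0 && ((unsigned char)src[len] & 0xC0) == 0x80)
            --len;
        memcpy(dst, src, len);
        dst[len] = '\0';
        return 1;
    }

    /* Everything fit: copy the bytes and a single terminator. */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

With a 3-byte buffer and the source "a" followed by a 2-byte character, the byte-oriented version would keep "a" plus the first half of the character, while this sketch keeps only "a". The trade-off is that the result can be up to three bytes shorter than the buffer allows.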

sheredom commented 8 years ago

I'll take your comments on restrict on board for sure. I'm not certain I like the assert myself (but I'm willing to spend some brain cycles on the thought before I commit one way or the other!)

Thanks for the input though - all comments are good comments :smile:

sheredom commented 8 years ago

Just for clarity - I've decided against adding the assert, but I have done the restrict change. Thanks again for spending time reviewing my lib!