libui-ng / libui-ng

libui-ng: a portable GUI library for C. "libui for the next generation"
https://libui-ng.github.io/libui-ng/
MIT License

Optimize UTF converter #276

Open matyalatte opened 2 months ago

matyalatte commented 2 months ago

toUTF16 and toUTF8 contain redundant code.

Each of them calls a converter once to calculate the buffer size, then calls it again to do the actual conversion. Dedicated size-calculation functions would make this faster. They also go through UTF-32 converters internally: toUTF8, for example, takes two steps, "UTF-16 to UTF-32" and then "UTF-32 to UTF-8". A true "UTF-16 to UTF-8" function could remove many conditional branches, especially for ASCII characters.

For example, uiprivUTF16UTF8Count could be simplified like this:

size_t uiprivUTF16UTF8CountFaster(const uint16_t *s)
{
    size_t len;
    uint16_t rune;

    len = 0;
    while (*s) {
        rune = *s;
        s++;
        if (rune < 0x80) {  // ASCII bytes represent themselves
            len += 1;
        } else if (rune < 0x800) {  // two-byte encoding
            len += 2;
        } else if (rune < 0xD800 || rune >= 0xE000) {
            // three-byte encoding
            len += 3;
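        } else if (rune >= 0xDC00 || *s < 0xDC00 || *s >= 0xE000) {
            // bad rune: a lone low surrogate, or a high surrogate that is
            // not followed by a low surrogate (this includes end of string).
            // Counted as three bytes, the UTF-8 length of U+FFFD.
            len += 3;
        } else {
            // four-byte encoding
            s++;  // skip the low surrogate; the pair takes two UTF-16 code units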
            len += 4;
        }
    }
    return len;
}
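The matching conversion step could follow the same pattern. Below is a minimal sketch of a direct UTF-16 to UTF-8 encoder with no UTF-32 round trip. The name `utf16ToUTF8Direct` is hypothetical and not part of libui-ng, and writing U+FFFD for bad surrogates is an assumption chosen to agree with the three bytes counted above. The caller is assumed to have allocated the counted length plus one byte for the terminator.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not libui-ng API): encodes a NUL-terminated UTF-16
// string directly into dst as UTF-8 and NUL-terminates the result.
// dst must hold at least (counted length + 1) bytes.
// Returns the number of bytes written, excluding the terminator.
size_t utf16ToUTF8Direct(const uint16_t *s, char *dst)
{
    unsigned char *p = (unsigned char *) dst;
    uint32_t rune;

    while (*s) {
        rune = *s++;
        if (rune < 0x80) {  // ASCII fast path: one byte, no further checks
            *p++ = (unsigned char) rune;
            continue;
        }
        if (rune < 0x800) {  // two-byte encoding
            *p++ = (unsigned char) (0xC0 | (rune >> 6));
            *p++ = (unsigned char) (0x80 | (rune & 0x3F));
            continue;
        }
        if (rune >= 0xD800 && rune < 0xE000) {  // surrogate range
            if (rune < 0xDC00 && *s >= 0xDC00 && *s < 0xE000) {
                // valid pair: combine into one code point, four-byte encoding
                rune = 0x10000 + ((rune - 0xD800) << 10) + (uint32_t) (*s - 0xDC00);
                s++;
                *p++ = (unsigned char) (0xF0 | (rune >> 18));
                *p++ = (unsigned char) (0x80 | ((rune >> 12) & 0x3F));
                *p++ = (unsigned char) (0x80 | ((rune >> 6) & 0x3F));
                *p++ = (unsigned char) (0x80 | (rune & 0x3F));
                continue;
            }
            // assumption: a bad surrogate is replaced with U+FFFD
            rune = 0xFFFD;
        }
        // three-byte encoding
        *p++ = (unsigned char) (0xE0 | (rune >> 12));
        *p++ = (unsigned char) (0x80 | ((rune >> 6) & 0x3F));
        *p++ = (unsigned char) (0x80 | (rune & 0x3F));
    }
    *p = '\0';
    return (size_t) (p - (unsigned char *) dst);
}
```

The branch order mirrors the counting function, so the two stay in agreement about how many bytes each code unit produces.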

It is a low priority because the current implementation is fast enough for me, but someone should work on it in the future.

szanni commented 2 months ago

I am honestly more surprised that the code does not use the WinAPI MultiByteToWideChar and WideCharToMultiByte functions. Not sure if these would be faster or slower.

matyalatte commented 2 months ago

> I am honestly more surprised that the code does not use the WinAPI MultiByteToWideChar and WideCharToMultiByte functions.

libui used them as of alpha3.5, but andlabs later replaced them with his own library (andlabs/utf) to support uiAttributedString.