sheredom / utf8.h

📚 Single-header UTF-8 string functions for C and C++
The Unlicense

Add support for Latin case compares. #46

sheredom closed this 6 years ago

sheredom commented 6 years ago

This fixes issue #45.
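For readers wondering what a "Latin case compare" involves, here is a minimal Python sketch of the idea. It is a hypothetical illustration, not utf8.h's actual C code. The helper names `latin1_lower` and `casecmp` are invented for this example. The sketch folds ASCII and Latin-1 Supplement capitals (U+00C0 to U+00DE, skipping U+00D7 MULTIPLICATION SIGN) by adding 0x20, then compares the strings codepoint by codepoint, as `strcasecmp` does.

```python
# Hypothetical sketch of a Latin-1 case-insensitive compare; not utf8.h's code.

def latin1_lower(cp: int) -> int:
    """Map an ASCII or Latin-1 Supplement capital codepoint to lowercase."""
    if 0x41 <= cp <= 0x5A:                    # A..Z
        return cp + 0x20
    if 0xC0 <= cp <= 0xDE and cp != 0xD7:     # À..Þ, skipping × (U+00D7)
        return cp + 0x20
    return cp                                 # everything else is unchanged

def casecmp(a: str, b: str) -> int:
    """strcasecmp-style result: negative, zero or positive."""
    for ca, cb in zip(a, b):
        la, lb = latin1_lower(ord(ca)), latin1_lower(ord(cb))
        if la != lb:
            return la - lb
    return len(a) - len(b)

# Sanity check: agrees with Python's own lowercasing for U+0000..U+00FF.
assert all(chr(latin1_lower(cp)) == chr(cp).lower() for cp in range(0x100))

print(casecmp("ÀÉÎÕÜ", "àéîõü"))  # 0
```

A simple range check does not cover everything. Characters such as Œ, Š, Ž and Ÿ sit in Latin Extended-A, and Ÿ (U+0178) even lowercases back into Latin-1 as ÿ (U+00FF). A real implementation therefore needs extra cases or a table.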

giampaolo commented 6 years ago

I ran some tests through a Python project I'm currently working on. Here is the list of characters that utf8.h can handle with this PR in place. The ones that are commented out are lowercased by Python but not by utf8.h. I took the list from here: http://heraultetplus.com/files/CharacterCodes.htm

chars = [
    "À",  # Capital A, accent grave
    "Á",  # Capital A, accent acute
    "Â",  # Capital A, accent circumflex
    "Ã",  # Capital A, accent tilde
    "Ä",  # Capital A, accent umlaut
    "Å",  # Capital A, accent ring
    "Æ",  # Capital AE, ligature
    "Ç",  # Capital C, cedilla
    # "Γ",  # Capital gamma, (Greek)
    # "Δ",  # Capital Delta, (Greek)
    "È",  # Capital E, accent grave
    "É",  # Capital E, accent acute
    "Ê",  # Capital E, accent circumflex
    "Ë",  # Capital E, accent umlaut
    # "Ð",  # Capital Eth, (Icelandic)
    # "Θ",  # Capital theta, (Greek)
    "Ì",  # Capital I, accent grave
    "Í",  # Capital I, accent acute
    "Î",  # Capital I, accent circumflex
    "Ï",  # Capital I, accent umlaut
    # "Λ",  # Capital Lambda, (Greek)
    "Ñ",  # Capital N, accent tilde
    # "Ξ",  # Capital Xi, (Greek)
    "Ò",  # Capital O, accent grave
    "Ó",  # Capital O, accent acute
    "Ô",  # Capital O, accent circumflex
    "Õ",  # Capital O, accent tilde
    "Ö",  # Capital O, accent umlaut
    "Ø",  # Capital O, accent slash
    "Œ",  # Capital OE, ligature
    # "Π",  # Capital Pi, (Greek)
    "Š",  # Capital S, with caron
    # "Σ",  # Capital Sigma, (Greek)
    "Þ",  # Capital THORN, (Icelandic)
    "Ù",  # Capital U, accent grave
    "Ú",  # Capital U, accent acute
    "Û",  # Capital U, accent circumflex
    "Ü",  # Capital U, accent umlaut
    # "Φ",  # Capital Phi, (Greek)
    "Ý",  # Capital Y, accent acute
    "Ÿ",  # Capital Y, accent umlaut
    # "Ψ",  # Capital Psi, (Greek)
    # "Ω",  # Capital Omega, (Greek)
    # "℧",  # Inverted Capital Omega
    "Ž",  # Capital Z, with caron
]
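The harness from that Python project isn't shown. As a hypothetical re-check, the commented-out characters can be grouped by their Unicode names using the standard `unicodedata` module. This confirms that almost all of them are Greek. The only non-Greek ones are Ð (U+00D0 LATIN CAPITAL LETTER ETH) and ℧ (U+2127 INVERTED OHM SIGN). ℧ has no lowercase mapping in Unicode, so Python's `lower()` actually returns it unchanged. The `group_by_script` helper is invented for this sketch.

```python
import unicodedata

# The characters commented out in the list above.
unsupported = ["Γ", "Δ", "Ð", "Θ", "Λ", "Ξ", "Π", "Σ", "Φ", "Ψ", "Ω", "℧"]

def group_by_script(chars):
    """Split characters into Greek and non-Greek using their Unicode names."""
    greek, other = [], []
    for ch in chars:
        name = unicodedata.name(ch, "")
        (greek if name.startswith("GREEK") else other).append(ch)
    return greek, other

greek, other = group_by_script(unsupported)
print("Greek:", "".join(greek))
print("Other:", "".join(other))
for ch in other:
    # Show the codepoint, its name, and what Python's lower() maps it to.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(ch.lower()):04X}")
```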
sheredom commented 6 years ago

OK, so it looks like for the most part I'm just missing the Greek letters. I'll do a follow-up PR to add them in bulk (I'd rather not add a few here and there!).
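Adding the basic Greek capitals in bulk is straightforward because they map to lowercase with a single offset. The capitals U+0391 (Α) through U+03A9 (Ω) each become lowercase by adding 0x20. The one exception is U+03A2, which is unassigned because there is no capital final sigma. The Python sketch below shows that mapping and checks it against `str.lower()`. The name `greek_lower` is hypothetical, and this is not the code that went into the follow-up PR.

```python
def greek_lower(cp: int) -> int:
    """Lowercase a basic Greek capital (U+0391..U+03A9) by adding 0x20.

    U+03A2 is unassigned (there is no capital final sigma), so it is left
    unchanged, as is every codepoint outside the range.
    """
    if 0x0391 <= cp <= 0x03A9 and cp != 0x03A2:
        return cp + 0x20
    return cp

# One offset covers the whole block; verify against Python's lowercasing.
assert all(chr(greek_lower(cp)) == chr(cp).lower() for cp in range(0x0391, 0x03AA))

print("".join(chr(greek_lower(ord(c))) for c in "ΓΔΘΛΞΠΣΦΨΩ"))  # γδθλξπσφψω
```

Note that this maps Σ only to σ (U+03C3). Producing the word-final form ς (U+03C2) depends on context, so a per-codepoint table cannot handle it alone.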

giampaolo commented 6 years ago

Sounds good!