the-moisrex / webpp

C++ web framework | web development can be done with C++ as well.
https://t.me/webpp
MIT License
134 stars, 9 forks

Unicode: toNFD does not pass NormalizationTests #556

Closed the-moisrex closed 1 month ago

the-moisrex commented 2 months ago

toNFD doesn't work; either decompose is the problem, or the canonical_reorder algorithm is.

I've added an option to disable the UTF-8 composition tests, since they don't work yet, but I need the UTF-32 versions to work perfectly before I start dealing with that mess.
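For reference, the reordering half of NFD (the Canonical Ordering Algorithm) can be sketched on UTF-32 as below. This is a minimal sketch, not the webpp implementation: `ccc_of` and `canonical_reorder_sketch` are hypothetical names, and the combining-class table holds only four hand-picked code points, where a real one would be generated from the Unicode Character Database.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a Canonical_Combining_Class (ccc) lookup.
// Only four combining marks are listed; every other code point is
// treated as a starter (ccc == 0).
constexpr std::uint8_t ccc_of(char32_t cp) noexcept {
    switch (cp) {
        case 0x0301: return 230; // COMBINING ACUTE ACCENT
        case 0x0308: return 230; // COMBINING DIAERESIS
        case 0x0323: return 220; // COMBINING DOT BELOW
        case 0x0327: return 202; // COMBINING CEDILLA
        default: return 0;
    }
}

// Stable-sort every maximal run of non-starters by ccc.
// Starters never move, and marks with equal ccc keep their relative order.
inline void canonical_reorder_sketch(std::u32string& str) {
    std::size_t i = 0;
    while (i < str.size()) {
        if (ccc_of(str[i]) == 0) {
            ++i;
            continue;
        }
        std::size_t j = i;
        while (j < str.size() && ccc_of(str[j]) != 0) {
            ++j;
        }
        std::stable_sort(str.begin() + static_cast<std::ptrdiff_t>(i),
                         str.begin() + static_cast<std::ptrdiff_t>(j),
                         [](char32_t a, char32_t b) { return ccc_of(a) < ccc_of(b); });
        i = j;
    }
}
```

The sort has to be stable: two marks with the same combining class (for example U+0301 and U+0308, both 230) must not be swapped.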

the-moisrex commented 2 months ago

It seems the decomposition gets the correct answers for \xFFC4 and \x1F133, which I initially thought would be wrong!

the-moisrex commented 2 months ago

A canonical mapping may also consist of a pair of characters, but is never longer than two characters. When a canonical mapping consists of a pair of characters, the first character may itself be a character with a decomposition mapping, but the second character never has a decomposition mapping.

from UAX #44

This is the problem with the algorithm now.
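One way to read the quoted rule: a full canonical decomposition has to recurse into the first character of a pair, while the second character can be appended as-is. Below is a minimal sketch of that idea, not the actual fix in webpp. `canonical_mapping`, `canonical_mapping_of`, `decompose_into` and `decompose_sketch` are hypothetical names, the table holds only four hand-picked entries, and algorithmic Hangul decomposition is left out.

```cpp
#include <string>

// Hypothetical table entry: first == 0 means "no canonical mapping",
// second == 0 means the mapping is a single character (a singleton).
struct canonical_mapping {
    char32_t first;
    char32_t second;
};

// Toy lookup with four entries; a real table would be generated from
// UnicodeData.txt.
constexpr canonical_mapping canonical_mapping_of(char32_t cp) noexcept {
    switch (cp) {
        case 0x01D7: return {0x00DC, 0x0301}; // U WITH DIAERESIS AND ACUTE -> U+00DC, U+0301
        case 0x00DC: return {0x0055, 0x0308}; // U WITH DIAERESIS -> U+0055, U+0308
        case 0x212B: return {0x00C5, 0};      // ANGSTROM SIGN -> U+00C5 (singleton)
        case 0x00C5: return {0x0041, 0x030A}; // A WITH RING ABOVE -> U+0041, U+030A
        default: return {0, 0};
    }
}

// Append the full canonical decomposition of cp to out.
inline void decompose_into(char32_t cp, std::u32string& out) {
    const canonical_mapping m = canonical_mapping_of(cp);
    if (m.first == 0) {
        out.push_back(cp); // no mapping: keep the code point unchanged
        return;
    }
    decompose_into(m.first, out); // the first character may decompose further
    if (m.second != 0) {
        out.push_back(m.second);  // the second character never has a mapping
    }
}

// Decompose a whole UTF-32 string; canonical reordering is a separate step.
inline std::u32string decompose_sketch(const std::u32string& in) {
    std::u32string out;
    for (const char32_t cp : in) {
        decompose_into(cp, out);
    }
    return out;
}
```

With this table, U+01D7 expands in two steps to U+0055 U+0308 U+0301. A single, non-recursive lookup would stop at U+00DC U+0301.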

the-moisrex commented 1 month ago

It works now.