Open wallclockbuilder opened 9 years ago
Lowercase letter 'k' is \u006B; uppercase letter 'K' is \u004B.
Simply mapping [a-z] to [A-Z] should work for most simple ASCII-only text documents.
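A minimal sketch of that ASCII-only mapping, for illustration. The helper name `asciiUpper` is hypothetical and not part of any Go package; it touches only the bytes 'a' through 'z' and leaves everything else, including multi-byte UTF-8 sequences, unchanged.

```go
package main

import "fmt"

// asciiUpper is a hypothetical helper: it maps the bytes 'a'..'z' to 'A'..'Z'
// and copies every other byte through untouched. Bytes of multi-byte UTF-8
// sequences are all >= 0x80, so non-ASCII characters are never altered.
func asciiUpper(s string) string {
	b := []byte(s)
	for i, c := range b {
		if c >= 'a' && c <= 'z' {
			b[i] = c - ('a' - 'A')
		}
	}
	return string(b)
}

func main() {
	fmt.Println(asciiUpper("kelvin"))
}
```

With this approach 'k' can only ever map to U+004B, because U+212A is outside the ASCII range and is never produced or consumed.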
The Unicode 6.0 spec has this to say about U+212A (KELVIN SIGN):
Three letterlike symbols have been given canonical equivalence to regular letters: U+2126 OHM SIGN, U+212A KELVIN SIGN, and U+212B ANGSTROM SIGN. In all three instances, the regular letter should be used. If text is normalized according to Unicode Standard Annex #15, “Unicode Normalization Forms,” these three characters will be replaced by their regular equivalents.
In other words, you shouldn't really be using U+212A, you should be using U+004B (LATIN CAPITAL LETTER K) instead, and if you normalize your Unicode text, U+212A should be replaced with U+004B.
Unicode 8.0 Character Code Charts: the most current code chart containing U+212A is:
http://www.unicode.org/charts/PDF/U2100.pdf
It specifies that the Kelvin sign is equivalent to the Latin capital letter K.
Right now unicode.SimpleFold('k') == '\u212A' (\u212A is 'K', the Kelvin sign). This is not intuitive for ASCII simple folding. We expect unicode.SimpleFold('k') == 'K'; please fix it so that it does.
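To make the reported behavior easy to reproduce, here is a small sketch that walks the fold cycle starting from a rune by calling `unicode.SimpleFold` repeatedly until it returns to the start. The helper name `foldOrbit` is an assumption made for this example; only `unicode.SimpleFold` comes from the standard library.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit is a hypothetical helper: it returns r followed by every rune
// reached by repeatedly applying unicode.SimpleFold, stopping when the
// cycle comes back to r.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		orbit = append(orbit, next)
	}
	return orbit
}

func main() {
	// Print the code point and glyph of each rune in the cycle that starts at 'k'.
	for _, r := range foldOrbit('k') {
		fmt.Printf("%U %c\n", r, r)
	}
}
```

Starting from 'k', the cycle visits U+212A before it reaches 'K', which is the ordering this report describes.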