wallclockbuilder / stringutil

Ruby's string manipulation convenience functions for Golang
BSD 2-Clause "Simplified" License

Fix unicode.SimpleFold(r) #30

Open wallclockbuilder opened 9 years ago

wallclockbuilder commented 9 years ago

Right now unicode.SimpleFold('k') == '\u212A' (\u212A is 'K', the Kelvin sign). This is not intuitive for ASCII simple folding. Fix it so that unicode.SimpleFold('k') == 'K', which is what we expect.
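To make the reported behavior concrete, here is a minimal sketch that walks the fold orbit of 'k' using only the standard library. The helper name foldOrbit is hypothetical and is not part of stringutil.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit (hypothetical helper) returns every rune reachable from r by
// repeated calls to unicode.SimpleFold, starting with r itself.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		orbit = append(orbit, next)
	}
	return orbit
}

func main() {
	// For 'k' the orbit passes through U+212A (KELVIN SIGN) before it
	// reaches U+004B (LATIN CAPITAL LETTER K).
	for _, r := range foldOrbit('k') {
		fmt.Printf("%U\n", r)
	}
}
```

SimpleFold returns the smallest equivalent rune greater than its argument and wraps around only when there is none, which is why the Kelvin sign comes back before 'K'.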

wallclockbuilder commented 9 years ago

Lowercase letter 'k' is \u006B; uppercase letter 'K' is \u004B.

wallclockbuilder commented 9 years ago

Simply mapping [a-z] to [A-Z] should work for most simple ASCII-only text documents.
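A minimal sketch of that ASCII-only mapping, assuming the fix should swap case within [a-z] and [A-Z] and leave every other rune untouched. The name asciiSimpleFold is hypothetical, not an existing stringutil function.

```go
package main

import "fmt"

// asciiSimpleFold (hypothetical name) swaps the case of ASCII letters and
// returns every other rune unchanged. Assumption: non-ASCII input does not
// need folding here.
func asciiSimpleFold(r rune) rune {
	switch {
	case 'a' <= r && r <= 'z':
		return r - ('a' - 'A') // lowercase to uppercase
	case 'A' <= r && r <= 'Z':
		return r + ('a' - 'A') // uppercase to lowercase
	}
	return r
}

func main() {
	fmt.Printf("%c %c\n", asciiSimpleFold('k'), asciiSimpleFold('K'))
}
```

With this mapping 'k' folds directly to 'K' and the Kelvin sign never appears, at the cost of ignoring case pairs outside ASCII.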

wallclockbuilder commented 9 years ago

The Unicode 6.0 spec has this to say about U+212A (KELVIN SIGN):

Three letterlike symbols have been given canonical equivalence to regular letters: U+2126 OHM SIGN, U+212A KELVIN SIGN, and U+212B ANGSTROM SIGN. In all three instances, the regular letter should be used. If text is normalized according to Unicode Standard Annex #15, “Unicode Normalization Forms,” these three characters will be replaced by their regular equivalents.

In other words, you shouldn't really be using U+212A; you should be using U+004B (LATIN CAPITAL LETTER K) instead. If you normalize your Unicode text, U+212A is replaced with U+004B.
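As a rough illustration of the replacement the spec describes, the sketch below maps only the three letterlike symbols to their regular equivalents using the standard library. It is not full Unicode normalization (the golang.org/x/text/unicode/norm package is the usual choice for that), and the name replaceLetterlike is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// letterlikeReplacer maps the three letterlike symbols named in the spec
// to the regular letters they are canonically equivalent to.
var letterlikeReplacer = strings.NewReplacer(
	"\u2126", "\u03A9", // OHM SIGN -> GREEK CAPITAL LETTER OMEGA
	"\u212A", "\u004B", // KELVIN SIGN -> LATIN CAPITAL LETTER K
	"\u212B", "\u00C5", // ANGSTROM SIGN -> LATIN CAPITAL LETTER A WITH RING ABOVE
)

// replaceLetterlike (hypothetical helper) returns s with those three
// symbols replaced. All other text is left as-is.
func replaceLetterlike(s string) string {
	return letterlikeReplacer.Replace(s)
}

func main() {
	// %+q escapes non-ASCII runes, so any leftover U+212A would be visible.
	fmt.Printf("%+q\n", replaceLetterlike("273 \u212A"))
}
```

Running input through a step like this before folding means a Kelvin sign in the text is already a plain 'K' by the time case is compared.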

wallclockbuilder commented 9 years ago

http://www.unicode.org/versions/Unicode6.0.0/ch15.pdf

wallclockbuilder commented 9 years ago

In the Unicode 8.0 Character Code Charts, the most current code chart containing U+212A is:

http://www.unicode.org/charts/PDF/U2100.pdf 

It specifies that the Kelvin sign is equivalent to U+004B LATIN CAPITAL LETTER K. Here's a snapshot. [image: kelvin sign]