Open hattesen opened 1 year ago
The shell script (Linux/macOS/Cygwin) below extracts the mappings from a Unicode mapping file that are NOT identity mappings, i.e. where the 8-bit character code differs from the Unicode code point.
$ cat 8859-15.TXT | egrep "^#\t\w+:\s|^[^#]" | egrep -v "^0x(..).0x00\1"
# Name: ISO/IEC 8859-15:1999 to Unicode
# Date: 1999 July 27 (header updated: 2015 December 02)
# Authors: Markus Kuhn <http://www.cl.cam.ac.uk/~mgk25/>
# Format: Three tab-separated columns
0xA4 0x20AC # EURO SIGN
0xA6 0x0160 # LATIN CAPITAL LETTER S WITH CARON
0xA8 0x0161 # LATIN SMALL LETTER S WITH CARON
0xB4 0x017D # LATIN CAPITAL LETTER Z WITH CARON
0xB8 0x017E # LATIN SMALL LETTER Z WITH CARON
0xBC 0x0152 # LATIN CAPITAL LIGATURE OE
0xBD 0x0153 # LATIN SMALL LIGATURE OE
0xBE 0x0178 # LATIN CAPITAL LETTER Y WITH DIAERESIS
$ echo "This proves that 8859-1 is an identity mapping to Unicode"
This proves that 8859-1 is an identity mapping to Unicode
$ cat 8859-1.TXT | egrep "^#\t\w+:\s|^[^#]" | egrep -v "^0x(..).0x00\1"
# Name: ISO/IEC 8859-1:1998 to Unicode
# Date: 1999 July 27 (header updated: 2015 December 02)
# Authors: Ken Whistler <ken@unicode.org>
# Format: Three tab-separated columns
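
The same check can be sketched in Python for readers without a POSIX shell. This is only an illustration: the function name `non_identity_mappings` and the inline sample data are hypothetical, and the parser assumes the usual mapping file layout of a two-digit hex byte code followed by a four-digit hex code point.

```python
import re

# Matches a data line such as "0xA4<TAB>0x20AC<TAB>#<TAB>EURO SIGN".
# Comment lines and lines without a code point do not match.
MAPPING_LINE = re.compile(r"^0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4})")


def non_identity_mappings(text):
    """Return (byte_code, code_point) pairs where the two values differ."""
    result = []
    for line in text.splitlines():
        match = MAPPING_LINE.match(line)
        if match is None:
            continue  # header comment, blank line, or undefined code
        byte_code = int(match.group(1), 16)
        code_point = int(match.group(2), 16)
        if byte_code != code_point:
            result.append((byte_code, code_point))
    return result


# Hypothetical three-line excerpt in the mapping file format.
sample = "\n".join([
    "#\tName:\tISO/IEC 8859-15:1999 to Unicode",
    "0xA3\t0x00A3\t#\tPOUND SIGN",
    "0xA4\t0x20AC\t#\tEURO SIGN",
])

for byte_code, code_point in non_identity_mappings(sample):
    print(f"0x{byte_code:02X} -> U+{code_point:04X}")
```

To run it against a real file, pass the contents of 8859-15.TXT instead of `sample`.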
This is a feature request for something that is indispensable when working with international (Latin) languages.
The current implementation of FreeFontConverter converts the characters at Unicode code points 0x20 (Space) to 0xFF into bitmaps in a header file.
ASCII
Many English language applications will not require/use glyphs outside the ASCII range (0x20 ~ 0x7F), so I propose adding a runtime argument specifying ASCII (replacing CHARMAP_LAST_CHAR by 0x7F), thus halving the memory footprint.
Example:
Character Encoding
When working with 8-bit character sets, a range of encodings exists that map an 8-bit numeric character value to a (Unicode) glyph, supporting the requirements of a wide range of languages. The lower half (0x00 ~ 0x7F) is mapped to the standard ASCII character set, while the upper half (0x80 ~ 0xFF) varies according to the encoding.
Without support for such character encodings, the upper half of the character set (using Unicode code points 0x80 ~ 0xFF) will be ISO/IEC 8859-1 (identity mapping), which excludes support for a lot of languages/translations (see below).
I therefore propose adding a runtime argument for specifying an 8-bit character encoding, and mapping the 8-bit character code to a (16-bit) Unicode glyph before generating the bitmapped font.
Example:
The most commonly used (universal) character encoding for Latin languages is ISO/IEC 8859-15 (superseding ISO/IEC 8859-1), which should be used as the default value. A pseudo ASCII mapping could be generated by only using character codes 0x20 ~ 0x7F of the ISO/IEC 8859-15 mappings.
I propose adding support for ISO/IEC 8859-15, and possibly the remainder of the ISO/IEC 8859 encodings.
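
To make the proposal concrete, here is a minimal sketch of the mapping step, written in Python purely for illustration. It is not FreeFontConverter code: the function `build_code_point_table` and its parameters are hypothetical, and it assumes Python's built-in codecs (`iso8859_15`, `iso8859_1`) as a stand-in for the Unicode mapping files. The `last_char` parameter plays the role that CHARMAP_LAST_CHAR plays in the request above.

```python
def build_code_point_table(encoding="iso8859_15", first_char=0x20, last_char=0xFF):
    """Map each 8-bit code in [first_char, last_char] to a Unicode code point.

    Codes that the chosen encoding leaves undefined map to None.
    """
    table = {}
    for code in range(first_char, last_char + 1):
        try:
            table[code] = ord(bytes([code]).decode(encoding))
        except UnicodeDecodeError:
            table[code] = None  # no glyph should be generated for this code
    return table


# Full 8-bit range with the proposed default encoding.
latin9 = build_code_point_table("iso8859_15")

# Pseudo ASCII: same encoding, but stop at 0x7F.
pseudo_ascii = build_code_point_table("iso8859_15", last_char=0x7F)

# Current behaviour: identity mapping (ISO/IEC 8859-1).
latin1 = build_code_point_table("iso8859_1")

print(f"0xA4 -> U+{latin9[0xA4]:04X} (8859-15), U+{latin1[0xA4]:04X} (8859-1)")
print(len(latin9), "glyphs in the full range,", len(pseudo_ascii), "in pseudo ASCII")
```

A font generator would then render the glyph at `table[code]` into slot `code` of the bitmap array, so application strings could stay in the 8-bit encoding.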
Commonly Used 8-bit Character Encodings
8859-1 Latin-1 Western European: Covers most Western European languages.
8859-2 Latin-2 Central European: Supports those Central and Eastern European languages that use the Latin alphabet.
8859-5 Latin/Cyrillic: Covers mostly Slavic languages that use a Cyrillic alphabet.
8859-9 Latin-5 Turkish: Largely the same as ISO/IEC 8859-1, replacing the rarely used Icelandic letters with Turkish ones.
8859-10 Latin-6 Nordic: A rearrangement of Latin-4. Considered more useful for Nordic languages.
8859-12 Latin/Devanagari: The work in making a part of 8859 for Devanagari was officially abandoned in 1997.
8859-13 Latin-7 Baltic Rim: Added some characters for Baltic languages which were missing from Latin-4 and Latin-6.
8859-15 Latin-9: A revision of 8859-1 that removes some little-used symbols, replacing them with the euro sign € and the letters Š, š, Ž, ž, Œ, œ, and Ÿ.