rochaferraz / FreeFontConverter

FreeFontConverter generates typography-quality bitmap fonts optimized for embedded systems
Apache License 2.0

Feature Request: Specifying 8-bit Character Encoding (ASCII or ISO 8859-x) used when generating glyphs #2

Open hattesen opened 1 year ago

hattesen commented 1 year ago

This is a feature request that is indispensable when working with international (Latin) languages.

The current implementation of FreeFontConverter converts the characters at Unicode code points 0x20 (Space) to 0xFF into bitmaps in a header file.

ASCII

Many English-language applications do not use glyphs outside the ASCII range (0x20 ~ 0x7F), so I propose adding a runtime argument that selects ASCII (replacing CHARMAP_LAST_CHAR with 0x7F), which would cut the memory footprint by more than half.

Example:

$ freeFontConverter --font=OpenSansRegular.ttf --encoding=ASCII
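
To put a number on the saving, the short Python sketch below counts the glyphs in the current range and in the proposed ASCII range. It assumes every glyph costs roughly the same amount of memory, which is a simplification; `glyph_count` is a hypothetical helper, not part of FreeFontConverter.

```python
def glyph_count(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


full_range = glyph_count(0x20, 0xFF)   # range converted today
ascii_range = glyph_count(0x20, 0x7F)  # range proposed for --encoding=ASCII

# Ratio of glyphs kept when restricting the output to ASCII
print(full_range, ascii_range, round(ascii_range / full_range, 2))
```

Under that assumption the ASCII-only font holds 96 of the 224 glyphs, or about 43% of the current table.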

Character Encoding

When working with 8-bit character sets, a range of encodings exists that convert an 8-bit numeric character value into a (Unicode) glyph, supporting the requirements of a wide range of languages. The lower half (0x00 ~ 0x7F) is mapped to the standard ASCII character set, while the upper half (0x80 ~ 0xFF) varies according to the encoding.

Without support for such character encodings, the upper half of the character set (using Unicode code points 0x80 ~ 0xFF) will be ISO/IEC 8859-1 (identity mapping), which excludes support for a lot of languages/translations (see below).

I therefore propose adding a runtime argument for specifying an 8-bit character encoding, and mapping each 8-bit character code to its (16-bit) Unicode code point before generating the bitmapped font.

Example:

$ freeFontConverter --font=OpenSansRegular.ttf --encoding=8859-15
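
The mapping step proposed above could look roughly like the Python sketch below. It relies on Python's built-in codecs to translate each 8-bit code into a Unicode code point; `build_codepoint_table` is a hypothetical helper for illustration and says nothing about how FreeFontConverter is actually structured.

```python
from typing import Dict, Optional


def build_codepoint_table(encoding: str, first: int = 0x20,
                          last: int = 0xFF) -> Dict[int, Optional[int]]:
    """Map each 8-bit code in first..last to a Unicode code point.

    Codes that the encoding leaves undefined are mapped to None.
    """
    table: Dict[int, Optional[int]] = {}
    for code in range(first, last + 1):
        try:
            table[code] = ord(bytes([code]).decode(encoding))
        except UnicodeDecodeError:
            table[code] = None
    return table


# Build the table for ISO/IEC 8859-15 and list the codes whose glyph
# differs from the Unicode code point with the same numeric value.
table = build_codepoint_table("iso8859_15")
remapped = {code: cp for code, cp in table.items() if cp != code}
for code, cp in remapped.items():
    print(f"0x{code:02X} -> U+{cp:04X}")
```

The converter would then render the glyph at `table[code]` and store it at index `code`, so the generated font is still addressed by 8-bit character values.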

The most commonly used (universal) character encoding for Latin languages is ISO/IEC 8859-15 (superseding ISO 8859-1), which should be used as the default value. A pseudo-ASCII mapping could be generated by using only character codes 0x20 ~ 0x7F of the ISO/IEC 8859-15 mappings.

I propose adding support for ISO/IEC 8859-15, and possibly the remainder of the ISO/IEC 8859 encodings.

Commonly Used 8-bit Character Encodings

hattesen commented 1 year ago

Tip

The shell script (Linux/macOS/Cygwin) below will extract the mappings from a Unicode Mapping File that are NOT one-to-one, i.e. where the 8-bit code differs from the Unicode code point.

$ cat 8859-15.TXT | egrep "^#\t\w+:\s|^[^#]" | egrep -v "^0x(..).0x00\1"
#   Name:             ISO/IEC 8859-15:1999 to Unicode
#   Date:             1999 July 27 (header updated: 2015 December 02)
#   Authors:          Markus Kuhn <http://www.cl.cam.ac.uk/~mgk25/>
#   Format:  Three tab-separated columns
0xA4    0x20AC  #   EURO SIGN
0xA6    0x0160  #   LATIN CAPITAL LETTER S WITH CARON
0xA8    0x0161  #   LATIN SMALL LETTER S WITH CARON
0xB4    0x017D  #   LATIN CAPITAL LETTER Z WITH CARON
0xB8    0x017E  #   LATIN SMALL LETTER Z WITH CARON
0xBC    0x0152  #   LATIN CAPITAL LIGATURE OE
0xBD    0x0153  #   LATIN SMALL LIGATURE OE
0xBE    0x0178  #   LATIN CAPITAL LETTER Y WITH DIAERESIS

$ echo "This proves that 8859-1 is an identity mapping to Unicode"     
This proves that 8859-1 is an identity mapping to Unicode

$ cat 8859-1.TXT | egrep "^#\t\w+:\s|^[^#]" | egrep -v "^0x(..).0x00\1"
#   Name:             ISO/IEC 8859-1:1998 to Unicode
#   Date:             1999 July 27 (header updated: 2015 December 02)
#   Authors:          Ken Whistler <ken@unicode.org>
#   Format:  Three tab-separated columns
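
For anyone without `egrep` at hand, the same filter can be sketched in Python. This is an illustrative equivalent, not part of the original tip: `non_identity_mappings` is a hypothetical helper, the `SAMPLE` excerpt is made up in the three-column format shown above, and unlike the shell pipeline it does not echo the header lines.

```python
def non_identity_mappings(lines):
    """Return (code, codepoint) pairs where the 8-bit code differs
    from the Unicode code point it maps to."""
    result = []
    for line in lines:
        # Drop comments (everything after '#') and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # code with no Unicode assignment in this encoding
        code, codepoint = int(fields[0], 16), int(fields[1], 16)
        if code != codepoint:
            result.append((code, codepoint))
    return result


# Small excerpt in the format of 8859-15.TXT (tab-separated columns)
SAMPLE = """\
# Excerpt only, for illustration
0xA3\t0x00A3\t#\tPOUND SIGN
0xA4\t0x20AC\t#\tEURO SIGN
0xA5\t0x00A5\t#\tYEN SIGN
0xA6\t0x0160\t#\tLATIN CAPITAL LETTER S WITH CARON
"""

for code, codepoint in non_identity_mappings(SAMPLE.splitlines()):
    print(f"0x{code:02X}\t0x{codepoint:04X}")
```

To run it against a real mapping file, pass the open file object instead of `SAMPLE.splitlines()`.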