unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io

internal IdentityEncoder should be more clear with rune handling #360

Open gunnsth opened 4 years ago

gunnsth commented 4 years ago

The IdentityEncoder represents the Identity-H and Identity-V encodings, which map 2-byte character codes to 2-byte CIDs. The PDF specification describes Identity-H as follows:

The horizontal identity mapping for 2-byte CIDs; may be used with CIDFonts
using any Registry, Ordering, and Supplement values. It maps 2-byte character
codes ranging from 0 to 65,535 to the same 2-byte CID value, interpreted high-order
byte first.
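
To make the quoted mapping concrete, here is a minimal sketch of decoding a byte string under Identity-H. The `CID` type and the `decodeIdentity` function are hypothetical names chosen for illustration; they are not part of the unipdf API.

```go
package main

import (
	"errors"
	"fmt"
)

// CID is a character identifier in a CIDFont (hypothetical type for this
// sketch). It is deliberately not a rune: it carries no Unicode meaning.
type CID uint16

// decodeIdentity splits data into 2-byte character codes and maps each code
// to the CID with the same value, interpreting the high-order byte first.
func decodeIdentity(data []byte) ([]CID, error) {
	if len(data)%2 != 0 {
		return nil, errors.New("identity encoding: odd number of bytes")
	}
	cids := make([]CID, 0, len(data)/2)
	for i := 0; i < len(data); i += 2 {
		cids = append(cids, CID(data[i])<<8|CID(data[i+1]))
	}
	return cids, nil
}

func main() {
	cids, err := decodeIdentity([]byte{0x00, 0x24, 0x01, 0x02})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cids) // prints [36 258]
}
```

Returning a distinct `CID` type rather than `rune` is one way to keep the compiler from silently treating these values as text.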

When used with TrueType CID fonts, the CID values typically map directly to GIDs (glyph indices), and the CID value has no Unicode meaning. It is therefore confusing that IdentityEncoder implements the TextEncoder interface, with methods such as CharcodeToRune that return a "rune" which is not a Unicode code point but just the integer value of the CID. This can easily lead to problems.
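
The kind of problem this causes can be sketched as follows. The glyph table is made up for illustration (a font in which glyph index 36 happens to draw "A"), and `cidAsRune` is a hypothetical stand-in for what a CharcodeToRune-style method effectively does under an identity encoding; neither is taken from unipdf.

```go
package main

import "fmt"

// glyphText is a made-up table for a hypothetical font: the character that
// each glyph index actually draws.
var glyphText = map[uint16]rune{36: 'A'}

// cidAsRune mimics an identity encoder that returns the CID's integer value
// typed as a rune (hypothetical helper for this sketch).
func cidAsRune(cid uint16) rune {
	return rune(cid)
}

func main() {
	cid := uint16(36)
	// The "rune" is U+0024 ('$'), while the glyph drawn on the page is 'A'.
	fmt.Printf("%q %q\n", cidAsRune(cid), glyphText[cid]) // prints '$' 'A'
}
```

Any caller that treats the returned value as text, for example during text extraction, would get "$" where the page shows "A".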

We probably need to clarify the terminology and maybe split up the TextEncoder interface. Identity-H should just map bytes to CIDs, without claiming any Unicode meaning. If a CIDToGIDMap is defined, it also needs to be applied.
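
One possible shape for the CID-to-GID step, kept separate from any rune handling, is sketched below. All names (`CID`, `GID`, `cidToGIDMap`, `lookup`) are hypothetical and not the unipdf API. The sketch assumes the stream form of CIDToGIDMap, where the glyph index for CID c is stored in bytes 2c and 2c+1, high-order byte first, and treats a nil map as the Identity case.

```go
package main

import "fmt"

// CID and GID are distinct types so that neither can be confused with a rune
// or with each other (hypothetical types for this sketch).
type CID uint16
type GID uint16

// cidToGIDMap holds the raw bytes of a CIDToGIDMap stream: 2 bytes per CID,
// high-order byte first. A nil map stands for the Identity mapping.
type cidToGIDMap []byte

// lookup returns the glyph index for cid, or false if cid is outside the map.
func (m cidToGIDMap) lookup(cid CID) (GID, bool) {
	if m == nil {
		return GID(cid), true // Identity: the CID is used as the GID
	}
	i := int(cid) * 2
	if i+1 >= len(m) {
		return 0, false
	}
	return GID(m[i])<<8 | GID(m[i+1]), true
}

func main() {
	// CID 0 -> GID 0, CID 1 -> GID 5, CID 2 -> GID 3.
	m := cidToGIDMap{0x00, 0x00, 0x00, 0x05, 0x00, 0x03}
	gid, ok := m.lookup(1)
	fmt.Println(gid, ok) // prints 5 true

	var identity cidToGIDMap
	gid, ok = identity.lookup(36)
	fmt.Println(gid, ok) // prints 36 true
}
```

With this split, mapping to Unicode would be a separate concern handled elsewhere, rather than something the identity encoder appears to provide.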