pdf-raku / PDF-Font-Loader-raku

Font loader for the PDF tool-chain
Artistic License 2.0

Handle custom ligatures in string decoding #35

Closed (dwarring closed 1 year ago)

dwarring commented 1 year ago

For example, the following CMap has a custom 'Th' ligature <00540068> (as well as a standard 'fi' ligature <00660069>). Should it decode as :str to 'Th'?

3208 0 obj
<< /Length 609 >> stream
/CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
/Registry (AAAAAA+F22+0) /Ordering (T1UV) /Supplement 0 >> def
/CMapName /AAAAAA+F22+0 def
/CMapType 2 def
1 begincodespacerange <02> <90> endcodespacerange
3 beginbfchar
<20> <0020>
<3b> <003B>
<90> <2019>
endbfchar
9 beginbfrange
<28> <29> <0028>
<2c> <36> <002C>
<38> <39> <0038>
<41> <50> <0041>
<52> <54> <0052>
<56> <57> <0056>
<59> <5a> <0059>
<61> <7a> <0061>
<8d> <8e> <201C>
endbfrange
2 beginbfrange
<02> <02> [<00540068>]
<03> <03> [<00660069>]
endbfrange
endcmap CMapName currentdict /CMap defineresource pop end end

endstream
endobj
dwarring commented 1 year ago

Formerly, it just skipped decoding that ligature. It now returns the component characters, 'Th' in this case.
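
To illustrate the behaviour described above, here is a minimal Python sketch of decoding a ToUnicode CMap whose destinations may hold more than one character. It is not the PDF::Font::Loader implementation: `parse_to_unicode`, `decode`, and `SAMPLE_CMAP` are hypothetical names, the parser is regex-based and only handles `bfchar`/`bfrange` sections, and it assumes single-byte character codes (as in the `<02> <90>` codespace range of the CMap in this issue).

```python
import re
from typing import Dict

# Abbreviated copy of the CMap from this issue (illustration only).
SAMPLE_CMAP = r"""
3 beginbfchar
<20> <0020>
<3b> <003B>
<90> <2019>
endbfchar
2 beginbfrange
<41> <50> <0041>
<61> <7a> <0061>
endbfrange
2 beginbfrange
<02> <02> [<00540068>]
<03> <03> [<00660069>]
endbfrange
"""

HEX = r"<([0-9A-Fa-f]+)>"


def hex_to_str(hex_digits: str) -> str:
    """Interpret a destination hex string as UTF-16BE; it may hold several characters."""
    return bytes.fromhex(hex_digits).decode("utf-16-be")


def parse_to_unicode(cmap_text: str) -> Dict[int, str]:
    """Build a table from character code to Unicode string (possibly multi-character)."""
    table: Dict[int, str] = {}

    # bfchar: one source code, one destination string.
    for body in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, body):
            table[int(src, 16)] = hex_to_str(dst)

    # bfrange: <lo> <hi> followed by either a start string or an array of strings.
    entry = HEX + r"\s*" + HEX + r"\s*(?:" + HEX + r"|\[(.*?)\])"
    for body in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, start, array in re.findall(entry, body, re.S):
            codes = range(int(lo, 16), int(hi, 16) + 1)
            if array:
                # Array form: one destination per code, e.g. the 'Th' ligature.
                for code, dst in zip(codes, re.findall(HEX, array)):
                    table[code] = hex_to_str(dst)
            else:
                # Start form: increment the last character for each successive code.
                base = hex_to_str(start)
                for offset, code in enumerate(codes):
                    table[code] = base[:-1] + chr(ord(base[-1]) + offset)
    return table


def decode(data: bytes, table: Dict[int, str]) -> str:
    """Decode single-byte codes; a ligature code expands to all of its characters."""
    return "".join(table.get(code, "") for code in data)


if __name__ == "__main__":
    table = parse_to_unicode(SAMPLE_CMAP)
    print(decode(b"\x02e \x03sh", table))
```

In this sketch, code `<02>` expands to the two characters 'Th' and code `<03>` to 'fi', so the byte string `02 65 20 03 73 68` decodes to "The fish" rather than dropping the ligatures.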