pdf-raku / PDF-Font-Loader-raku

Font loader for the PDF tool-chain
Artistic License 2.0

Glyph maps can't handle type1 charsets #8

Closed: dwarring closed this issue 3 years ago

dwarring commented 3 years ago

Here's an example of a font that can't currently be represented. From 000377-001.pdf (attached):

19 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /BaseFont /GMOICK+MSTT31c531S00
  /Encoding 20 0 R
  /FirstChar 1
  /FontDescriptor 21 0 R
  /LastChar 61
  /Widths [ 1004 836 391 334 558 613 502 334 772 606 440 331 440 248 248 772 496 552 827 552 606 248 386 716 606 827 552 496 496 1004 613 613 662 496 496 552 496 613 893 388 659 769 247 769 988 604 714 823 385 604 659 659 823 604 604 823 494 659 329 329 823 ]
>>
endobj

20 0 obj
<<
  /Type /Encoding
  /Differences [ 1 /g179 /g50 /g85 /g76 /g74 /g81 /g68 /g79 /g36 /g69 /g86 /g87 /g70 /g17 /g3 /g46 /g72 /g92 /g90 /g82 /g71 /g29 /g45 /g53 /g83 /g39 /g89 /g23 /g26 /g48 /g75 /g88 /g55 /g20 /g21 /g41 /g22 /g78 /g80 /g73 /g61 /g56 /g15 /g57 /g58 /g54 /g38 /g43 /g44 /g40 /g60 /g37 /g42 /g51 /g47 /g49 /g93 /g59 /g77 /g16 /g52 ]
>>
endobj

21 0 obj
<<
  /Type /FontDescriptor
  /Ascent 0
  /CapHeight 0
  /CharSet (/g38/g179/g23/g37/g36/g52/g58/g72/g50/g20/g15/g69/g21/g73/g56/g85/g86/g16/g88/g61/g76/g87/g82/g60/g74/g43/g70/g78/g80/g81/g47/g89/g17/g57/g83/g68/g41/g3/g75/g79/g51/g44/g59/g26/g92/g93/g55/g54/g49/g46/g45/g71/g29/g53/g40/g90/g42/g77/g39/g22/g48)
  /Descent 0
  /Flags 4
  /FontBBox [ -16 -265 1004 727 ]
  /FontFile3 22 0 R
  /FontName /GMOICK+MSTT31c531S00
  /ItalicAngle 0
  /StemV 0
>>
endobj

It's making up its own encoding entirely, with custom glyph names and no Unicode map. We couldn't do much with it other than render it, but it does show that our representations are correct. We should also be taking /CharSet into account to set up custom encoding -> CID mappings.
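For illustration, here is a rough Python sketch (not the module's Raku code) of the first step: expanding the /Differences array into a code -> glyph-name map and checking those names against the /CharSet string from the font descriptor. It assumes the PDF arrays have already been tokenized into Python ints and bare name strings; `parse_differences` and `parse_charset` are hypothetical helper names. Going from glyph names to actual glyph/CID indices would still require reading the embedded /FontFile3 program, which this sketch doesn't attempt.

```python
import re


def parse_differences(tokens):
    """Build a code -> glyph-name map from a tokenized /Differences array.

    An integer sets the current character code; each following name is
    assigned to that code, and the code is incremented by one.
    """
    mapping = {}
    code = None
    for tok in tokens:
        if isinstance(tok, int):
            code = tok
        else:
            if code is None:
                raise ValueError("glyph name appears before any starting code")
            mapping[code] = tok
            code += 1
    return mapping


def parse_charset(charset):
    """Split a /CharSet string such as '/g38/g179/g23' into glyph names."""
    return set(re.findall(r"/([^/\s]+)", charset))


if __name__ == "__main__":
    # First few entries of the /Differences array from object 20 above.
    diffs = parse_differences([1, "g179", "g50", "g85", "g76"])
    charset = parse_charset("/g38/g179/g23/g50/g85/g76")
    print(diffs)
    print(set(diffs.values()) - charset)  # glyph names missing from /CharSet
```

With the full arrays from objects 20 and 21, the /Differences names should cover codes 1 to 61, matching /FirstChar and /LastChar in the font dictionary.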

dwarring commented 3 years ago

Actually, someone could make use of this font if they're willing to go to the trouble of setting up a /ToUnicode map for it.

The font should then operate normally for encoding and extracting text, etc.
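As a rough illustration of what "setting up a /ToUnicode map" involves, the Python sketch below generates a minimal ToUnicode CMap for a single-byte font like this one. The code -> Unicode assignments are hypothetical: someone would have to work them out by inspecting the glyphs. The function name `to_unicode_cmap` is made up for this example, and the result would still need to be written as a stream and referenced from the font dictionary's /ToUnicode entry.

```python
def to_unicode_cmap(mapping):
    """Return a minimal ToUnicode CMap for a single-byte simple font.

    mapping: dict of character code (0-255) -> Unicode string.
    """
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # Each beginbfchar ... endbfchar block may hold at most 100 entries.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            if not 0 <= code <= 0xFF:
                raise ValueError(f"code {code} is outside the single-byte range")
            # Destination strings are UTF-16BE, written as hex.
            dst = text.encode("utf-16-be").hex().upper()
            lines.append(f"<{code:02X}> <{dst}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical: suppose codes 3 and 4 turned out to be 'r' and 'i'.
    print(to_unicode_cmap({3: "r", 4: "i"}))
```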

dwarring commented 3 years ago

FWIW, CID fonts have a similar issue: they don't currently respect the optional /CIDToGIDMap stream.
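For reference, a /CIDToGIDMap is either the name /Identity or a stream in which the glyph index for CID c is the big-endian 2-byte value at bytes 2c and 2c+1. Here is a small Python sketch of the lookup, assuming the stream data has already been decompressed. The helper names are hypothetical, and returning GID 0 (.notdef) for CIDs past the end of the table is an assumption made for this example.

```python
import struct


def decode_cid_to_gid_map(data):
    """Decode raw CIDToGIDMap stream bytes into a list indexed by CID."""
    if len(data) % 2:
        raise ValueError("CIDToGIDMap stream length must be even")
    return list(struct.unpack(f">{len(data) // 2}H", data))


def cid_to_gid(cid_to_gid_map, cid):
    """Map a CID to a glyph index.

    cid_to_gid_map is either the string "Identity" or the decoded stream
    bytes. CIDs beyond the end of the table map to GID 0 (an assumption).
    """
    if cid_to_gid_map == "Identity":
        return cid
    offset = 2 * cid
    if offset + 2 > len(cid_to_gid_map):
        return 0
    return int.from_bytes(cid_to_gid_map[offset:offset + 2], "big")


if __name__ == "__main__":
    data = bytes([0x00, 0x05, 0x01, 0x00])  # CID 0 -> GID 5, CID 1 -> GID 256
    print(decode_cid_to_gid_map(data))
    print(cid_to_gid(data, 1), cid_to_gid("Identity", 7))
```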

dwarring commented 3 years ago

Fixed in the 0.5.3 release, including handling of /CIDToGIDMap streams.