pdf-raku / PDF-Font-Loader-raku

Font loader for the PDF tool-chain
Artistic License 2.0

/Identity-H + /ToUnicode decoding of ligatures #19

Closed: dwarring closed 2 years ago

dwarring commented 2 years ago

The attached golfed PDF has an Identity-H encoded font subset with a /ToUnicode mapping that includes the entry <0193> <00660069>, which maps ligature CID 0x0193 to the two-character string 'fi'. This mapping is not currently respected, as in:

use Test;
plan 1;
use PDF::Lite;
use PDF::Font::Loader;
use PDF::Font::Loader::FontObj;

my PDF::Lite $pdf .= open: "identity-h-lig.pdf";

my $dict = $pdf.page(1)<Resources><Font><C2_0>;

my PDF::Font::Loader::FontObj:D $font = PDF::Font::Loader.load-font: :$dict;

my $bytes = buf8.new(0x00,0x32, 0x00,0x49, 0x01,0x93, 0x00,0x46, 0x00,0x48).decode: "latin-1";
is $dict.decode($bytes), 'Office';

which produces:

1..1
not ok 1 - 
# Failed test at /tmp/identity-h-lig.t line 14
# expected: 'Office'
#      got: 'Offce'
# You failed 1 test of 1
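
For reference, the expected decoding can be sketched in plain Raku without any of the PDF modules. This is only an illustration of how a multi-character /ToUnicode destination should expand, not the loader's actual implementation. The names bfchar-to-str and decode-cids are hypothetical, and every map entry other than <0193> is an assumption inferred from the expected string 'Office' rather than read from the PDF. The helper only handles code units in the Basic Multilingual Plane (no surrogate pairs).

```raku
# Hypothetical helper (not part of PDF::Font::Loader): convert the hex
# destination of a /ToUnicode bfchar entry to a Str. The destination is a
# sequence of 4-hex-digit UTF-16BE code units, so one CID may map to
# several characters. Simplification: surrogate pairs are not handled.
sub bfchar-to-str(Str $hex --> Str) {
    $hex.comb(4).map(*.parse-base(16).chr).join
}

# Hypothetical helper: map each CID through the table and concatenate.
# Unmapped CIDs contribute nothing in this sketch.
sub decode-cids(@cids, %map --> Str) {
    @cids.map({ %map{$_} // '' }).join
}

# Identity-H uses 2-byte codes: combine each byte pair into one CID.
my $bytes = buf8.new(0x00,0x32, 0x00,0x49, 0x01,0x93, 0x00,0x46, 0x00,0x48);
my @cids = $bytes.list.map: -> $hi, $lo { ($hi +< 8) +| $lo };

# Only the 0x0193 entry is quoted in this issue; the single-character
# entries are assumed so that the result matches the expected 'Office'.
my %to-unicode{Int} =
    0x0032 => 'O',
    0x0049 => 'f',
    0x0193 => bfchar-to-str('00660069'),   # ligature => 'fi'
    0x0046 => 'c',
    0x0048 => 'e';

say decode-cids(@cids, %to-unicode);
```

The point of the sketch is that the ligature CID contributes two characters to the output; dropping it gives the 'Offce' seen above.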

identity-h-lig.pdf