pdf-raku / PDF-Font-Loader-raku

Font loader for the PDF tool-chain
Artistic License 2.0

Font failing to decode #34

Closed by dwarring 12 months ago

dwarring commented 12 months ago

An Identity-H encoded font in the attached PDF fails to decode correctly with its supplied ToUnicode CMap.

The problem is apparent when the PDF is processed with pdf-tag-dump.raku. The following script explores it a little further:

use PDF::Font::Loader;
use PDF::Font::Loader::FontObj;

use PDF::COS::Dict;
use PDF::Lite;
my PDF::Lite $pdf .= open: "/tmp/SSRN-id4337484.pdf";

my PDF::COS::Dict:D $dict = $pdf.page(1)<Resources><Font><F9>;

my PDF::Font::Loader::FontObj:D $font = PDF::Font::Loader.load-font: :$dict;

my $str = "\x[3]~\0\x[4]\x[1]\x[F]\x[1]µ\x[1]l\x[1]u\x[1]\x[1E]\x[1]]\x[1]o\x[3]U\0\x[3]\x[1]\x[1E]\x[1]\x[9A]\0\x[3]\x[1]\x[2]\x[1]o\x[3]X\x[3]U\0\x[3]\x[3]î\x[3]ì\x[3]î\x[3]í\x[3]V\0";

# Split the string into 2-character codes, decode each one, and join the results.
say $str.comb(/../).map({$font.decode($_, :str)}).join;

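This produces "(bukmeiletal2021", whereas the rendered text is "(Abukmeil, et al., 2021". The leading "A", the spaces, the commas and the period are all missing from the decoded output.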
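To make it easier to see which character codes are involved, here is a minimal sketch that prints the first few 2-byte codes from the string above as big-endian hex values. It does not use the font at all. The helper name codes-be16 is hypothetical, not part of PDF::Font::Loader, and the sketch assumes each character in the string stands for one byte.

```raku
# Hypothetical helper (not part of PDF::Font::Loader): treat each pair of
# characters as the high and low byte of one big-endian 16-bit code.
sub codes-be16(Str $s) {
    $s.comb(/../).map: {
        my ($hi, $lo) = .ords;   # code points of the two characters
        $hi * 256 + $lo;
    }
}

# First five 2-byte codes of the test string from the script above.
my $prefix = "\x[3]~\0\x[4]\x[1]\x[F]\x[1]µ\x[1]l";

say codes-be16($prefix).map(*.fmt('%04X')).join(' ');
# prints: 037E 0004 010F 01B5 016C
```

Lining these up against the output, 037E appears to correspond to "(" and 0004 to the missing "A", so the codes that go missing are worth checking against the entries in the ToUnicode CMap.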

SSRN-id4337484.pdf