Extracted fonts is not outputting any fonts.

Cardinal156 commented 3 weeks ago

Problem

When I try to extract font files from a PDF, no fonts are extracted. I've tried multiple files with different fonts, but none of them worked. I have attached a sample PDF that fails. If there are any more details I can provide, please let me know.

Details

I am using the v0.9.1 x86-64 release.
OS: Windows (64 bits)
Command: ./pdfcpu.exe extract -mode font "C:/Users/User/Downloads/A.pdf" "C:/Users/User/Downloads/fonts"

Output:


extracting fonts from C:/Users/User/Downloads/A.pdf into C:/Users/User/Downloads/fonts/ ...
validating URIs..

optimizing...



Sample file

[A.pdf](https://github.com/user-attachments/files/17623435/A.pdf)

Cardinal156 commented 3 weeks ago

When I run ./pdfcpu.exe validate -v "C:/Users/School/Downloads/A.pdf" to get font info, I get this output:

Total pages: 1

Fonts for page 1:
obj     prefix     Fontname                       Subtype    Encoding             Embedded ResourceIds
#6                 FlatBrush                      TrueType   WinAnsiEncoding      true     F0
#8                 TimesNewRoman,Bold             TrueType   WinAnsiEncoding      true     F1
#10                TimesNewRoman,Italic           TrueType   WinAnsiEncoding      true     F2
#12                TimesNewRoman                  TrueType   WinAnsiEncoding      true     F3

Fontobjects:
obj     prefix     Fontname                       Subtype    Encoding             Embedded ResourceIds
#6                 FlatBrush                      TrueType   WinAnsiEncoding      true     F0
#8                 TimesNewRoman,Bold             TrueType   WinAnsiEncoding      true     F1
#10                TimesNewRoman,Italic           TrueType   WinAnsiEncoding      true     F2
#12                TimesNewRoman                  TrueType   WinAnsiEncoding      true     F3

Fonts:
obj     prefix     Fontname                       Subtype    Encoding             Embedded ResourceIds
#6                 FlatBrush                      TrueType   WinAnsiEncoding      true     F0
#12                TimesNewRoman                  TrueType   WinAnsiEncoding      true     F3
#8                 TimesNewRoman,Bold             TrueType   WinAnsiEncoding      true     F1
#10                TimesNewRoman,Italic           TrueType   WinAnsiEncoding      true     F2

Duplicate Fonts:

No image info available.

Cardinal156 commented 3 weeks ago

When I added the -v flag to the font extract command (./pdfcpu.exe extract -mode font -v "C:/Users/User/Downloads/A.pdf" "C:/Users/User/Downloads/fonts"), I got some more information that might be related to the problem. Can pdfcpu not create font files from the fonts embedded in the PDF? Do the fonts already have to be installed on my computer?

extracting fonts from C:/Users/User/Downloads/A.pdf into C:/Users/User/Downloads/fonts/ ...
 INFO: 2024/11/04 17:24:02 PDF Version 1.5 conforming reader
 INFO: 2024/11/04 17:24:02 validating
validating URIs..

optimizing...
 INFO: 2024/11/04 17:24:02 optimizing fonts & images
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#6 - no font file available for font: FlatBrush
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#8 - no font file available for font: TimesNewRoman,Bold
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#10 - no font file available for font: TimesNewRoman,Italic
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#12 - no font file available for font: TimesNewRoman

hhrutter commented 3 weeks ago

Thanks for reporting this!

Cardinal156 commented 3 weeks ago

If there's any more info I can provide or questions I can answer, please let me know.

hhrutter commented 2 weeks ago

Your file does not have any embedded fonts; that's why nothing is extracted.

The latest commit fixes a couple of things for font extraction.

Only embedded fonts that are also registered as TrueType fonts in the cross-reference table will be extracted.

Extracted fonts may also be font subsets and can contain as little as a single glyph.
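
To make the two points above concrete, here is a minimal Python sketch of the rule as described in this thread. It is not pdfcpu's actual code: `FontInfo`, `has_font_file`, `is_extractable`, and `is_subset` are hypothetical names made up for illustration. The subset check relies on the PDF naming convention that a subset font's name starts with a tag of six uppercase letters followed by `+` (for example `ABCDEF+TimesNewRoman`).

```python
import re
from dataclasses import dataclass

# A subset font name carries a six-uppercase-letter tag and a plus sign.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


@dataclass
class FontInfo:
    """Hypothetical summary of one font object; not a pdfcpu type."""
    name: str            # e.g. "TimesNewRoman,Bold"
    subtype: str         # e.g. "TrueType" or "Type1"
    has_font_file: bool  # assumption: True if a font program is stored in the PDF


def is_extractable(font: FontInfo) -> bool:
    """Model of the rule stated above: embedded and registered as TrueType."""
    return font.has_font_file and font.subtype == "TrueType"


def is_subset(font: FontInfo) -> bool:
    """True if the font name starts with a subset tag like 'ABCDEF+'."""
    return SUBSET_TAG.match(font.name) is not None


if __name__ == "__main__":
    fonts = [
        FontInfo("FlatBrush", "TrueType", has_font_file=False),
        FontInfo("ABCDEF+TimesNewRoman", "TrueType", has_font_file=True),
        FontInfo("Helvetica", "Type1", has_font_file=True),
    ]
    for f in fonts:
        # Print name, whether the rule would extract it, and whether it is a subset.
        print(f.name, is_extractable(f), is_subset(f))
```

In this model, the fonts in the sample file would all fall into the first case, since the verbose log reports "no font file available" for each of them.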

pdfcpu / pdfcpu

Extracted fonts is not outputting any fonts. #988
