pdfcpu / pdfcpu

A PDF processor written in Go.
http://pdfcpu.io/
Apache License 2.0
7.05k stars, 481 forks

Extracted fonts is not outputting any fonts. #988

Closed Cardinal156 closed 2 weeks ago

Cardinal156 commented 3 weeks ago

Problem

When I try to extract font files from a PDF, no fonts are extracted. I've tried multiple files with different fonts, but none of them worked. I have attached a sample PDF that reproduces the problem. If there are any more details I can provide, please let me know.

Details

optimizing...



## Sample file
[A.pdf](https://github.com/user-attachments/files/17623435/A.pdf)

Cardinal156 commented 3 weeks ago

When I run ./pdfcpu.exe validate -v "C:/Users/School/Downloads/A.pdf" to get font info, I get this output:

Total pages: 1

Fonts for page 1:
obj     prefix     Fontname                       Subtype    Encoding             Embedded ResourceIds
#6                 FlatBrush                      TrueType   WinAnsiEncoding      true     F0
#8                 TimesNewRoman,Bold             TrueType   WinAnsiEncoding      true     F1
#10                TimesNewRoman,Italic           TrueType   WinAnsiEncoding      true     F2
#12                TimesNewRoman                  TrueType   WinAnsiEncoding      true     F3

Fontobjects:
obj     prefix     Fontname                       Subtype    Encoding             Embedded ResourceIds
#6                 FlatBrush                      TrueType   WinAnsiEncoding      true     F0
#8                 TimesNewRoman,Bold             TrueType   WinAnsiEncoding      true     F1
#10                TimesNewRoman,Italic           TrueType   WinAnsiEncoding      true     F2
#12                TimesNewRoman                  TrueType   WinAnsiEncoding      true     F3

Fonts:
obj     prefix     Fontname                       Subtype    Encoding             Embedded ResourceIds
#6                 FlatBrush                      TrueType   WinAnsiEncoding      true     F0
#12                TimesNewRoman                  TrueType   WinAnsiEncoding      true     F3
#8                 TimesNewRoman,Bold             TrueType   WinAnsiEncoding      true     F1
#10                TimesNewRoman,Italic           TrueType   WinAnsiEncoding      true     F2

Duplicate Fonts:

No image info available.

Cardinal156 commented 3 weeks ago

When I added the -v flag to the font extraction command (./pdfcpu.exe extract -mode font -v "C:/Users/User/Downloads/A.pdf" "C:/Users/User/Downloads/fonts"), I got some more information that might be related to the problem. Can pdfcpu not write out font files that are embedded in the PDF? Do the fonts already have to be installed on my computer?

extracting fonts from C:/Users/User/Downloads/A.pdf into C:/Users/User/Downloads/fonts/ ...
 INFO: 2024/11/04 17:24:02 PDF Version 1.5 conforming reader
 INFO: 2024/11/04 17:24:02 validating
validating URIs..

optimizing...
 INFO: 2024/11/04 17:24:02 optimizing fonts & images
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#6 - no font file available for font: FlatBrush
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#8 - no font file available for font: TimesNewRoman,Bold
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#10 - no font file available for font: TimesNewRoman,Italic
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#12 - no font file available for font: TimesNewRoman
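
For a log with many fonts, the skipped ones can be collected with a small script. The following is a minimal sketch using only the Go standard library; it assumes the debug lines keep exactly the wording shown above, and the names `ignoredFont` and `parseIgnoredFonts` are made up for this example and are not part of pdfcpu.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// ignoredFont is one font that the extractor skipped, as reported in the debug log.
// (Hypothetical helper type, not part of pdfcpu.)
type ignoredFont struct {
	Obj  int    // PDF object number of the font dictionary
	Name string // font name as printed in the log
}

// ignoredRe matches debug lines of the form seen in the log above, e.g.
// "ExtractFont: ignoring obj#6 - no font file available for font: FlatBrush".
// Assumption: the message wording is stable across pdfcpu versions.
var ignoredRe = regexp.MustCompile(`ExtractFont: ignoring obj#(\d+) - no font file available for font: (.+)$`)

// parseIgnoredFonts scans log output line by line and returns every skipped font.
func parseIgnoredFonts(logText string) []ignoredFont {
	var out []ignoredFont
	sc := bufio.NewScanner(strings.NewReader(logText))
	for sc.Scan() {
		m := ignoredRe.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue // not an "ignoring" line
		}
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue // object number out of range; skip the line
		}
		out = append(out, ignoredFont{Obj: n, Name: m[2]})
	}
	return out
}

func main() {
	// Lines copied from the log above.
	logText := ` INFO: 2024/11/04 17:24:02 optimizing fonts & images
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#6 - no font file available for font: FlatBrush
DEBUG: 2024/11/04 17:24:02 ExtractFont: ignoring obj#8 - no font file available for font: TimesNewRoman,Bold`

	for _, f := range parseIgnoredFonts(logText) {
		fmt.Printf("obj #%d: %s\n", f.Obj, f.Name)
	}
}
```

Matching one line at a time keeps the pattern simple and lets INFO lines pass through untouched.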

hhrutter commented 3 weeks ago

Thanks for reporting this!

Cardinal156 commented 3 weeks ago

If there's any more info I can provide or questions I can answer, please let me know.

hhrutter commented 2 weeks ago

That's because your file does not have any embedded fonts.

The latest commit fixes a couple of things for font extraction.

Only embedded fonts that are also registered as TrueType fonts in the cross reference table will be extracted.

Extracted fonts may also be font subsets and may contain as little as one glyph.
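
The rule above can be modeled roughly as follows. This is a sketch, not pdfcpu's actual code: it assumes the PDF specification's convention that an embedded font program is referenced from the font descriptor under `FontFile`, `FontFile2`, or `FontFile3`, and the `font` type, the helper names, and the sample fonts are hypothetical.

```go
package main

import "fmt"

// font is a hypothetical, simplified view of a PDF font dictionary.
type font struct {
	Name       string
	Subtype    string          // value of /Subtype, e.g. "TrueType" or "Type1"
	Descriptor map[string]bool // keys present in the font descriptor dictionary
}

// hasFontProgram reports whether the descriptor references an embedded font program.
// Per the PDF specification: FontFile (Type 1), FontFile2 (TrueType),
// FontFile3 (other formats such as CFF or OpenType).
func hasFontProgram(f font) bool {
	for _, key := range []string{"FontFile", "FontFile2", "FontFile3"} {
		if f.Descriptor[key] { // reading a nil map is safe and yields false
			return true
		}
	}
	return false
}

// extractable models the stated rule: the font must be embedded
// and must be declared as a TrueType font.
func extractable(f font) bool {
	return f.Subtype == "TrueType" && hasFontProgram(f)
}

func main() {
	// Made-up fonts for illustration only; they do not describe A.pdf.
	fonts := []font{
		{Name: "ExampleSans", Subtype: "TrueType", Descriptor: map[string]bool{"FontFile2": true}},
		{Name: "ExampleSerif", Subtype: "TrueType", Descriptor: nil},
		{Name: "ExampleMono", Subtype: "Type1", Descriptor: map[string]bool{"FontFile": true}},
	}
	for _, f := range fonts {
		fmt.Printf("%s: extractable=%v\n", f.Name, extractable(f))
	}
}
```

Under this model, a TrueType font whose descriptor carries no font program key is skipped, which would correspond to the "no font file available" messages in the debug log.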