maxpmaxp / pdfreader

Python API for PDF documents
MIT License

Can't extract fonts, FontDescriptor.FontFile is None #111

Closed: jenskutilek closed this issue 4 months ago

jenskutilek commented 9 months ago

I am able to extract the fonts from the sample PDF in the tutorial, but not from my own PDF.

print(font.FontDescriptor.FontFile.filtered)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'filtered'

Any idea what the reason is, or how to successfully extract the font?

hello.pdf

maxpmaxp commented 4 months ago

@jenskutilek This is not a bug in pdfreader. The attached file doesn't contain any embedded font files; it only describes which font to use.

>>> from pdfreader import PDFDocument
>>> fd = open("hello.pdf", "rb")
>>> doc = PDFDocument(fd)
>>> page = next(doc.pages())
>>> sorted(page.Resources.Font.keys())
['Tc1']
>>> page.Resources
{'ProcSet': ['PDF', 'Text'], 'ColorSpace': {'Cs1': <IndirectReference:n=5,g=0>, 'Cs2': <IndirectReference:n=6,g=0>}, 'Font': {'Tc1': <IndirectReference:n=7,g=0>}}
>>> font = page.Resources.Font['Tc1']
>>> font.Subtype, font.BaseFont, font.Encoding
('Type1', 'AAAAAB+Produkt-Regular', 'MacRomanEncoding')
>>> font.FontDescriptor.FontFile is None
True