radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree may then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom may also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine in order to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
http://cssbox.sourceforge.net/pdf2dom/
GNU Lesser General Public License v3.0

Failed to decode downloaded font in Chrome #49

Open yoolandes opened 3 years ago

yoolandes commented 3 years ago

Chrome and Firefox fail to decode some embedded fonts when opening the output file, although Pdf2Dom does not log any error or other information while processing the input file.

Expected Behavior

Chrome and Firefox are able to decode the embedded font files.

Current Behavior

Chrome and Firefox can't decode some of the font files. The logs state:

Chrome

Failed to decode downloaded font: data:application/x-font-woff;base64,###base64### Minted_FontList.html:1 OTS parsing error: CFF : Failed to parse table

Firefox

downloadable font: CFF : Failed to parse table (font-family: "NYDYER Theodore" style:normal weight:400 stretch:100 src index:0) source: data:application/x-font-woff;base64,###base64###

Steps to Reproduce

  1. Download https://cdn3.minted.com/files/content/community/Minted_FontList.pdf
  2. Run "java -jar .\PDFToHTML.jar .\Minted_FontList.pdf"
  3. Open the generated Minted_FontList.html in the latest Chrome or Firefox and check the developer console.
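
To look at the failing fonts outside the browser, the embedded data URIs can be pulled out of the generated HTML. Below is a minimal sketch in plain Java, assuming the fonts are embedded with the `data:application/x-font-woff;base64,` prefix shown in the logs above. The class name `EmbeddedFontExtractor` and the `font-N.woff` output names are made up for illustration and are not part of Pdf2Dom. It only dumps the fonts and reads the WOFF header; it does not parse the CFF table itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pdf2Dom or FontVerter.
final class EmbeddedFontExtractor {
    // Assumption: fonts use the MIME type that appears in the browser logs.
    private static final Pattern FONT_URI = Pattern.compile(
            "data:application/x-font-woff;base64,([A-Za-z0-9+/=]+)");

    /** Returns the decoded bytes of every embedded font data URI in the HTML. */
    static List<byte[]> extractFonts(String html) {
        List<byte[]> fonts = new ArrayList<>();
        Matcher m = FONT_URI.matcher(html);
        while (m.find()) {
            fonts.add(Base64.getDecoder().decode(m.group(1)));
        }
        return fonts;
    }

    /** True if the data starts with the WOFF 1.0 signature "wOFF". */
    static boolean hasWoffSignature(byte[] data) {
        return data.length >= 4
                && data[0] == 'w' && data[1] == 'O' && data[2] == 'F' && data[3] == 'F';
    }

    /**
     * Returns the 4-byte "flavor" field that follows the signature.
     * "OTTO" means the wrapped font uses CFF outlines; TrueType-flavored
     * fonts carry the non-printable value 0x00010000 here.
     */
    static String flavor(byte[] data) {
        if (data.length < 8) {
            return "";
        }
        return new String(data, 4, 4, StandardCharsets.ISO_8859_1);
    }

    // Usage: java EmbeddedFontExtractor.java Minted_FontList.html
    public static void main(String[] args) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<byte[]> fonts = extractFonts(html);
        for (int i = 0; i < fonts.size(); i++) {
            byte[] font = fonts.get(i);
            Path out = Paths.get("font-" + i + ".woff");
            Files.write(out, font); // dump for inspection with a font validator
            System.out.printf("%s: %d bytes, WOFF signature=%b, CFF flavor=%b%n",
                    out, font.length, hasWoffSignature(font), "OTTO".equals(flavor(font)));
        }
    }
}
```

The dumped `.woff` files can then be checked with a separate font validator to see which of them trigger the `CFF : Failed to parse table` error.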

m-abboud commented 3 years ago

This is a problem with the font conversion in FontVerter (https://github.com/m-abboud/FontVerter), not with Pdf2Dom. But I don't really have time to work on open source anymore.