radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree may then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom may also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine in order to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
http://cssbox.sourceforge.net/pdf2dom/
GNU Lesser General Public License v3.0

Normalize TTF and Type0+TTF fonts #10

Closed m-abboud closed 8 years ago

m-abboud commented 8 years ago

As discussed in pull request #8, some TTF and Type0+TTF fonts extracted directly from PDFs cause font validation errors in some browsers.

This pull request changes FontTable to run them through FontVerter's font normalization. The relevant fonts in brno30.pdf and HorariosMadrid_Segovia.pdf now seem to render correctly, and they no longer produce font validation errors in Chrome, IE and Firefox.

radkovo commented 8 years ago

Hi, that's great! Thank you for both PRs! Hopefully, I have merged everything correctly.

m-abboud commented 8 years ago

Sweet. Yeah, I merged the two branches locally and noticed some conflicts, but I grabbed master just now and everything looks good. Thank you for merging them!