radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree can then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom can also be used as an independent Java library with a standard DOM interface for DOM-based applications, or as an alternative parser for the CSSBox rendering engine to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
http://cssbox.sourceforge.net/pdf2dom/
GNU Lesser General Public License v3.0

Modifying HTML Styles #1

Closed: kyle-wrenn closed this issue 8 years ago

kyle-wrenn commented 10 years ago

I wouldn't call this an issue; it's more of a question. I'm struggling to figure out how to modify the inline styling that comes with the HTML output. Is there a way to customize it?

radkovo commented 10 years ago

It depends on what exactly you want to customize. The style of the individual content elements (most of the generated div class="p" elements) is the result of the PDF transformation, i.e. the generated values correspond to the PDF source. The only part that could be customized is the page rendering (the blue border). This part is also hardcoded in the Java source, but it is emitted as an embedded style in the header of the resulting document.
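
Since the output is a standard DOM tree, one possible workaround is to post-process that tree before serializing it. The following is only a sketch under stated assumptions, not part of Pdf2Dom: the class `StylePostProcessor` and its methods are hypothetical helpers built on the standard `org.w3c.dom` API, and the small document in `main` is a hand-written stand-in for real Pdf2Dom output. It assumes the generated tree contains a `style` element in the header and `div` elements whose `class` attribute is exactly `p`. In a real program the `Document` would come from the Pdf2Dom parser instead of `parse`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: edits styles in a DOM tree after conversion. */
public class StylePostProcessor {

    /** Replaces the text of every style element; returns how many were changed. */
    public static int replaceEmbeddedStyle(Document doc, String css) {
        NodeList styles = doc.getElementsByTagName("style");
        for (int i = 0; i < styles.getLength(); i++) {
            styles.item(i).setTextContent(css);
        }
        return styles.getLength();
    }

    /**
     * Appends extra CSS declarations to the inline style of every div whose
     * class attribute equals className; returns how many divs were changed.
     */
    public static int appendInlineStyle(Document doc, String className, String extra) {
        NodeList divs = doc.getElementsByTagName("div");
        int changed = 0;
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (className.equals(div.getAttribute("class"))) {
                String style = div.getAttribute("style");
                // Make sure the existing declarations end with a separator.
                if (!style.isEmpty() && !style.endsWith(";")) {
                    style += ";";
                }
                div.setAttribute("style", style + extra);
                changed++;
            }
        }
        return changed;
    }

    /** Parses a well-formed XML/XHTML string; stand-in for the real parser output. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample that imitates the assumed output structure.
        Document doc = parse(
            "<html><head><style>.page{border:1px solid blue}</style></head>"
            + "<body><div class=\"page\">"
            + "<div class=\"p\" style=\"top:10pt;left:5pt\">Hello</div>"
            + "</div></body></html>");

        replaceEmbeddedStyle(doc, ".page{border:none}"); // drop the page border
        appendInlineStyle(doc, "p", "color:red;");       // tweak content elements

        Element p = (Element) doc.getElementsByTagName("div").item(1);
        System.out.println(p.getAttribute("style"));
    }
}
```

Appending to the inline `style` attribute, rather than overwriting it, keeps the positioning values that were derived from the PDF source, which the answer above notes are the result of the transformation itself.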