radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree may then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom may also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
http://cssbox.sourceforge.net/pdf2dom/
GNU Lesser General Public License v3.0

adds paths with fills rendering support #14

Closed m-abboud closed 8 years ago

m-abboud commented 8 years ago

Adds filled path rendering. Paths with a fill are rendered to an image element, because HTML geometric elements cannot support path fills by themselves.

An example of a document with many filled paths is your HorariosMadrid_Segovia.pdf. Note that the paths drawn with this change still show some oddities: curved path segments are not supported yet, only straight line-to and move-to operations. (Curves are not supported for non-filled paths either.)
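To illustrate the approach described above, here is a minimal sketch of rasterizing a filled path into an image that an HTML `img` element can display. It is not the code from this pull request: the class name `FilledPathSketch`, the method `filledPathToImg`, and the choice of an inline PNG data URI with absolute positioning are all assumptions made for illustration, and only standard JDK classes are used. Like the change itself, it handles only move-to and line-to segments.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.RenderingHints;
import java.awt.geom.Path2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Pdf2Dom.
final class FilledPathSketch {

    /**
     * Rasterizes a closed, filled polygon and returns an HTML img element
     * whose PNG data is inlined as a base64 data URI.
     * points holds {x, y} pairs in page coordinates (assumed to be pixels).
     */
    static String filledPathToImg(double[][] points, Color fill) throws IOException {
        // Build the path from straight segments only: one move-to, then line-to.
        Path2D.Double path = new Path2D.Double();
        path.moveTo(points[0][0], points[0][1]);
        for (int i = 1; i < points.length; i++) {
            path.lineTo(points[i][0], points[i][1]);
        }
        path.closePath();

        // Size the image to the integer bounding box of the path.
        Rectangle bounds = path.getBounds();
        int width = Math.max(1, bounds.width);
        int height = Math.max(1, bounds.height);

        // Draw the fill onto a transparent canvas, shifted so the
        // bounding box starts at the image origin.
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        g.translate(-bounds.x, -bounds.y);
        g.setColor(fill);
        g.fill(path);
        g.dispose();

        // Encode the raster as PNG and inline it in the src attribute.
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        ImageIO.write(image, "png", png);
        String data = Base64.getEncoder().encodeToString(png.toByteArray());

        return "<img style=\"position:absolute;left:" + bounds.x + "px;top:" + bounds.y
                + "px;width:" + width + "px;height:" + height
                + "px\" src=\"data:image/png;base64," + data + "\">";
    }
}
```

Calling `FilledPathSketch.filledPathToImg(new double[][] {{10, 20}, {110, 20}, {60, 80}}, Color.RED)` would yield an absolutely positioned 100 by 60 pixel image of a red triangle. Inlining the PNG keeps the output in a single HTML file; writing the image to a separate file and referencing it by URL would work equally well.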

radkovo commented 8 years ago

Well done, thanks! I have just two comments:

  1. The GfxAssert dependency is just used for testing? Then adding <scope>test</scope> would be fine.
  2. It seems that the tests fail because FontVerter depends on a SNAPSHOT release of pdfbox and the apache snapshot repository seems to be temporarily unavailable (see here). Does FontVerter use some extra features of the snapshot?

BTW, do you plan to upload your packages, such as FontVerter, to the Maven Central repository? It would be nice to create a new pdf2dom release, and using Central while avoiding snapshots would make the dependencies cleaner.

m-abboud commented 8 years ago

  1. Oops, my bad. I usually remember to add the test scope.
  2. I didn't realize it had a snapshot dependency. Fixed now.

And yeah, I need to do that. I'll try to get them published to Maven Central this weekend.

Also, travis-ci failed again. It looks like an issue with https://nexus.codehaus.org/snapshots/ for the cssbox and jstyleparser dependencies.

radkovo commented 8 years ago

I replaced all the dependencies with their stable versions and now it looks fine. I will probably make a project release soon. Thanks!