radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree may then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom may also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine in order to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
http://cssbox.sourceforge.net/pdf2dom/
GNU Lesser General Public License v3.0
177 stars, 71 forks

adds non orthogonal line drawing #6

Closed (m-abboud closed 8 years ago)

m-abboud commented 8 years ago

See the attached zip for a sample PDF with a bunch of diagonal lines and the HTML output this pull request produces: pdf-lines-sample.zip

radkovo commented 8 years ago

Many thanks for your pull requests. They both look great at first glance. I will review them more thoroughly and merge ASAP (within a few days).

m-abboud commented 8 years ago

No problem! And thank you for making this awesome library! It's the only decent open source PDF -> HTML converter I could find for a side web project I was doing (I've since abandoned that project and am having a little more fun just working on this library itself, lol).

m-abboud commented 8 years ago

Added a commit for CSSBoxTree support too.

radkovo commented 8 years ago

All merged, thank you very much!

radkovo commented 8 years ago

BTW, I came across a minor problem when drawing vertical lines (getAngleDegrees() returned NaN), so I added some optimizations for drawing horizontal and vertical lines with no rotation.
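For readers following along, here is a minimal sketch of one way to compute a line's angle without hitting NaN on vertical lines. It is not the code that was merged: the class `LineAngle` and the methods `angleDegrees` and `isAxisAligned` are hypothetical names, and the thread does not say how getAngleDegrees() was actually implemented or fixed. The idea is that `Math.atan2` takes the vertical and horizontal differences as separate arguments, so a zero horizontal difference never ends up in a division.

```java
// Hypothetical helper, not part of Pdf2Dom: illustrates an angle computation
// that stays well-defined for vertical lines.
final class LineAngle {
    private LineAngle() {}

    /** Angle of the segment (x1,y1)-(x2,y2) in degrees, in the range [-180, 180]. */
    static double angleDegrees(double x1, double y1, double x2, double y2) {
        // atan2 receives dy and dx separately, so dx == 0 (a vertical line)
        // does not lead to a division by zero.
        return Math.toDegrees(Math.atan2(y2 - y1, x2 - x1));
    }

    /** True for horizontal or vertical segments, which need no rotation at all. */
    static boolean isAxisAligned(double x1, double y1, double x2, double y2) {
        return x1 == x2 || y1 == y2;
    }

    public static void main(String[] args) {
        // Vertical line: prints false, the angle is a finite value close to 90.
        System.out.println(Double.isNaN(angleDegrees(0, 0, 0, 5)));
        // A diagonal line is not axis-aligned, so it would take the rotated path.
        System.out.println(isAxisAligned(0, 0, 5, 5));
    }
}
```

A caller could check `isAxisAligned` first and draw horizontal and vertical lines as plain unrotated boxes, falling back to the rotated path only for diagonals. That mirrors the optimization described above in spirit only.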

m-abboud commented 8 years ago

Oops, sorry about that! I thought I had added horizontal/vertical lines to my test doc, but I was having trouble making perfectly straight lines in my PDF editor.

radkovo commented 8 years ago

No problem, and thanks for the tests you provided. I have made some changes to the test PDF loading (using resources) and configured the project for Travis CI, so the tests now run automatically.

m-abboud commented 8 years ago

Sweet! And it's bizarre how that one test gives different color results on different platforms. I have no idea why either (the test is fillRenderingModeText_outputIsFilledWithNoOutline).

radkovo commented 8 years ago

I haven't had time for more detailed debugging, but my guess is this: could the color in the test PDF be specified in a different color model, so that rounding-related issues occur during the conversion to RGB?
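To make the guess above concrete, here is a small sketch of how a rounding choice alone can shift an RGB channel by one. It uses the simple textbook CMYK-to-RGB formula as an assumption. It is not how PDFBox converts colors, and the thread never confirms that rounding was the actual cause. The class `ColorRounding` and its methods are hypothetical names.

```java
// Hypothetical illustration, not Pdf2Dom or PDFBox code: the same color
// component can land on two different 8-bit values depending on rounding.
final class ColorRounding {
    private ColorRounding() {}

    /** Naive channel conversion, rounding to the nearest integer. */
    static int channelRounded(double c, double k) {
        return (int) Math.round(255.0 * (1.0 - c) * (1.0 - k));
    }

    /** Same formula, but truncating toward zero instead of rounding. */
    static int channelTruncated(double c, double k) {
        return (int) (255.0 * (1.0 - c) * (1.0 - k));
    }

    /** Comparison a test could use to accept small per-channel differences. */
    static boolean withinTolerance(int expected, int actual, int tolerance) {
        return Math.abs(expected - actual) <= tolerance;
    }

    public static void main(String[] args) {
        // 255 * 0.5 = 127.5, so the two strategies disagree by exactly one.
        System.out.println(channelRounded(0.5, 0.0));   // prints 128
        System.out.println(channelTruncated(0.5, 0.0)); // prints 127
    }
}
```

If platform differences of this kind turned out to be the cause, a test could compare channels with something like `withinTolerance(expected, actual, 1)` instead of requiring an exact match.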