radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree can then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom can also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
http://cssbox.sourceforge.net/pdf2dom/
GNU Lesser General Public License v3.0

Add module-info.java #42

Open KleaTech opened 4 years ago

KleaTech commented 4 years ago

So that it will be a dedicated module instead of an automatic one.

aino-gautam commented 4 years ago

@KleaTech what module system are you referring to? JAR packaging, or something else like OSGi?

KleaTech commented 4 years ago

I'm of course referring to the Java Platform Module System (JPMS).

KleaTech commented 4 years ago

However, I tried to do it myself, and it is not possible as long as you use Xerces. NekoHTML also had to be replaced, but that was relatively easy. I don't remember what I used instead.

radkovo commented 4 years ago

Pdf2Dom still targets Java 8, so modularizing the library could cause problems. However, I added the Automatic-Module-Name: net.sf.cssbox.pdf2dom declaration to the manifest, so the automatic module name should be stable now. Does that solve your problem?

I have also moved the CSSBox bindings to a separate project, so the library should no longer depend on Xerces and NekoHTML (see the recent commits in master).
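
For anyone who wants to confirm that a given build actually carries the Automatic-Module-Name entry mentioned above, here is a minimal sketch that reads the attribute from a JAR's main manifest using only the standard java.util.jar API. The class name AutomaticModuleNameCheck and the JAR path passed on the command line are illustrative assumptions, not part of Pdf2Dom.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper class; not part of Pdf2Dom.
public class AutomaticModuleNameCheck {

    /** Returns the Automatic-Module-Name declared in the JAR's main manifest, if any. */
    static Optional<String> declaredModuleName(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest(); // null when the JAR has no manifest
            if (manifest == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(
                    manifest.getMainAttributes().getValue("Automatic-Module-Name"));
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the JAR to inspect, for example a locally built Pdf2Dom JAR.
        Path jar = Path.of(args[0]);
        System.out.println(
                declaredModuleName(jar).orElse("(no Automatic-Module-Name declared)"));
    }
}
```

If the attribute is absent, the module system falls back to deriving the name from the JAR file name, which is why an explicit manifest entry keeps the name stable across renamed artifacts.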

KleaTech commented 4 years ago

The stable module name is nice to have. It's best to add it to every project that targets Java 8 or earlier. However, for what I tried to do, it's not enough. I tried to build my own JRE with jlink, but for that every dependency in the dependency hierarchy must be an explicit Java module; no automatic module can be used anywhere. As I said earlier, this can be done except for Xerces. Removing the CSSBox dependency may solve this problem; I will check it out.
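
To see which dependencies would still block jlink, one option is to scan the module path and list everything that resolves as an automatic module. The following is a rough sketch using the standard java.lang.module API (Java 9+). The class name AutomaticModuleScan is hypothetical, and the paths given on the command line are assumed to be JAR files or directories of JARs.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; not part of Pdf2Dom or the JDK tools.
public class AutomaticModuleScan {

    /** Returns the sorted names of all automatic modules found on the given module path. */
    static List<String> automaticModules(Path... modulePath) {
        return ModuleFinder.of(modulePath).findAll().stream()
                .map(ModuleReference::descriptor)
                .filter(ModuleDescriptor::isAutomatic) // true for JARs without module-info.class
                .map(ModuleDescriptor::name)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Each argument is a JAR file or a directory containing JARs.
        Path[] entries = Arrays.stream(args).map(Path::of).toArray(Path[]::new);
        automaticModules(entries)
                .forEach(name -> System.out.println("automatic: " + name));
    }
}
```

An empty result means every JAR on the given path ships its own module descriptor. Any name that is printed refers to a JAR that would need a module-info.class before it could go into a jlink image.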