radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree may then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom may also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine in order to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
GNU Lesser General Public License v3.0

document:null error #37

Open zharenkov opened 5 years ago

zharenkov commented 5 years ago

Hi, I'm trying to create a DOM as shown in the example on the SourceForge project page, but the resulting Document object doesn't seem to contain any DOM.

        PDDocument pdDocument = PDDocument.load(new FileInputStream(new File(ClassLoader.getSystemClassLoader().getResource("tst.pdf").getFile())));
        PDFDomTree tree = new PDFDomTree(PDFDomTreeConfig.createDefaultConfig());
        Document d = tree.createDOM(pdDocument);

Printing the Document gives:

[#document: null]

What am I doing wrong?
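
One possible reading of the output above: `[#document: null]` looks like the default string form of a DOM `Document` node, which shows the node name and the node value, and a document's node value is always null. That alone would not mean the tree is empty. The sketch below uses only the JDK's DOM and transformer APIs to show the difference between printing a `Document` and inspecting or serializing its contents. The class `DomInspect` and the stand-in document are hypothetical and not part of Pdf2Dom. The assumption is that the `Document` returned by `tree.createDOM(pdDocument)` could be passed to `serialize` in the same way.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class for illustration; not part of Pdf2Dom.
public class DomInspect {

    // Builds a tiny stand-in document. In the real case this would be
    // the Document returned by tree.createDOM(pdDocument) (assumption).
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("hello");
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes any DOM node to markup using the JDK identity transformer.
    static String serialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document d = sampleDocument();
        // String form of the Document node itself, not of the tree below it.
        System.out.println(d);
        // The node value of a Document is null by definition.
        System.out.println(d.getNodeValue());
        // The root element shows whether the tree actually has content.
        System.out.println(d.getDocumentElement().getNodeName());
        // Full markup of the tree.
        System.out.println(serialize(d));
    }
}
```

If `getDocumentElement()` returns a non-null element and `serialize` produces markup, the DOM was created and only the printed string form is misleading.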
