Pdf2Dom is a PDF parser that converts PDF documents to an HTML DOM representation. The resulting DOM tree can then be serialized to an HTML file or processed further. The distribution package includes a command-line utility for converting PDF documents to HTML. Pdf2Dom can also be used as a standalone Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine to add PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
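Because the output is a standard `org.w3c.dom.Document`, ordinary DOM code can inspect it. For example, you can list the text nodes in document order to check the order of the converted content. The sketch below uses only the JDK. A small hand-built `Document` stands in for the tree Pdf2Dom would return. That tree would come from something like `PDFDomTree.createDOM(PDDocument)`, but treat the exact entry point as an assumption and check the API of the version you use. The helper names `textInDocumentOrder` and `sampleDom` are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import java.util.ArrayList;
import java.util.List;

public class DomTextOrder {

    // Collects the non-blank text nodes of a DOM subtree in document order
    // (depth-first, pre-order traversal).
    static List<String> textInDocumentOrder(Node node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
        // Visit children left to right so the list follows document order.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    // HYPOTHETICAL stand-in for a Pdf2Dom result: html > body > div* with one text node each.
    static Document sampleDom() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        doc.appendChild(html);
        html.appendChild(body);
        for (String line : new String[] {"Title", "Paragraph 1", "Paragraph 2"}) {
            Element div = doc.createElement("div");
            div.appendChild(doc.createTextNode(line));
            body.appendChild(div);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Prints the text fragments in the order they appear in the DOM.
        System.out.println(textInDocumentOrder(sampleDom())); // prints [Title, Paragraph 1, Paragraph 2]
    }
}
```

Depending on how the converter positions each text box with CSS, the order on screen and the order in the DOM may differ. This traversal reports the DOM order only.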
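When I convert the PDF document below to HTML, 广德经济开发区以数字化转型推动 PCB 产业转型升级的若干政策.pdf ("Several Policies of the Guangde Economic Development Zone on Promoting the Transformation and Upgrading of the PCB Industry through Digital Transformation"), I find that the order of the text content in the converted HTML document is inconsistent with the order in the original PDF document. Look at the picture below: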