radkovo / Pdf2Dom

Pdf2Dom is a PDF parser that converts the documents to a HTML DOM representation. The obtained DOM tree may be then serialized to a HTML file or further processed. A command-line utility for converting the PDF documents to HTML is included in the distribution package. Pdf2Dom may be also used as an independent Java library with a standard DOM interface for your DOM-based applications or as an alternative parser for the CSSBox rendering engine in order to add the PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.
http://cssbox.sourceforge.net/pdf2dom/
GNU Lesser General Public License v3.0
175 stars 71 forks source link

Getting error while creating dom object from pdf in liferay using Pdf2Dom #24

Open siddharthnaik7 opened 6 years ago

siddharthnaik7 commented 6 years ago

Hello @radkovo, @m-abboud ,

I want to convert pdf to dom object. I am trying to use following code:

File file = new File("E:/IETM/ROV.pdf"); PDDocument pdf = PDDocument.load(file); PDFDomTree parser = new PDFDomTree(); Document dom = parser.createDOM(pdf); return dom.toString();

But I am getting following error: 11:49:01,544 ERROR [http-nio-8080-exec-10][JSONWebServiceServiceAction:97] org/fit/pdfdom/PDFToHTML

Kindly let me know where I am going wrong and what should be done to resolve this.

Thanks, Siddharth

radkovo commented 6 years ago

The error you mention does not seem to be related to Pdf2Dom directly. It refers to some JSON operation; Pdf2Dom does not use JSON anywhere. I would recommend debugging your app first.

However, the return dom.toString() command seems strange; it won't probably do what you expect. If you want to serialize DOM to HTML, try to use pdf.writeText(dom, some_writer). An example can be found here.

siddharthnaik7 commented 6 years ago

I followed the example, the code is working fine in Java Perspective , but it gives me the same error 11:36:26,295 ERROR [http-nio-8080-exec-2][JSONWebServiceServiceAction:97] org/fit/pdfdom/PDFDomTree in Liferay Perspective.

Yes the error is not related to Pdf2Dom directly. In Liferay Perspective, via JsonWS, while calling the service in which this code is written, I am getting this error.

m-abboud commented 6 years ago

Can you post the full stack trace not just that one error log line?

@radkovo kinda looks like that log statement is actually saying pdf2dom is throwing an error but he doesn't have the log statements setup to show the full stack trace or something.

If you're catching the exception either call ex.printStackTrace() or use the apache.commons.lang3 library's ExceptionUtils.getStackTrace(ex) method to get a string that you can pass to log.error()

siddharthnaik7 commented 6 years ago

Hello @radkovo , @m-abboud

The issue was related to Library path. I was accessing the jar file from external source and not from the portlet library. I got it resolved.

But now, I am getting this Error: cannot initialize the DOM serializer

I am trying to use following code:

File file = new File("E:/IETM/PDFS/ROV-Daksh-Technical-Manual-Part-I-Vol-I.pdf");
Writer output = null;
try {
        PDFDomTree prsr = new PDFDomTree();
        PDDocument pdf = PDDocument.load(file);
            output = new PrintWriter("E:/IETM/pdf.html", "utf-8");
        prsr.writeText(pdf, output);
} catch (Exception e) {
        e.printStackTrace();
}
output.close();

Please find below my stack trace:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
java.io.IOException: Error: cannot initialize the DOM serializer
    at org.fit.pdfdom.PDFDomTree.writeText(PDFDomTree.java:197)
    at com.ietm.service.impl.PdfServiceImpl.pdfManager(PdfServiceImpl.java:102)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:153)
    at com.liferay.portal.spring.transaction.DefaultTransactionExecutor.execute(DefaultTransactionExecutor.java:85)
    at com.liferay.portal.spring.transaction.TransactionInterceptor.invoke(TransactionInterceptor.java:58)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.RetryAdvice.invoke(RetryAdvice.java:46)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.service.ServiceContextAdvice.invoke(ServiceContextAdvice.java:40)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ChainableMethodAdvice.invoke(ChainableMethodAdvice.java:56)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ChainableMethodAdvice.invoke(ChainableMethodAdvice.java:56)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ChainableMethodAdvice.invoke(ChainableMethodAdvice.java:56)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ChainableMethodAdvice.invoke(ChainableMethodAdvice.java:56)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ChainableMethodAdvice.invoke(ChainableMethodAdvice.java:56)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ChainableMethodAdvice.invoke(ChainableMethodAdvice.java:56)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ChainableMethodAdvice.invoke(ChainableMethodAdvice.java:56)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ChainableMethodAdvice.invoke(ChainableMethodAdvice.java:56)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ChainableMethodAdvice.invoke(ChainableMethodAdvice.java:56)
    at com.liferay.portal.spring.aop.ServiceBeanMethodInvocation.proceed(ServiceBeanMethodInvocation.java:127)
    at com.liferay.portal.spring.aop.ServiceBeanAopProxy.invoke(ServiceBeanAopProxy.java:173)
    at com.sun.proxy.$Proxy779.pdfManager(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.liferay.portal.jsonwebservice.JSONWebServiceActionImpl._invokeActionMethod(JSONWebServiceActionImpl.java:404)
    at com.liferay.portal.jsonwebservice.JSONWebServiceActionImpl.invoke(JSONWebServiceActionImpl.java:76)
    at com.liferay.portal.jsonwebservice.JSONWebServiceServiceAction.getJSON(JSONWebServiceServiceAction.java:69)
    at com.liferay.portal.struts.JSONAction.execute(JSONAction.java:76)
    at com.liferay.portal.servlet.JSONServlet.service(JSONServlet.java:63)
    at com.liferay.portal.jsonwebservice.JSONWebServiceServlet.service(JSONWebServiceServlet.java:61)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:119)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDirectCallFilter(InvokerFilterChain.java:188)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:96)
    at com.liferay.portal.kernel.servlet.BaseFilter.processFilter(BaseFilter.java:142)
    at com.liferay.portal.security.sso.ntlm.internal.servlet.filter.NtlmPostFilter.processFilter(NtlmPostFilter.java:107)
    at com.liferay.portal.kernel.servlet.BaseFilter.doFilter(BaseFilter.java:48)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDoFilter(InvokerFilterChain.java:207)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:112)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDirectCallFilter(InvokerFilterChain.java:188)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:96)
    at com.liferay.portal.kernel.servlet.BaseFilter.processFilter(BaseFilter.java:142)
    at com.liferay.portal.servlet.filters.uploadservletrequest.UploadServletRequestFilter.processFilter(UploadServletRequestFilter.java:99)
    at com.liferay.portal.kernel.servlet.BaseFilter.doFilter(BaseFilter.java:48)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDoFilter(InvokerFilterChain.java:207)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:112)
    at com.liferay.portal.kernel.servlet.BaseFilter.processFilter(BaseFilter.java:142)
    at com.liferay.portal.servlet.filters.authverifier.AuthVerifierFilter.processFilter(AuthVerifierFilter.java:170)
    at com.liferay.portal.kernel.servlet.BaseFilter.doFilter(BaseFilter.java:48)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDoFilter(InvokerFilterChain.java:207)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:112)
    at com.liferay.portal.kernel.servlet.BaseFilter.processFilter(BaseFilter.java:142)
    at com.liferay.portal.servlet.filters.jsoncontenttype.JSONContentTypeFilter.processFilter(JSONContentTypeFilter.java:42)
    at com.liferay.portal.kernel.servlet.BaseFilter.doFilter(BaseFilter.java:48)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDoFilter(InvokerFilterChain.java:207)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:112)
    at com.liferay.portal.kernel.servlet.BaseFilter.processFilter(BaseFilter.java:142)
    at com.liferay.portal.sharepoint.SharepointFilter.processFilter(SharepointFilter.java:88)
    at com.liferay.portal.kernel.servlet.BaseFilter.doFilter(BaseFilter.java:48)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDoFilter(InvokerFilterChain.java:207)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:112)
    at com.liferay.portal.kernel.servlet.BaseFilter.processFilter(BaseFilter.java:142)
    at com.liferay.portal.servlet.filters.virtualhost.VirtualHostFilter.processFilter(VirtualHostFilter.java:260)
    at com.liferay.portal.kernel.servlet.BaseFilter.doFilter(BaseFilter.java:48)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDoFilter(InvokerFilterChain.java:207)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:112)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDirectCallFilter(InvokerFilterChain.java:188)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:96)
    at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:176)
    at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:145)
    at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:92)
    at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:394)
    at com.liferay.portal.servlet.filters.urlrewrite.UrlRewriteFilter.processFilter(UrlRewriteFilter.java:65)
    at com.liferay.portal.kernel.servlet.BaseFilter.doFilter(BaseFilter.java:48)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDoFilter(InvokerFilterChain.java:207)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:112)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDirectCallFilter(InvokerFilterChain.java:168)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:96)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDirectCallFilter(InvokerFilterChain.java:168)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:96)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.processDirectCallFilter(InvokerFilterChain.java:188)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:96)
    at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilter.doFilter(InvokerFilter.java:115)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:522)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1095)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:672)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.xerces.dom.DOMXSImplementationSourceImpl cannot be cast to org.w3c.dom.DOMImplementationSource
    at org.w3c.dom.bootstrap.DOMImplementationRegistry.newInstance(Unknown Source)
    at org.fit.pdfdom.PDFDomTree.writeText(PDFDomTree.java:188)
    ... 119 more
siddharthnaik7 commented 6 years ago

Hello @radkovo ,

Kindly put some light on this issue. Please, let me know if I am doing something wrong, or how can I get through this.

Thanks, Siddharth