opensagres / xdocreport

XDocReport stands for XML Document reporting. It is a Java API that merges XML documents created with MS Office (docx), OpenOffice (odt) or LibreOffice (odt) with a Java model to generate a report and, if needed, convert it to another format (PDF, XHTML...).
https://github.com/opensagres/xdocreport

SAXParseException reading template document with Freemarker angle brackets #398

Open matthiasbasler opened 4 years ago

matthiasbasler commented 4 years ago

We are using XDocReport 2.0.1 to parse "docx" files and fill in certain fields (the typical "mail merge" functionality), and we use the "xwpf" converter to create a PDF from the result. So far we have been using the square bracket FreeMarker syntax, e.g. [#if ...] [/#if], and this worked. Since we use the angle bracket FreeMarker syntax in the rest of our application, we wanted to switch XDocReport over as well. There is a configuration setting for this, so we wrote:

final IXDocReport report = XDocReportRegistry.getRegistry().loadReport(stream, TemplateEngineKind.Freemarker, false);
final Configuration fmConfig = new Configuration(Configuration.VERSION_2_3_28);
fmConfig.setTagSyntax(Configuration.ANGLE_BRACKET_TAG_SYNTAX);
((FreemarkerTemplateEngine) report.getTemplateEngine()).setFreemarkerConfiguration(fmConfig);

Of course, we also changed the syntax in the template .docx file accordingly, e.g. <#if applicant_houseNumber?hasContent> ${applicant_houseNumber}</#if>

After this change, we get a SAXParseException before XDocReport even reaches our template model. As far as I can tell from the stack trace (see below) and the parser state when the exception is thrown, the Xerces SAX parser tries to evaluate the merge field content <#if ...> and chokes on it because "<#" does not start a valid XML tag. It certainly doesn't, but imho the document parser should not try to parse the content of a merge field as XML at all.
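The parser error itself does not depend on XDocReport: the JDK's built-in Xerces parser rejects any "<#" that appears as raw markup inside element content, with the same "well-formed character data or markup" error as in the stack trace. A minimal sketch, assuming the FreeMarker directive ends up unescaped in word/document.xml; the WordprocessingML fragment and the class name AngleBracketXmlDemo are hypothetical simplifications, not the actual template content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class AngleBracketXmlDemo {

    // Hypothetical, simplified WordprocessingML paragraph (not the real template).
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Directive stored as escaped character data: well-formed XML.
    static final String ESCAPED = "<w:p xmlns:w=\"" + W_NS + "\"><w:r><w:t>"
            + "&lt;#if applicant_houseNumber?hasContent&gt; ${applicant_houseNumber}&lt;/#if&gt;"
            + "</w:t></w:r></w:p>";

    // Same directive left as raw markup: "<#" cannot start an element name.
    static final String RAW = "<w:p xmlns:w=\"" + W_NS + "\"><w:r><w:t>"
            + "<#if applicant_houseNumber?hasContent> ${applicant_houseNumber}</#if>"
            + "</w:t></w:r></w:p>";

    // Returns true if the JDK's built-in (Xerces-based) DOM parser accepts the XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newDocumentBuilder()
                   .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // SAXParseException is a subclass of SAXException and signals malformed XML
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(ESCAPED)); // true
        System.out.println(isWellFormed(RAW));     // false
    }
}
```

When the raw variant is parsed, the default error handler also prints a "[Fatal Error]" line to stderr before the exception is caught.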

Unfortunately, there is little information on the web about the Configuration.ANGLE_BRACKET_TAG_SYNTAX flag, so I wonder whether I overlooked something or whether this is a bug.


The stack trace (abbreviated to relevant classes) is as follows:

fr.opensagres.xdocreport.converter.XDocConverterException: java.io.IOException: Unable to parse xml bean
    at fr.opensagres.xdocreport.converter.docx.poi.itext.XWPF2PDFViaITextConverter.convert(XWPF2PDFViaITextConverter.java:72) ~[fr.opensagres.xdocreport.converter.docx.xwpf-2.0.1.jar:2.0.1]
    at fr.opensagres.xdocreport.document.AbstractXDocReport.convert(AbstractXDocReport.java:713) ~[fr.opensagres.xdocreport.document-2.0.1.jar:2.0.1]
    ... 96 more
Caused by: java.io.IOException: Unable to parse xml bean
    at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:166) ~[poi-ooxml-3.17.jar:3.17]
    at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source) ~[ooxml-schemas-1.3.jar:?]
    at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:152) ~[poi-ooxml-3.17.jar:3.17]
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169) ~[poi-ooxml-3.17.jar:3.17]
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:119) ~[poi-ooxml-3.17.jar:3.17]
    at fr.opensagres.xdocreport.converter.docx.poi.itext.XWPF2PDFViaITextConverter.convert(XWPF2PDFViaITextConverter.java:66) ~[fr.opensagres.xdocreport.converter.docx.xwpf-2.0.1.jar:2.0.1]
    at fr.opensagres.xdocreport.document.AbstractXDocReport.convert(AbstractXDocReport.java:713) ~[fr.opensagres.xdocreport.document-2.0.1.jar:2.0.1]
    ... 96 more
Caused by: org.xml.sax.SAXParseException: The content of elements must consist of well-formed character data or markup.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.startOfMarkup(XMLDocumentFragmentScannerImpl.java:2635) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2732) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243) ~[?:1.8.0_212]
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339) ~[?:1.8.0_212]
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121) ~[?:1.8.0_212]
    at org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:140) ~[poi-ooxml-3.17.jar:3.17]
    at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:163) ~[poi-ooxml-3.17.jar:3.17]
    at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source) ~[ooxml-schemas-1.3.jar:?]
    at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:152) ~[poi-ooxml-3.17.jar:3.17]
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169) ~[poi-ooxml-3.17.jar:3.17]
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:119) ~[poi-ooxml-3.17.jar:3.17]
    at fr.opensagres.xdocreport.converter.docx.poi.itext.XWPF2PDFViaITextConverter.convert(XWPF2PDFViaITextConverter.java:66) ~[fr.opensagres.xdocreport.converter.docx.xwpf-2.0.1.jar:2.0.1]
matthiasbasler commented 4 years ago

I changed the FreeMarker syntax in the template back to square brackets and, to my surprise, the PDF document was generated correctly again. This means that the Configuration.ANGLE_BRACKET_TAG_SYNTAX flag has no effect. Further investigation shows that fr.opensagres.xdocreport.template.freemarker.FreemarkerTemplateEngine.setFreemarkerConfiguration(Configuration) overwrites my tag syntax flag with Configuration.SQUARE_BRACKET_TAG_SYNTAX, so it is no surprise that it isn't working.

Can anyone please explain why there is a FreeMarker tag syntax setting at all if the API overwrites it with whatever it considers right?

I have the following suggestions:

  1. Please document this behavior in your official documentation. I wasted 3 hours trying to figure out why the API would not cope with my document until I found out that it silently overwrites the setting. This is counterintuitive and, imho, must be clearly documented.
  2. Please clearly document what happens if the ANGLE_BRACKET_TAG_SYNTAX setting is forced nonetheless by reversing the order of the statements, like this:
    final Configuration fmConfig = new Configuration(Configuration.VERSION_2_3_28);
    ((FreemarkerTemplateEngine) report.getTemplateEngine()).setFreemarkerConfiguration(fmConfig);
    // Set the flag afterwards, so it really gets respected
    fmConfig.setTagSyntax(Configuration.ANGLE_BRACKET_TAG_SYNTAX);

    In this case I get a strange error about FreeMarker failing to parse the expression "${___info.imageId}". I have no idea where this comes from, but it is certainly not from the content of my template.
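Why reversing the order lets the flag take effect can be explained by ordering alone, assuming (as the changed error message suggests) that the engine keeps a reference to the passed Configuration rather than a copy. A minimal sketch with hypothetical stand-in classes: Settings, Engine and the integer constants are illustrative only and are not the FreeMarker or XDocReport API. A flag set before the setter is overwritten, while a flag set afterwards mutates the very object the engine holds.

```java
// Hypothetical stand-ins; NOT the real FreeMarker or XDocReport classes.
public class ConfigOverwriteDemo {

    static final int ANGLE_BRACKET = 1;  // illustrative constants only
    static final int SQUARE_BRACKET = 2;

    // Mutable settings object, standing in for freemarker.template.Configuration.
    static class Settings {
        int tagSyntax;
    }

    // Stands in for an engine whose setter forces its preferred syntax,
    // as reported above for FreemarkerTemplateEngine.setFreemarkerConfiguration.
    static class Engine {
        Settings settings;

        void setSettings(Settings s) {
            s.tagSyntax = SQUARE_BRACKET; // overwrite whatever the caller chose
            this.settings = s;            // keep a reference, not a copy
        }
    }

    // Returns the tag syntax the engine ends up using for the given statement order.
    static int syntaxAfter(boolean setFlagFirst) {
        Settings s = new Settings();
        Engine engine = new Engine();
        if (setFlagFirst) {
            s.tagSyntax = ANGLE_BRACKET;
            engine.setSettings(s);       // the flag is overwritten here
        } else {
            engine.setSettings(s);
            s.tagSyntax = ANGLE_BRACKET; // mutates the object the engine holds
        }
        return engine.settings.tagSyntax;
    }

    public static void main(String[] args) {
        System.out.println(syntaxAfter(true) == SQUARE_BRACKET); // true
        System.out.println(syntaxAfter(false) == ANGLE_BRACKET); // true
    }
}
```

The sketch only shows why the flag sticks when set afterwards; it says nothing about whether the template content XDocReport itself generates is compatible with angle bracket syntax.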

matthiasbasler commented 4 years ago

Sorry, closed accidentally -> reopened.