plutext / docx4j-ImportXHTML

Converts XHTML to OpenXML WordML (docx) using docx4j
NPE when converting HTML nested lists to pptx #92

Open phinc opened 1 year ago

phinc commented 1 year ago

An NPE is thrown when a basic HTML list with nesting is converted to pptx.

Affected versions: 11.4.8, 11.4.6

Error:

    Caused by: java.lang.NullPointerException: null
        at org.pptx4j.convert.in.xhtml.ListHelper.addNumbering(ListHelper.java:220)
        at org.pptx4j.convert.in.xhtml.XHTMLtoPPTX.processParagraph(XHTMLtoPPTX.java:385)
        at org.pptx4j.convert.in.xhtml.XHTMLtoPPTX.traverseBlockBox(XHTMLtoPPTX.java:335)
        at org.pptx4j.convert.in.xhtml.XHTMLtoPPTX.traverseChild(XHTMLtoPPTX.java:272)
        at org.pptx4j.convert.in.xhtml.XHTMLtoPPTX.traverseChildren(XHTMLtoPPTX.java:225)
        at org.pptx4j.convert.in.xhtml.XHTMLtoPPTX.traverseBlockBox(XHTMLtoPPTX.java:341)
        at org.pptx4j.convert.in.xhtml.XHTMLtoPPTX.traverseChild(XHTMLtoPPTX.java:272)
        at org.pptx4j.convert.in.xhtml.XHTMLtoPPTX.traverseChildren(XHTMLtoPPTX.java:225)
        at org.pptx4j.convert.in.xhtml.XHTMLtoPPTX.traverse(XHTMLtoPPTX.java:214)
        at org.pptx4j.convert.in.xhtml.XHTMLtoPPTX.convertSingleSlide(XHTMLtoPPTX.java:198)

Input HTML:

  1. Coffee
  2. Tea
    • Black tea
    • Green tea
  3. Milk
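For anyone trying to reproduce this: the markup itself did not survive in the post, only the rendered list. A minimal sketch of what the input probably looked like is below. The exact tags, the absence of attributes, and the class name `NestedListSample` are assumptions, not the reporter's original text. The helper only checks that the string is well-formed and has the expected nesting; it does not call docx4j.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NestedListSample {

    // Assumed reconstruction of the input: an ordered list whose second
    // item contains a nested unordered list. The original markup is unknown.
    static final String XHTML =
            "<ol>"
          + "<li>Coffee</li>"
          + "<li>Tea"
          + "<ul><li>Black tea</li><li>Green tea</li></ul>"
          + "</li>"
          + "<li>Milk</li>"
          + "</ol>";

    // Parses the string as XML and counts elements with the given tag name.
    // Parsing fails with an exception if the markup is not well-formed.
    static int count(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ol=" + count(XHTML, "ol")
                + " ul=" + count(XHTML, "ul")
                + " li=" + count(XHTML, "li"));
    }
}
```

A string like `NestedListSample.XHTML` could then be passed as the `xmlText` argument in the conversion snippet quoted in this issue, possibly wrapped in whatever container element the converter expects.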

PS: I tried to fix the NPE with a simple null check, but then ran into another issue in the next portion of code:

`// Process XHTML
XHTMLtoPPTX converter = new XHTMLtoPPTX(presentationMLPackage, slidePart, xmlText, baseUrl);
List results = converter.convertSingleSlide();

// Add results to slide
slidePart.getJaxbElement()
        .getCSld()
        .getSpTree()
        .getSpOrGrpSpOrGraphicFrame()
        .addAll(results);

Document doc = XmlUtils.marshaltoW3CDomDocument(slidePart.getJaxbElement(), slidePart.getJAXBContext(),
        slidePart.getMcChoiceNamespaces());`

It seems to be caused by incorrect Box hierarchy processing that creates extra paragraph entries in the results objects.

`java.lang.RuntimeException: jakarta.xml.bind.MarshalException
 - with linked exception:
[com.sun.istack.SAXException2: A cycle is detected in the object graph. This will cause infinitely deep XML: org.pptx4j.pml.Shape@596cfd93 -> org.docx4j.dml.CTTextParagraph@33ed860b -> org.docx4j.dml.CTTextBody@60e5a926 -> org.pptx4j.pml.Shape@596cfd93]`
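To make the "cycle in the object graph" message concrete, here is a minimal sketch of the kind of check involved: a depth-first walk that tracks the objects on the current path by identity and reports a cycle when it meets one of them again. This is not the marshaller's actual implementation. `Node`, `hasCycle`, and `CycleCheckSketch` are hypothetical names, and the direction of the links between the three objects is assumed purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class CycleCheckSketch {

    // Hypothetical stand-in for a content object that holds child objects.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns true if some object is reachable from itself via child links.
    static boolean hasCycle(Node root) {
        Set<Node> onPath = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        return visit(root, onPath);
    }

    static boolean visit(Node node, Set<Node> onPath) {
        if (!onPath.add(node)) {
            return true; // this exact instance is already on the current path
        }
        for (Node child : node.children) {
            if (visit(child, onPath)) {
                return true;
            }
        }
        onPath.remove(node); // finished this branch, so take it off the path
        return false;
    }

    public static void main(String[] args) {
        Node shape = new Node("Shape");
        Node body = new Node("CTTextBody");
        Node paragraph = new Node("CTTextParagraph");

        // A plain tree: no object is reachable from itself.
        shape.children.add(body);
        body.children.add(paragraph);
        System.out.println(hasCycle(shape)); // prints "false"

        // Assumed for illustration: the same Shape instance ends up nested
        // below its own paragraph, which closes the loop.
        paragraph.children.add(shape);
        System.out.println(hasCycle(shape)); // prints "true"
    }
}
```

If the results list really does contain an object that is also nested inside another returned object, a walk like this over the returned shapes before calling `addAll` might help locate where the duplicate reference is introduced.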