plutext / docx4j

JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files
https://www.docx4java.org/

OnlyOffice - w:sdtPr/w:label - unexpected element error #586

Closed maddin79 closed 2 months ago

maddin79 commented 2 months ago

Hello everyone,

I hope someone can help me with this issue. I can not find any solution to it.

OnlyOffice version: 8.1.0.169
Java version: JDK 21.0.2
docx4j-JAXB-MOXy version: 11.4.11 (the error is the same with other docx4j JAXB implementations)

I'm creating a docx document with OnlyOffice that contains a content control field. Here is a simple example of the document.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
    xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex"
    xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid"
    xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml"
    xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash"
    xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex"
    mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh wp14">
    <w:body>
        <w:p>
            <w:pPr>
                <w:pBdr></w:pBdr>
                <w:spacing />
                <w:ind />
                <w:rPr></w:rPr>
            </w:pPr>
            <w:r></w:r>
            <w:r></w:r>
            <w:sdt>
                <w:sdtPr>
                    <w:alias w:val="FIELD" />
                    <w15:appearance w15:val="boundingBox" />
                    <w:label w:val="0" />
                    <w:lock w:val="contentLocked" />
                    <w:tag w:val="text" />
                    <w:rPr></w:rPr>
                </w:sdtPr>
                <w:sdtContent>
                    <w:r>
                        <w:rPr>
                            <w:color w:val="ffffff" />
                            <w:shd w:val="clear" w:color="0000ff" w:fill="0000ff" />
                        </w:rPr>
                        <w:t xml:space="preserve">iaeieie</w:t>
                    </w:r>
                    <w:r></w:r>
                </w:sdtContent>
            </w:sdt>
            <w:r></w:r>
            <w:r></w:r>
            <w:r></w:r>
        </w:p>
        <w:sectPr>
            <w:footnotePr></w:footnotePr>
            <w:endnotePr></w:endnotePr>
            <w:type w:val="nextPage" />
            <w:pgSz w:h="16838" w:orient="portrait" w:w="11906" />
            <w:pgMar w:top="1134" w:right="850" w:bottom="1134" w:left="1701" w:header="709"
                w:footer="709" w:gutter="0" />
            <w:cols w:num="1" w:sep="0" w:space="708" w:equalWidth="1"></w:cols>
        </w:sectPr>
    </w:body>
</w:document>

The problematic line is <w:label w:val="0" /> in <w:sdtPr>.

The Java code looks like this:

File doc = new File("/home/mdrees/Documents/Privat/test-doc3.docx");
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(doc);
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
SdtFinder sdtFinder = new SdtFinder();
org.docx4j.wml.Body  documentBody = mainDocumentPart.getJaxbElement().getBody();

The error happens on the last line, with the following stack trace:

09:46:22.276 [main] WARN org.docx4j.jaxb.JaxbValidationEventHandler -- [ERROR] : unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"label"). Expected elements are <{http://schemas.openxmlformats.org/wordprocessingml/2006/main}placeholder>,<{http://schemas.microsoft.com/office/word/2012/wordml}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}alias>,<{http://schemas.microsoft.com/office/word/2010/wordml}checkbox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}showingPlcHdr>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionCreated>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}comboBox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}citation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}temporary>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSectionItem>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tag>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}bibliography>,<{http://schemas.microsoft.com/office/word/2010/wordml}entityPicker>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}lock>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}text>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}date>,<{http://schemas.microsoft.com/office/word/2012/wordml}color>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}equation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartObj>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionLinked>,<{http://schemas.microsoft.com/office/word/2012/wordml}appearance>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dropDownList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}picture>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSection>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPr>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}group>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}richText>
09:46:22.278 [main] WARN org.docx4j.jaxb.JaxbValidationEventHandler -- Column is 1860 at line number 1
09:46:22.279 [main] DEBUG org.docx4j.jaxb.JaxbValidationEventHandler -- shouldContinue is set to false
java.lang.Throwable: null
    at org.docx4j.jaxb.JaxbValidationEventHandler.handleEvent(JaxbValidationEventHandler.java:186)
    at org.eclipse.persistence.jaxb.JAXBErrorHandler.handleException(JAXBErrorHandler.java:88)
    at org.eclipse.persistence.jaxb.JAXBErrorHandler.warning(JAXBErrorHandler.java:54)
    at org.eclipse.persistence.internal.oxm.record.UnmarshalRecordImpl.startUnmappedElement(UnmarshalRecordImpl.java:1044)
    at org.eclipse.persistence.internal.oxm.record.UnmarshalRecordImpl.startElement(UnmarshalRecordImpl.java:883)
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parseEvent(XMLStreamReaderReader.java:138)
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parse(XMLStreamReaderReader.java:102)
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parse(XMLStreamReaderReader.java:89)
    at org.eclipse.persistence.internal.oxm.record.SAXUnmarshaller.unmarshal(SAXUnmarshaller.java:940)
    at org.eclipse.persistence.internal.oxm.XMLUnmarshaller.unmarshal(XMLUnmarshaller.java:695)
    at org.eclipse.persistence.jaxb.JAXBUnmarshaller.unmarshal(JAXBUnmarshaller.java:640)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:481)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:361)
    at org.docx4j.openpackaging.parts.JaxbXmlPart.getContents(JaxbXmlPart.java:201)
    at org.docx4j.openpackaging.parts.JaxbXmlPart.getJaxbElement(JaxbXmlPart.java:221)
    at org.turnit.Main.findSdt(Main.java:36)
    at org.turnit.Main.main(Main.java:21)
09:46:22.282 [main] WARN org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware -- 
Exception Description: An error occurred unmarshalling the document
Internal Exception: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1860; unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"label"). Expected elements are <{http://schemas.openxmlformats.org/wordprocessingml/2006/main}placeholder>,<{http://schemas.microsoft.com/office/word/2012/wordml}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}alias>,<{http://schemas.microsoft.com/office/word/2010/wordml}checkbox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}showingPlcHdr>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionCreated>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}comboBox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}citation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}temporary>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSectionItem>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tag>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}bibliography>,<{http://schemas.microsoft.com/office/word/2010/wordml}entityPicker>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}lock>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}text>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}date>,<{http://schemas.microsoft.com/office/word/2012/wordml}color>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}equation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartObj>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionLinked>,<{http://schemas.microsoft.com/office/word/2012/wordml}appearance>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dropDownList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}picture>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSection>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPr>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}group>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}richText>
09:46:22.282 [main] ERROR org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware -- null
jakarta.xml.bind.UnmarshalException: null
    at org.eclipse.persistence.jaxb.JAXBUnmarshaller.handleXMLMarshalException(JAXBUnmarshaller.java:1136)
    at org.eclipse.persistence.jaxb.JAXBUnmarshaller.unmarshal(JAXBUnmarshaller.java:643)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:481)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:361)
    at org.docx4j.openpackaging.parts.JaxbXmlPart.getContents(JaxbXmlPart.java:201)
    at org.docx4j.openpackaging.parts.JaxbXmlPart.getJaxbElement(JaxbXmlPart.java:221)
    at org.turnit.Main.findSdt(Main.java:36)
    at org.turnit.Main.main(Main.java:21)
Caused by: org.eclipse.persistence.exceptions.XMLMarshalException: 
Exception Description: An error occurred unmarshalling the document
Internal Exception: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1860; unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"label"). Expected elements are <{http://schemas.openxmlformats.org/wordprocessingml/2006/main}placeholder>,<{http://schemas.microsoft.com/office/word/2012/wordml}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}alias>,<{http://schemas.microsoft.com/office/word/2010/wordml}checkbox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}showingPlcHdr>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionCreated>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}comboBox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}citation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}temporary>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSectionItem>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tag>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}bibliography>,<{http://schemas.microsoft.com/office/word/2010/wordml}entityPicker>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}lock>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}text>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}date>,<{http://schemas.microsoft.com/office/word/2012/wordml}color>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}equation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartObj>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionLinked>,<{http://schemas.microsoft.com/office/word/2012/wordml}appearance>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dropDownList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}picture>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSection>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPr>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}group>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}richText>
    at org.eclipse.persistence.exceptions.XMLMarshalException.unmarshalException(XMLMarshalException.java:122)
    at org.eclipse.persistence.internal.oxm.record.SAXUnmarshaller.convertSAXException(SAXUnmarshaller.java:1045)
    at org.eclipse.persistence.internal.oxm.record.SAXUnmarshaller.unmarshal(SAXUnmarshaller.java:948)
    at org.eclipse.persistence.internal.oxm.XMLUnmarshaller.unmarshal(XMLUnmarshaller.java:695)
    at org.eclipse.persistence.jaxb.JAXBUnmarshaller.unmarshal(JAXBUnmarshaller.java:640)
    ... 6 common frames omitted
Caused by: org.xml.sax.SAXParseException: unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"label"). Expected elements are <{http://schemas.openxmlformats.org/wordprocessingml/2006/main}placeholder>,<{http://schemas.microsoft.com/office/word/2012/wordml}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}alias>,<{http://schemas.microsoft.com/office/word/2010/wordml}checkbox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}showingPlcHdr>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionCreated>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}comboBox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}citation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}temporary>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSectionItem>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tag>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}bibliography>,<{http://schemas.microsoft.com/office/word/2010/wordml}entityPicker>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}lock>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}text>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}date>,<{http://schemas.microsoft.com/office/word/2012/wordml}color>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}equation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartObj>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionLinked>,<{http://schemas.microsoft.com/office/word/2012/wordml}appearance>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dropDownList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}picture>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSection>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPr>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}group>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}richText>
    at org.eclipse.persistence.internal.oxm.record.UnmarshalRecordImpl.startUnmappedElement(UnmarshalRecordImpl.java:1044)
    at org.eclipse.persistence.internal.oxm.record.UnmarshalRecordImpl.startElement(UnmarshalRecordImpl.java:883)
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parseEvent(XMLStreamReaderReader.java:138)
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parse(XMLStreamReaderReader.java:102)
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parse(XMLStreamReaderReader.java:89)
    at org.eclipse.persistence.internal.oxm.record.SAXUnmarshaller.unmarshal(SAXUnmarshaller.java:940)
    ... 8 common frames omitted
09:46:22.283 [main] ERROR org.docx4j.openpackaging.parts.JaxbXmlPart -- Problem with part /word/document.xml
org.docx4j.openpackaging.exceptions.Docx4JException: Problem with part /word/document.xml
    at org.docx4j.openpackaging.parts.JaxbXmlPart.getContents(JaxbXmlPart.java:204)
    at org.docx4j.openpackaging.parts.JaxbXmlPart.getJaxbElement(JaxbXmlPart.java:221)
    at org.turnit.Main.findSdt(Main.java:36)
    at org.turnit.Main.main(Main.java:21)
Caused by: jakarta.xml.bind.JAXBException: null
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:670)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:361)
    at org.docx4j.openpackaging.parts.JaxbXmlPart.getContents(JaxbXmlPart.java:201)
    ... 3 common frames omitted
Caused by: jakarta.xml.bind.UnmarshalException: null
    at org.eclipse.persistence.jaxb.JAXBUnmarshaller.handleXMLMarshalException(JAXBUnmarshaller.java:1136)
    at org.eclipse.persistence.jaxb.JAXBUnmarshaller.unmarshal(JAXBUnmarshaller.java:643)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:481)
    ... 5 common frames omitted
Caused by: org.eclipse.persistence.exceptions.XMLMarshalException: 
Exception Description: An error occurred unmarshalling the document
Internal Exception: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1860; unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"label"). Expected elements are <{http://schemas.openxmlformats.org/wordprocessingml/2006/main}placeholder>,<{http://schemas.microsoft.com/office/word/2012/wordml}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}alias>,<{http://schemas.microsoft.com/office/word/2010/wordml}checkbox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}showingPlcHdr>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionCreated>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}comboBox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}citation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}temporary>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSectionItem>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tag>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}bibliography>,<{http://schemas.microsoft.com/office/word/2010/wordml}entityPicker>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}lock>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}text>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}date>,<{http://schemas.microsoft.com/office/word/2012/wordml}color>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}equation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartObj>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionLinked>,<{http://schemas.microsoft.com/office/word/2012/wordml}appearance>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dropDownList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}picture>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSection>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPr>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}group>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}richText>
    at org.eclipse.persistence.exceptions.XMLMarshalException.unmarshalException(XMLMarshalException.java:122)
    at org.eclipse.persistence.internal.oxm.record.SAXUnmarshaller.convertSAXException(SAXUnmarshaller.java:1045)
    at org.eclipse.persistence.internal.oxm.record.SAXUnmarshaller.unmarshal(SAXUnmarshaller.java:948)
    at org.eclipse.persistence.internal.oxm.XMLUnmarshaller.unmarshal(XMLUnmarshaller.java:695)
    at org.eclipse.persistence.jaxb.JAXBUnmarshaller.unmarshal(JAXBUnmarshaller.java:640)
    ... 6 common frames omitted
Caused by: org.xml.sax.SAXParseException: unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"label"). Expected elements are <{http://schemas.openxmlformats.org/wordprocessingml/2006/main}placeholder>,<{http://schemas.microsoft.com/office/word/2012/wordml}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}alias>,<{http://schemas.microsoft.com/office/word/2010/wordml}checkbox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}showingPlcHdr>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionCreated>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}comboBox>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}citation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dataBinding>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}temporary>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSectionItem>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tag>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}bibliography>,<{http://schemas.microsoft.com/office/word/2010/wordml}entityPicker>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}lock>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}text>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}date>,<{http://schemas.microsoft.com/office/word/2012/wordml}color>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}equation>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}docPartObj>,<{http://schemas.microsoft.com/office/word/2012/wordml}webExtensionLinked>,<{http://schemas.microsoft.com/office/word/2012/wordml}appearance>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dropDownList>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}picture>,<{http://schemas.microsoft.com/office/word/2012/wordml}repeatingSection>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPr>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}group>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}richText>
    at org.eclipse.persistence.internal.oxm.record.UnmarshalRecordImpl.startUnmappedElement(UnmarshalRecordImpl.java:1044)
    at org.eclipse.persistence.internal.oxm.record.UnmarshalRecordImpl.startElement(UnmarshalRecordImpl.java:883)
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parseEvent(XMLStreamReaderReader.java:138)
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parse(XMLStreamReaderReader.java:102)
    at org.eclipse.persistence.internal.oxm.record.XMLStreamReaderReader.parse(XMLStreamReaderReader.java:89)
    at org.eclipse.persistence.internal.oxm.record.SAXUnmarshaller.unmarshal(SAXUnmarshaller.java:940)
    ... 8 common frames omitted
java.lang.NullPointerException: Cannot invoke "org.docx4j.wml.Document.getBody()" because the return value of "org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart.getJaxbElement()" is null
    at org.turnit.Main.findSdt(Main.java:36)
    at org.turnit.Main.main(Main.java:21)

According to the Open XML format documentation, the label element is allowed in that position. How can I fix this issue? I am quite new to docx4j.

Best Martin

plutext commented 2 months ago

w:label is not supported in https://github.com/plutext/docx4j/blob/VERSION_11_4_12/docx4j-openxml-objects/src/main/java/org/docx4j/wml/SdtPr.java

You could add it in there manually, and in https://github.com/plutext/docx4j/blob/VERSION_11_4_12/docx4j-openxml-objects/src/main/java/org/docx4j/wml/ObjectFactory.java

(Normally we'd do that by providing for it in the xsd, then regenerating the code using xjc, but manually editing works fine as well)

Or if w:label serves no purpose, you could remove it from your document using https://github.com/plutext/docx4j/blob/VERSION_11_4_12/docx4j-samples-resources/src/main/resources/custom-preprocessor.xslt

docx4j uses ECMA-376, first edition (as opposed to 2ed, which was not then available) plus various amendments. It seems that label is defined in §17.5.2.19 of [ISO/IEC 29500-1 1st Edition] according to https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.sdtproperties?view=openxml-3.0.1 but afaik Microsoft Word doesn't write it.

maddin79 commented 2 months ago

Hello @plutext,

Thank you very much for the hints. The label field indeed serves no purpose for me. I don't know why OnlyOffice adds it. I create the fields with a plugin and the OnlyOffice API. The weird thing is that if I add a field inside a table, the label is not there, even though the code for the field is the same.

I'll check out the preprocessor. It looks promising.

maddin79 commented 2 months ago

I added docx4j.properties and custom-preprocessor.xslt under src/main/java/resources in my Maven project, but it looks like they are not taken into account. I also cannot find documentation about this feature or about how to customize the preprocessor. How does it work, and how do I remove the field from the document?

plutext commented 2 months ago

Do you have a line like https://github.com/plutext/docx4j/blob/VERSION_11_4_12/docx4j-core/pom.xml#L192 in your pom.xml?

To your custom-preprocessor.xslt you need to add the line:

<xsl:template match="w:label" />  
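For readers unfamiliar with the idiom, here is a minimal, self-contained sketch of what that empty template does, using only the JDK's javax.xml.transform API rather than docx4j. It assumes the stylesheet otherwise copies everything through with an identity template; the class name StripLabelSketch and the inline stylesheet are illustrative and are not the contents of docx4j's custom-preprocessor.xslt.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of docx4j.
public class StripLabelSketch {

    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Identity transform plus one empty template. The empty template has a
    // higher default priority than the generic node() pattern, so every
    // w:label element (and its attributes) is dropped from the output.
    private static final String XSLT =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
          + " xmlns:w=\"" + W_NS + "\">"
          + "<xsl:template match=\"@*|node()\">"
          + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
          + "</xsl:template>"
          + "<xsl:template match=\"w:label\"/>"
          + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML and returns the result. */
    public static String stripLabel(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT failed", e);
        }
    }

    public static void main(String[] args) {
        // A cut-down version of the sdtPr from the original report.
        String sdtPr = "<w:sdtPr xmlns:w=\"" + W_NS + "\">"
                + "<w:alias w:val=\"FIELD\"/><w:label w:val=\"0\"/><w:tag w:val=\"text\"/>"
                + "</w:sdtPr>";
        System.out.println(stripLabel(sdtPr));
    }
}
```

Note that the w prefix must be bound to the WordprocessingML main namespace in the stylesheet; otherwise the match pattern will not select the element.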

If the mechanism is working, it will be invoked from https://github.com/plutext/docx4j/blob/VERSION_11_4_12/docx4j-core/src/main/java/org/docx4j/openpackaging/parts/JaxbXmlPartXPathAware.java#L574

maddin79 commented 2 months ago

Hi @plutext,

I added the resource line, and I also had the w:label template in the XSLT, but the output does not change. I ran this from IntelliJ and also built an executable jar with all dependencies. The XSLT is not taken into account at all. I have no idea what I'm doing wrong.

plutext commented 2 months ago

Seems like you have logging configured; so you have some resources being read? Configure it for INFO level logging; do you see the log at line 574 above? If you still can't get it working, you could create a small test case maven project and I will look at it for you.
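One quick way to answer the "are the resources being read?" question is to ask the class loader directly. The sketch below assumes both files are looked up on the classpath under the names used in this thread; the class ClasspathCheckSketch is hypothetical. With Maven's default layout, files under src/main/resources are copied to the classpath root, so a file placed in a different directory may come back as missing unless the pom.xml declares that directory as a resource.

```java
import java.net.URL;

// Hypothetical diagnostic helper; not part of docx4j.
public class ClasspathCheckSketch {

    /** Returns the URL the resource resolves to on the classpath, or null if it is absent. */
    public static URL locate(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the loader of this class if no context loader is set.
            cl = ClasspathCheckSketch.class.getClassLoader();
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        // File names as mentioned in the thread.
        for (String name : new String[] {"docx4j.properties", "custom-preprocessor.xslt"}) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

Running this from the same launch configuration as the failing program (IDE or executable jar) shows whether the files are visible there, and from which location they are loaded.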

maddin79 commented 2 months ago

Sorry for the late reply. I did some debugging and found the setting

String transformParts = Docx4jProperties.getProperty("docx4j.jaxb.preprocess.always");
boolean transformFirst = (transformParts != null
        && transformParts.contains(this.getClass().getSimpleName()));

and put

docx4j.jaxb.preprocess.always=MainDocumentPart

in docx4j.properties. That works perfectly. I did not find the root cause of why the XSLT is not applied when unmarshalling fails.

Thanks for the help. For me it is fixed. I think always preprocessing the file is better in my case anyway, since the documents will always come from OnlyOffice.
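The effect of that property can be sketched in isolation. The snippet below reproduces only the check quoted above, using java.util.Properties in place of Docx4jProperties; the class PreprocessAlwaysSketch and its methods are hypothetical, and StyleDefinitionsPart is used purely as an example of a part name that is not listed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical stand-alone sketch of the quoted check; not docx4j code.
public class PreprocessAlwaysSketch {

    static final String KEY = "docx4j.jaxb.preprocess.always";

    /** Mirrors the quoted logic: preprocess first if the property value names the part class. */
    public static boolean transformFirst(Properties props, String partSimpleName) {
        String transformParts = props.getProperty(KEY);
        return transformParts != null && transformParts.contains(partSimpleName);
    }

    /** Parses properties-file text, as docx4j.properties would contain it. */
    public static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties props = parse("docx4j.jaxb.preprocess.always=MainDocumentPart\n");
        System.out.println(transformFirst(props, "MainDocumentPart"));
        System.out.println(transformFirst(props, "StyleDefinitionsPart"));
        System.out.println(transformFirst(new Properties(), "MainDocumentPart"));
    }
}
```

Because the quoted check uses a substring match on the simple class name, the property value can presumably list several part class names in one string.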