Closed akostajti closed 5 years ago
Please put the docx somewhere I can look at it.
Sorry, I forgot it. You can download the file here: https://drive.google.com/file/d/0B6qA3QZEFwTKaXdlNE9PRGJhRVU/view?usp=sharing
@plutext Any updates? I had a very similar issue with the latest version of docx4j:
Unhandled java.lang.NumberFormatException
For input string: "9576.0"
NumberFormatException.java: 65 java.lang.NumberFormatException/forInputString
Integer.java: 580 java.lang.Integer/parseInt
BigInteger.java: 470 java.math.BigInteger/<init>
BigInteger.java: 606 java.math.BigInteger/<init>
DatatypeConverterImpl.java: 76 com.sun.xml.internal.bind.DatatypeConverterImpl/_parseInteger
RuntimeBuiltinLeafInfoImpl.java: 779 com.sun.xml.internal.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$22/parse
RuntimeBuiltinLeafInfoImpl.java: 777 com.sun.xml.internal.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$22/parse
TransducedAccessor.java: 230 com.sun.xml.internal.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl/parse
StructureLoader.java: 195 com.sun.xml.internal.bind.v2.runtime.unmarshaller.StructureLoader/startElement
UnmarshallingContext.java: 559 com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext/_startElement
UnmarshallingContext.java: 538 com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext/startElement
SAXConnector.java: 153 com.sun.xml.internal.bind.v2.runtime.unmarshaller.SAXConnector/startElement
AbstractSAXParser.java: 509 com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser/startElement
AbstractXMLDocumentParser.java: 182 com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser/emptyElement
XMLNSDocumentScannerImpl.java: 351 com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl/scanStartElement
XMLDocumentFragmentScannerImpl.java: 2784 com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver/next
XMLDocumentScannerImpl.java: 602 com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl/next
XMLNSDocumentScannerImpl.java: 112 com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl/next
XMLDocumentFragmentScannerImpl.java: 505 com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl/scanDocument
XML11Configuration.java: 841 com.sun.org.apache.xerces.internal.parsers.XML11Configuration/parse
XML11Configuration.java: 770 com.sun.org.apache.xerces.internal.parsers.XML11Configuration/parse
XMLParser.java: 141 com.sun.org.apache.xerces.internal.parsers.XMLParser/parse
AbstractSAXParser.java: 1213 com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser/parse
SAXParserImpl.java: 643 com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser/parse
UnmarshallerImpl.java: 243 com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl/unmarshal0
UnmarshallerImpl.java: 214 com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl/unmarshal
AbstractUnmarshallerImpl.java: 157 javax.xml.bind.helpers.AbstractUnmarshallerImpl/unmarshal
AbstractUnmarshallerImpl.java: 125 javax.xml.bind.helpers.AbstractUnmarshallerImpl/unmarshal
XmlUtils.java: 540 org.docx4j.XmlUtils/unmarshalString
XmlUtils.java: 589 org.docx4j.XmlUtils/unmarshallFromTemplate
JaxbXmlPart.java: 266 org.docx4j.openpackaging.parts.JaxbXmlPart/variableReplace
NativeMethodAccessorImpl.java: -2 sun.reflect.NativeMethodAccessorImpl/invoke0
NativeMethodAccessorImpl.java: 62 sun.reflect.NativeMethodAccessorImpl/invoke
DelegatingMethodAccessorImpl.java: 43 sun.reflect.DelegatingMethodAccessorImpl/invoke
Method.java: 498 java.lang.reflect.Method/invoke
Reflector.java: 93 clojure.lang.Reflector/invokeMatchingMethod
Reflector.java: 28 clojure.lang.Reflector/invokeInstanceMethod
...
Please post your docx at http://ndoc.it
Which version of docx4j?
Generally such issues are handled by the code at https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/jaxb/mc-preprocessor.xslt#L89
Another example attached.
In this case, it's triggered by the decimal value of 1.8 in w:space:
<w:pBdr>
<w:top w:sz="7" w:space="1.8" w:color="#333437" w:val="single"/>
<w:left w:sz="7" w:space="0" w:color="#000000" w:val="single"/>
<w:bottom w:sz="3" w:space="7.2" w:color="#323539" w:val="double"/>
<w:right w:sz="7" w:space="0" w:color="#000000" w:val="single"/>
</w:pBdr>
According to the schema, w:space should be of type ST_PointMeasure, and docx4j parses it as a BigInteger. So this document may actually be invalid against the schema. However, other tools open it fine (LibreOffice Writer silently corrects the value; I haven't tested in Word). I do not know what tool generated this document.
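To make the failure concrete: the trace shows JAXB ending up in the `java.math.BigInteger(String)` constructor, which rejects any value containing a decimal point. The sketch below reproduces that with plain JDK classes and shows one lenient alternative that goes through `BigDecimal` and rounds. The names `LenientIntegerDemo`, `strictParseFails` and `parseLenient` are made up for illustration and are not part of docx4j or JAXB; rounding half-up (rather than truncating) is an assumption, not something the schema or any tool is known to prescribe.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Hypothetical helper for illustration only; not part of docx4j or JAXB.
final class LenientIntegerDemo {

    // Strict parse: the same BigInteger(String) constructor seen in the stack trace.
    // Returns true when the value is rejected with a NumberFormatException.
    static boolean strictParseFails(String value) {
        try {
            new BigInteger(value);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Lenient parse: accept a decimal string and round half-up to a whole number.
    // The rounding mode is an assumption made for this sketch.
    static BigInteger parseLenient(String value) {
        return new BigDecimal(value.trim())
                .setScale(0, RoundingMode.HALF_UP)
                .toBigIntegerExact();
    }

    public static void main(String[] args) {
        System.out.println(strictParseFails("1.8"));  // true
        System.out.println(parseLenient("1.8"));      // 2
        System.out.println(parseLenient("9576.0"));   // 9576
    }
}
```

Values that are already whole numbers, such as "0", pass through both paths unchanged.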
Stack trace follows.
Caused by: java.lang.NumberFormatException: For input string: "1.8"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_212]
at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_212]
at java.math.BigInteger.<init>(BigInteger.java:470) ~[?:1.8.0_212]
at java.math.BigInteger.<init>(BigInteger.java:606) ~[?:1.8.0_212]
at com.sun.xml.bind.DatatypeConverterImpl._parseInteger(DatatypeConverterImpl.java:91) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$22.parse(RuntimeBuiltinLeafInfoImpl.java:800) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$22.parse(RuntimeBuiltinLeafInfoImpl.java:798) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.parse(TransducedAccessor.java:245) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.runtime.unmarshaller.StructureLoader.startElement(StructureLoader.java:212) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:577) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:556) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:75) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:168) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:244) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:281) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:250) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:281) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:250) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:281) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:250) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:281) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:250) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:281) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:250) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:127) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:110) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:103) ~[jaxb-core-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.runtime.BinderImpl.associativeUnmarshal(BinderImpl.java:161) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at com.sun.xml.bind.v2.runtime.BinderImpl.unmarshal(BinderImpl.java:132) ~[jaxb-runtime-2.3.0.jar:2.3.0]
at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:574) ~[docx4j-6.0.1.jar:?]
at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:355) ~[docx4j-6.0.1.jar:?]
at org.docx4j.openpackaging.parts.JaxbXmlPart.getContents(JaxbXmlPart.java:194) ~[docx4j-6.0.1.jar:?]
... 27 more
Should be fixed by https://github.com/plutext/docx4j/commit/bc652c5bf945a8c62b18d1f02f16d3571d0ba677
Will be in a new release this week.
Anybody else who encounters a similar issue but on some other attribute, please open your own issue, clearly showing what XML structure is at issue.
I'm extracting text from a docx file using TextUtils.extractText(Object o, Writer w). For a certain document (generated with an older version of Google Docs) I get this exception:
2015-06-21 05:55:14,999 ERROR openpackaging.parts.JaxbXmlPartXPathAware - For input string: "9360.0" [DefaultQuartzScheduler_Worker-10] {}
java.lang.NumberFormatException: For input string: "9360.0"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.math.BigInteger.<init>(BigInteger.java:338)
at java.math.BigInteger.<init>(BigInteger.java:476)
at com.sun.xml.internal.bind.DatatypeConverterImpl._parseInteger(DatatypeConverterImpl.java:72)
at com.sun.xml.internal.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$21.parse(RuntimeBuiltinLeafInfoImpl.java:766)
at com.sun.xml.internal.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$21.parse(RuntimeBuiltinLeafInfoImpl.java:764)
at com.sun.xml.internal.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.parse(TransducedAccessor.java:230)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StructureLoader.startElement(StructureLoader.java:194)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:486)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:465)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:60)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:135)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:229)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:266)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:235)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:266)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:235)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:266)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:235)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:266)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:235)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:112)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:95)
at com.sun.xml.internal.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:88)
at com.sun.xml.internal.bind.v2.runtime.BinderImpl.associativeUnmarshal(BinderImpl.java:146)
at com.sun.xml.internal.bind.v2.runtime.BinderImpl.unmarshal(BinderImpl.java:117)
at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unwrapUsually(JaxbXmlPartXPathAware.java:283)
at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:333)
at org.docx4j.openpackaging.parts.JaxbXmlPart.getContents(JaxbXmlPart.java:147)
Is there a way to prevent this exception?
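Upgrading to a release that contains the fix linked in this thread is the first thing to try. Where that is not possible, one approach is to normalize the offending attribute values in the part's XML before it reaches JAXB. The following is only a rough sketch of that idea using JDK classes: `DecimalAttributeFixer` and `roundDecimalAttribute` are hypothetical names, the attribute to fix is passed in by the caller, and how the cleaned XML is fed back into docx4j is deliberately not shown. A regex over raw XML is fragile (it ignores single-quoted attributes, for example), so treat it as an illustration rather than a robust fix.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing step for illustration; not a docx4j API.
final class DecimalAttributeFixer {

    // Rewrites attributeName="<decimal>" to attributeName="<rounded integer>".
    // Only double-quoted values containing a decimal point are touched.
    static String roundDecimalAttribute(String xml, String attributeName) {
        Pattern pattern = Pattern.compile(
                "(\\b" + Pattern.quote(attributeName) + "=\")(-?\\d+\\.\\d+)(\")");
        Matcher matcher = pattern.matcher(xml);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // Rounding half-up is an assumption; truncation would also be defensible.
            String rounded = new BigDecimal(matcher.group(2))
                    .setScale(0, RoundingMode.HALF_UP)
                    .toPlainString();
            matcher.appendReplacement(result,
                    Matcher.quoteReplacement(matcher.group(1) + rounded + matcher.group(3)));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String border = "<w:top w:sz=\"7\" w:space=\"1.8\" w:color=\"#333437\" w:val=\"single\"/>";
        System.out.println(roundDecimalAttribute(border, "w:space"));
    }
}
```

Passing the attribute name explicitly keeps the rewrite narrow, so integer-typed attributes are normalized without touching attributes where a decimal value is legitimate.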