miachm / SODS

A simple Java library for handle ODS (Open Document Spreadsheet, compatible with Excel and Libreoffice)
The Unlicense
74 stars 30 forks source link

Make `XMLInputFactory` configurable. #77

Closed tan9 closed 1 year ago

tan9 commented 1 year ago

We require the XMLInputFactory configuration capability as described in https://github.com/miachm/SODS/issues/39.

Context

Our application is deployed on JBoss EAP, which incorporates Woodstox as its StAX implementation. While attempting to parse .ods files with a Dtd declared in metadata.xml (which was generated by using JasperReports) like the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE manifest:manifest PUBLIC "-//OpenOffice.org//DTD Manifest 1.0//EN" "Manifest.dtd">
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
    ...

We encountered the following exception:

com.github.miachm.sods.NotAnOdsException: (previously java.io.FileNotFoundException) /Manifest.dtd (No such file or directory)
 at [row,col {unknown-source}]: [2,92]

    at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:634)
    at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:504)
    at com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:464)
    at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:326)
    at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3466)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2089)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1180)
    at com.github.miachm.sods.XmlReaderInstanceEventImpl.nextElement(XmlReaderInstanceEventImpl.java:46)
    at com.github.miachm.sods.OdsReader.processContent(OdsReader.java:86)
      ...

Please note that the exception above occurred due to this StAX implementation trying to validate the XML against the missed DTD resource, and this issue has been encountered during the process of reading and parsing the .ods file.

Workaround

I can verify that when we follow the provided instructions to configure Woodstox not to perform validation, the .ods file can be successfully read:

--- a/src/com/github/miachm/sods/XmlReaderEventImpl.java    (revision d36c400d804f7eb4f7103addeb7533d6faa5042c)
+++ b/src/com/github/miachm/sods/XmlReaderEventImpl.java    (date 1693920428753)
@@ -13,6 +13,7 @@
     @Override
     public XmlReaderInstanceEventImpl load(InputStream in) throws IOException {
         try {
+            inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
             reader = inputFactory.createXMLStreamReader(in);
             // Skip start of document
             try {

Discussion

Exploring ways to extend XMLInputFactory customization:

  1. One approach is to adopt the suggestion outlined in https://github.com/miachm/SODS/issues/39#issuecomment-1039036961
https://github.com/miachm/SODS/issues/39#issuecomment-1039036961
  1. Another strategy involves utilizing the ServiceLoader mechanism as demonstrated below:

        // Try to load XmlReader or use default XmlReaderEventImpl
        ServiceLoader<XmlReader> loader = ServiceLoader.load(XmlReader.class);
        Iterator<XmlReader> iterator = loader.iterator();
        if (iterator.hasNext()) {
            reader = iterator.next();
        } else {
            reader = new XmlReaderEventImpl();
        }
  2. Or we can simply apply the mentioned patch directly to set XMLInputFactory.SUPPORT_DTD to true. This adjustment will ensure that JasperReports-generated ODS files function correctly on JBoss EAP without affecting any existing usages.

  3. Are there any alternative approaches that should be considered?

github-actions[bot] commented 1 year ago

Thanks for your report! Please ensure you have provided enough info in order to recreate the issue, including the problematic ODS File.

miachm commented 1 year ago

Looks like the 3rd option is the best one. That's assuming this flag doesn't affect the existing usage. I guess not, otherwise we should be able to see it at unit tests.

Nevertheless, you said the flag should be set to true. But previously, in your workaround, you set the flag to false... Should it not be false?

tan9 commented 1 year ago

Looks like the 3rd option is the best one. That's assuming this flag doesn't affect the existing usage. I guess not, otherwise we should be able to see it at unit tests.

I've submitted a pull request to address this change: https://github.com/miachm/SODS/pull/78

Nevertheless, you said the flag should be set to true. But previously, in your workaround, you set the flag to false... Should it not be false?

A correction is needed: to disable validation, set it to false. Additionally, this prevents SODS from being vulnerable to an XXE attack.

tan9 commented 1 year ago

However, #78 causes #80.

tan9 commented 1 year ago

@miachm I've rebased the PR onto the master branch.