hapifhir / org.hl7.fhir.core

Apache License 2.0
155 stars 157 forks source link

Insecure Transformer used in XmlParsers #1571

Closed qligier closed 1 month ago

qligier commented 7 months ago

The Transformer used in XmlParsers (all versions) is insecure to XEE attacks (r5 shown here):

https://github.com/hapifhir/org.hl7.fhir.core/blob/dd9dd6b64aca4645aa3a444d011a6930d3acf58c/org.hl7.fhir.r5/src/main/java/org/hl7/fhir/r5/elementmodel/XmlParser.java#L146-L147

See https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html#transformerfactory

The current Saxon-HE version (9.8) does not support XMLConstants.ACCESS_EXTERNAL_DTD or XMLConstants.ACCESS_EXTERNAL_SCHEMA (java.lang.IllegalArgumentException: Unknown configuration property http://javax.xml.XMLConstants/property/accessExternalDTD).

One way forward is to upgrade Saxon-HE to the latest version (9.8 → 12), which is anyway a good thing to do.

Another way is to use newDefaultInstance() instead of newInstance(), which returns the implementation provided by the JDK (Xalan) instead of Saxon.

Also, the use of a utility class that would configure XML factories would be nice, instead of reconfiguring them at each use (kind of what I did in Husky: https://github.com/project-husky/husky/blob/develop/husky-common/husky-common-gen/src/main/java/org/projecthusky/common/utils/xml/XmlFactories.java).

I can provide a PR if you agree with these solutions.

grahamegrieve commented 7 months ago

@dotasek I don't think that there's any special functionality here - I don't know why it's even using saxon at this point?

dotasek commented 2 months ago

@qligier Could you take a look at #1717 and verify that it is the correct fix?

dotasek commented 2 months ago

I am all for removing saxon-he, but as a separate task. I would have to figure out what the impact is on IG publisher, as I think that dependency has caused some issues for users before.

qligier commented 2 months ago

@dotasek Looks good to me, thanks!

dotasek commented 2 months ago

@qligier @grahamegrieve so there's an issue with using both Saxon HE 12.5, and with using Xalan.

The Problem

Xalan breaks XSLT expectations

I recalled correctly; we briefly defaulted to a non-Saxon implementation when HAPI dropped its own Saxon dependency, and I believe we were using Xalan. The problem was, Xalan was adding namespace prefixes to all XSLT transforms (ns0 below):

<ns0:p xmlns:ns0="http://www.w3.org/1999/xhtml">
    This is FHIR Implementation Guide for UNICOM project, created to assist work with pilot product list product data in FHIR.
  </ns0:p>

I could not find a way to turn that behaviour off, so I restored the Saxon dependency.

Saxon 12.x breaks our line reporting

For Saxon HE, we DO in fact rely on implementation details. The following code pulls line and column from the error messages, which are implementation specific:

https://github.com/hapifhir/org.hl7.fhir.core/blob/48741cfd97d0536e6a54ac4e10b066724a805c6b/org.hl7.fhir.r5/src/main/java/org/hl7/fhir/r5/elementmodel/XmlParser.java#L174-178

Without the same error format, we lose line and column reporting.

Unfortunately, these two issues mean we can't jump to 12.5 of Saxon or default to Xalan without fixing either the error messages, or fixing the namespace issue.

The Proposed Compromise

I propose we use 11.6 of Saxon, as I updated in my PR. This change had only one small breaking test because the message format didn't change that much. I propose moving to this version, and then going back and figuring out how to fix our dependence on either implementation.