wmo-im / iwxxm

XML schema and Schematron for aviation weather data exchange
https://old.wmo.int/wiswiki/tiki-index.php%3Fpage=TT-AvXML

Add schema/Schematron checks to ensure that extended content always has a web accessible schema definition #29

Closed braeckel closed 4 years ago

braeckel commented 6 years ago

Add checks to ensure that all extended content blocks have a web-accessible XML schema. Without such a check, the schemaLocation can be left off, and the validator will then not attempt to validate the extended content at all. This is a safety issue: the extended content essentially cannot be validated and has no description (including a human-readable one) for consumers to use. There are two approaches:

1) Use processContents="strict" on all extension elements. This may have the desired behavior, but more research is required. It may require using the xs:any element instead of the current anyType.

2) Add a Schema or Schematron check to ensure that the top-level element under each extension element has an xsi:schemaLocation for that top-level element’s namespace. For example, iwxxm-us elements should have an xsi:schemaLocation on each element that indicates the location for the iwxxm-us namespace. If a schemaLocation is set then validation will ensure that it is web-accessible and correct. This would not guarantee that all contents are validated, just the top element. Extending the check to all namespaces under the extension element might get tricky to implement (it would increase the need for a namespace "whitelist", for example).
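
Approach 2 could be sketched as a Schematron rule along the following lines. This is only an illustration, not part of the IWXXM rule set: the pattern id is made up, the iwxxm namespace URI assumes the 2.1 schemas, and the XPath assumes an XSLT 2.0 query binding. It checks only the top-level child of each extension element, as described above.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Assumed namespace URI for IWXXM 2.1; adjust to the schema version in use -->
  <sch:ns prefix="iwxxm" uri="http://icao.int/iwxxm/2.1"/>
  <sch:ns prefix="xsi" uri="http://www.w3.org/2001/XMLSchema-instance"/>

  <!-- Hypothetical pattern id, chosen for this sketch -->
  <sch:pattern id="extension-schema-location">
    <!-- Fires for every element that is a direct child of an extension element -->
    <sch:rule context="iwxxm:extension/*">
      <!-- xsi:schemaLocation is a list of (namespace, location) pairs;
           the odd-numbered tokens are the namespace names -->
      <sch:assert test="tokenize(normalize-space(@xsi:schemaLocation), ' ')[position() mod 2 = 1] = namespace-uri()">
        Extension element <sch:value-of select="name()"/> must carry an
        xsi:schemaLocation entry for its own namespace.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

A rule like this only confirms that a location is declared for the namespace; whether that location resolves is still left to the schema validator.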

blchoy commented 6 years ago

I am not an expert in validators, but mine (Oxygen XML Editor) always complains if a namespace is not defined, or if it is defined but its schema location is not accessible.

In reality, while most of us will read IWXXM reports close to their generation times, there are always occasions when we need to read these reports much later (e.g. for an investigation). Therefore we need a mechanism to preserve the schemas referenced by the extensions. Whether this mechanism is owned by WMO/ICAO is another story, but having it regulated by either one or both of them could be an option.

braeckel commented 6 years ago

The validator will always ensure that namespace prefixes are defined but if processContents="strict" is used it will ensure that the schemaLocations are also valid.

As an example, take the following snippet and put it at the bottom of any IWXXM message. It validates against the 2.x schemas in both Oxygen and Crux despite not having a schema location defined:

<iwxxm:extension xmlns:foo="http://i-do-not-exist.invalid">
  <foo:content>It works!</foo:content>
</iwxxm:extension>

However I just did a test with SIGMET and changed the existing extension element definitions from:

<element name="extension" minOccurs="0" maxOccurs="unbounded" type="anyType">
  <annotation>
    <documentation>Extension block for optional and/or additional parameters for element SIGMET</documentation>
  </annotation>
</element>

to the following custom ExtensionType:

<element name="extension" minOccurs="0" maxOccurs="unbounded" type="iwxxm:ExtensionType">
  <annotation>
    <documentation>Extension block for optional and/or additional parameters for element SIGMET</documentation>
  </annotation>
</element>

<complexType name="ExtensionType">
  <sequence>
    <any processContents="strict"/>
  </sequence>
</complexType>

And this updated version properly causes a validation failure because the schema could not be located. I think processContents="strict" is a great solution for this issue.
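
For comparison, an extension that should pass under the strict wildcard might look like the sketch below. The ext namespace, the element name and the schema URL are placeholders rather than a real extension; validation would succeed only if a schema declaring ext:content can actually be fetched from the stated location.

```xml
<iwxxm:extension>
  <!-- Placeholder namespace and schema URL, for illustration only -->
  <ext:content
      xmlns:ext="http://example.org/iwxxm-ext/1.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://example.org/iwxxm-ext/1.0 http://example.org/iwxxm-ext/1.0/iwxxm-ext.xsd">It works!</ext:content>
</iwxxm:extension>
```

The schemaLocation pair could equally be declared on the root element of the report; placing it on the extension's child keeps the hint next to the content it describes.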

blchoy commented 6 years ago

Noted and agreed. However, some documentation (https://www.w3schools.com/xml/el_any.asp) said processContents="strict" should be the default which is obviously not always true. :)

braeckel commented 6 years ago

I also saw mentions of that being the default behavior, but in testing with Oxygen and Crux (both are Xerces Java) that doesn't seem to be the case.

On Mon, Nov 20, 2017 at 9:45 AM, BL Choy notifications@github.com wrote:

Noted and agreed. However, some documentation (https://www.w3schools.com/xml/el_any.asp) said processContents="strict" should be the default. :)
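
One possible explanation for the difference, offered as background rather than as something established in this thread: in XML Schema 1.0 the built-in anyType behaves as if it were defined with lax wildcards, so an element typed as anyType gets lax processing regardless of the default that applies to an explicit any element. Roughly (xs is assumed to be bound to the XML Schema namespace):

```xml
<!-- Approximate equivalent of the built-in anyType, per XML Schema 1.0 Part 1 -->
<xs:complexType name="anyType" mixed="true">
  <xs:sequence>
    <!-- Lax: validate children only when a declaration happens to be available -->
    <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
  </xs:sequence>
  <xs:anyAttribute processContents="lax"/>
</xs:complexType>
```

Under that reading, the unknown foo:content element passes because no declaration for it is available, not because the validator ignores the default.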

braeckel commented 6 years ago

The most sensible approach would seem to be:

<complexType name="ExtensionType">
  <sequence>
    <any processContents="strict"/>
  </sequence>
</complexType>
<element name="extension" minOccurs="0" maxOccurs="unbounded" type="iwxxm:ExtensionType">
  <annotation>
    <documentation>Extension block for optional and/or additional parameters for element XYZ</documentation>
  </annotation>
</element>

blchoy commented 6 years ago

The EA post-processing XSLT scripts have been modified and committed to the SVN.

blchoy commented 6 years ago

I am closing this as I see there is no further comment from Aaron on my implementation of the EA post-processing XSLT scripts.

blchoy commented 6 years ago

Re-opened this issue because at MIE/4 there were queries about whether this was overly restrictive (the location of external schemas must be available and accessible online).

braeckel commented 6 years ago

When the strict behavior is enabled without a resolvable schema location, Xerces returns an error that says: "Failed to read schema document 'http://host.org/schema/iwxxm-ext/1.0/iwxxm-ext.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>."
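
To clear that error, a schema document would have to be published at the referenced URL. A minimal sketch of such a document is shown below; the target namespace and the content element are assumptions made for illustration, since only the schema URL appears in the error message.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical extension schema to be served at
     http://host.org/schema/iwxxm-ext/1.0/iwxxm-ext.xsd -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:ext="http://host.org/schema/iwxxm-ext/1.0"
        targetNamespace="http://host.org/schema/iwxxm-ext/1.0"
        elementFormDefault="qualified"
        version="1.0">
  <!-- Global declaration, so the strict wildcard can find it by name -->
  <element name="content" type="string">
    <annotation>
      <documentation>Example extension payload (assumed name and type)</documentation>
    </annotation>
  </element>
</schema>
```

The annotation also gives consumers the human-readable description that the original request asked for.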

braeckel commented 6 years ago

In TT-AvXML/7 it was agreed that comments on this strict behavior would be sought, and that the behavior would perhaps be loosened in 3.0RCfinal.

blchoy commented 6 years ago

No active complaints so far, but we still need to do the research on how this works in AIXM, which is an action item of MIE/4. This probably won't be resolved in 3.0.

Updated on 1 May 2019: Further discussions took place at the joint WG-MIE/WG-MRI workshop in Nov 2018 and will continue at WG-MIE/5 in May 2019.

blchoy commented 4 years ago

At both WG-MIE/5 and TT-AvXML-8 there was general consensus on this arrangement. This issue is hereby closed.