wmo-im / iwxxm

XML schema and Schematron for aviation weather data exchange
https://old.wmo.int/wiswiki/tiki-index.php%3Fpage=TT-AvXML

Making schematron rules validate multiple versions of IWXXM #214

Open blchoy opened 4 years ago

blchoy commented 4 years ago

This is a follow up to #162, in particular on the identified actions.

After some tests, it is proposed to:

  1. Remove COLLECT.MB1 from iwxxm.sch, as we want it to handle IWXXM reports only.
  2. For each version of IWXXM, create iwxxm-collect.sch which contains COLLECT.MB1, rules of the current version and all previously published versions. Rules of each version will be identified by the respective namespace and namespace prefixes (e.g. iwxxm-30 for IWXXM version 3.0).

So to verify an IWXXM report with schematron, one should use iwxxm.sch of the respective version. To verify an IWXXM collective, one should use iwxxm-collect.sch of the latest version.

The only question left is whether iwxxm-collect.xsd is still required. For completeness (to match the presence of iwxxm-collect.sch) it should be, and it should import COLLECT as well as all previously published IWXXM schemas.

Scripts will be used to generate iwxxm-collect.sch and iwxxm-collect.xsd from current and previous IWXXM versions.
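
As an illustration of what such a generation script might do, here is a minimal Python sketch. It assumes (this is not confirmed in the thread) that each version's iwxxm.sch binds its IWXXM namespace to the prefix `iwxxm` and that pattern ids and assertion messages carry no version prefix. The names `merge_schematron` and `versioned_prefix` are hypothetical. The sketch does not add COLLECT.MB1 and does not reconcile other prefixes that are bound to different URIs in different versions.

```python
import re
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"
ET.register_namespace("sch", SCH)


def versioned_prefix(namespace_uri):
    """Map e.g. 'http://icao.int/iwxxm/3.0' to 'iwxxm30'."""
    version = namespace_uri.rstrip("/").rsplit("/", 1)[1]
    return "iwxxm" + version.replace(".", "")


def merge_schematron(sources, old_prefix="iwxxm"):
    """Merge per-version rule sets (assumed to use `old_prefix`) into one schema."""
    merged = ET.Element(f"{{{SCH}}}schema", {"queryBinding": "xslt2"})
    # Match the prefix only where it starts a QName, e.g. "//iwxxm:extension".
    qname = re.compile(rf"(?<![\w.-]){re.escape(old_prefix)}:")
    seen = set()
    for text in sources:
        schema = ET.fromstring(text)
        new_prefix = None
        # Copy namespace declarations once, renaming the IWXXM prefix per version.
        for ns in schema.findall(f"{{{SCH}}}ns"):
            prefix, uri = ns.get("prefix"), ns.get("uri")
            if prefix == old_prefix:
                prefix = new_prefix = versioned_prefix(uri)
            if (prefix, uri) not in seen:
                seen.add((prefix, uri))
                ET.SubElement(merged, f"{{{SCH}}}ns", {"prefix": prefix, "uri": uri})
        if new_prefix is None:
            raise ValueError(f"no sch:ns declaration for prefix {old_prefix!r}")
        for pattern in schema.findall(f"{{{SCH}}}pattern"):
            pattern.set("id", f"{new_prefix}.{pattern.get('id')}")
            for node in pattern.iter():
                # Rewrite XPath expressions to use the version-specific prefix.
                for attr in ("context", "test"):
                    if attr in node.attrib:
                        node.set(attr, qname.sub(f"{new_prefix}:", node.get(attr)))
                # Prefix the assertion message so failures identify the version.
                if node.tag == f"{{{SCH}}}assert" and node.text:
                    node.text = f"{new_prefix}.{node.text}"
            merged.append(pattern)
    return ET.tostring(merged, encoding="unicode")
```

Calling `merge_schematron([sch_30_text, sch_21_text])` would return one schema whose rules are keyed by `iwxxm30` and `iwxxm21`.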

Views welcome.

blchoy commented 4 years ago

The following is a first cut of iwxxm-collect.sch, which should validate IWXXM 2.1, IWXXM 3.0 and their collectives with COLLECT:

iwxxm-collect.zip

Given that a number of schematron bugs have been discovered in IWXXM 3.0, the team may need to decide whether to issue a patch to 3.0 first, before upgrading to 3.1 to accommodate the changes required by Am79. The decision should also be promulgated via ICAO to States to facilitate their implementation.

Feedback is welcome.

blchoy commented 4 years ago

As a matter of interest, I have also created a schematron rule to count the number of reports in a collective according to their version (i.e. namespace) and type. Feel free to try.

CountReportInCollective.zip
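
For readers without a Schematron processor at hand, the same kind of count can be sketched in Python with the standard library. This is not the content of the attached file, only an illustration of the idea; `count_reports` and `split_tag` are hypothetical names.

```python
from collections import Counter
import xml.etree.ElementTree as ET

COLLECT = "http://def.wmo.int/collect/2014"


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into its two parts."""
    if tag.startswith("{"):
        namespace, _, local_name = tag[1:].partition("}")
        return namespace, local_name
    return "", tag


def count_reports(xml_text):
    """Count reports in a collective by (namespace, report type)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for info in root.iter(f"{{{COLLECT}}}meteorologicalInformation"):
        for report in info:
            counts[split_tag(report.tag)] += 1
    return counts
```

The result maps pairs such as `("http://icao.int/iwxxm/3.0", "METAR")` to the number of matching reports.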

blchoy commented 4 years ago

From @jkorosi:

Dear Choy, all,

I don't know if this is the right place to ask it. I hope it is.

The multiple-version enhancement reminds me of the topic of self-contained reports. I think we discussed it several years ago, but I am not sure whether it is stated in any official document. I mean that a report should always declare all namespaces used in it, even if it is part of a collective.

The benefits are:

  1. When you create a compilation, you do not need to change the original report. That is the practice in the TAC world: you don't modify a report that you did not create.
  2. It avoids the errors that arise when some namespaces are defined on the collection element and then redefined on some report elements in the collective. The trickiest case is when all reports except the last one redefine all namespaces, and in that last one some of the namespaces are redefined and some aren't. I have already analyzed such a real-world case.

I know that this is not a usual practice in the XML world, but XML is designed in a general way. Maybe IWXXM can be more restrictive.

Regards, Jan

blchoy commented 4 years ago

To re-phrase Jan's suggestion, we may want to ask users to adopt the practice of declaring the IWXXM namespace as something like:

xmlns:iwxxm30="http://icao.int/iwxxm/3.0"

which should stay the same whether the document contains a single report or multiple reports.

ilkkarinne commented 4 years ago

Hi all, perhaps what Jan is suggesting is that each report contained within a collection should independently declare all the namespaces of the elements it contains, as well as their prefixes to be used locally within that report element. Thus the IWXXM version-specific namespaces would not have to be defined at the collection root element level, and indeed doing so should be discouraged.

XML schema rules allow defining namespaces locally, and in this case it would make sense IMHO. Also, having the same namespace defined several times within an XML document, with the same or a different prefix, should not be a problem for any decent XML processing software.
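
The practice described above can be checked mechanically. The following Python sketch reports, for each report in a collective, the namespaces it uses without declaring them itself. It assumes reports sit two levels below the document root, as in COLLECT. It only looks at element and attribute names, so prefixes used inside attribute values (for example in xsi:type) are not considered. The function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def namespace_of(name):
    """Return the namespace part of ElementTree's '{namespace}local' form."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def undeclared_namespaces(xml_text, report_depth=2):
    """List (report tag, namespaces used but not declared inside the report)."""
    results = []
    depth = 0
    pending = set()        # declarations seen since the last element start
    declared = used = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            # 'start-ns' events precede the 'start' event of the declaring element.
            pending.add(payload[1])
        elif event == "start":
            if depth == report_depth:
                declared, used = set(), set()
            if depth >= report_depth:
                declared |= pending
                used.add(namespace_of(payload.tag))
                used.update(namespace_of(name) for name in payload.attrib)
            pending = set()
            depth += 1
        else:  # "end"
            depth -= 1
            if depth == report_depth:
                missing = sorted(ns for ns in used - declared if ns)
                results.append((payload.tag, missing))
    return results
```

A report that relies on a declaration made on the collection element shows up with a non-empty list, which signals that it could not be moved to another collective unchanged.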

jkorosi commented 4 years ago

Not just the IWXXM namespace but, I think, all namespaces. Let's imagine that you receive an IWXXM report and want to create a collective from it. If we include this report in the collective untouched, then it is fine, even if you combine several IWXXM versions. But different IWXXM versions may require different AIXM, GML, ... versions. I think it would be beneficial to declare all necessary namespaces even if they are already declared on the collection. This would allow you to cut a report from one collective and put it into your own collective without any manipulation of the original report.

Another case is when you create a collective at the source: one may want to save some bytes by putting all namespaces on the collection element rather than on each report element. Again, this is a complication for all who want to use your reports in their collectives.
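
To illustrate why self-contained reports make this easy, here is a minimal Python sketch that wraps reports in a COLLECT bulletin without touching them. It works at the text level on purpose, since re-serializing through an XML library could move or rename namespace declarations. It assumes each report is well formed and declares every namespace it uses; `build_collective` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XML_DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def build_collective(reports, bulletin_identifier, gml_id):
    """Wrap report documents verbatim in a collect:MeteorologicalBulletin."""
    members = "".join(
        "<collect:meteorologicalInformation>"
        + XML_DECLARATION.sub("", report)  # drop only the XML declaration
        + "</collect:meteorologicalInformation>"
        for report in reports
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<collect:MeteorologicalBulletin"
        ' xmlns:collect="http://def.wmo.int/collect/2014"'
        ' xmlns:gml="http://www.opengis.net/gml/3.2"'
        f" gml:id={quoteattr(gml_id)}>"
        f"{members}"
        f"<collect:bulletinIdentifier>{escape(bulletin_identifier)}</collect:bulletinIdentifier>"
        "</collect:MeteorologicalBulletin>"
    )
    # Raises xml.etree.ElementTree.ParseError if the result is not well formed,
    # e.g. when a report uses a prefix it does not declare.
    ET.fromstring(document.encode("utf-8"))
    return document
```

If a report depended on declarations from its original collection element, the well-formedness check would fail or, worse, the prefix could silently resolve to a different namespace declared on the new collection element.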

To be honest, I also like the idea of defining fixed namespace prefixes, as you suggested. Again, this is not common practice in XML. On the other hand, such an approach is already applied to the WMO Core Profile. @efucile can shed light on this.

ilkkarinne commented 4 years ago

@jkorosi: I agree with the usefulness of defining all the namespaces used in each report within the report parent element, even at the cost of redundant information.

Personally I would be careful in mandating the use of particular namespace prefixes as there may be cases where it is necessary to distinguish two minor schema versions having two different namespaces without making the prefixes too long. Mandating particular prefixes also may make the implementers careless in defining and requiring the use of correct full namespaces.

blchoy commented 4 years ago

xmlns:iwxxm30="http://icao.int/iwxxm/3.0"

I can only say this is a good practice but not necessarily a best practice.

We once discussed a similar topic: whether gml:id can be used as a globally unique identifier of an object. The answer was that it is probably not desirable, because gml:id is local and making a local identifier global may introduce other issues. The same goes for namespace prefixes.

jkorosi commented 4 years ago

@ilkkarinne that is what I wanted to express. Thank you.

For better illustration, this is how a collective should look in my opinion:

<?xml version="1.0" encoding="UTF-8"?>
<collect:MeteorologicalBulletin
    xmlns:collect="http://def.wmo.int/collect/2014"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://def.wmo.int/collect/2014 http://schemas.wmo.int/collect/1.2/collect.xsd"
    gml:id="uuid.fee171e9-1aec-4815-bcca-9e3c467388c6">
    <collect:meteorologicalInformation>
        <iwxxm:METAR
            xmlns:iwxxm="http://icao.int/iwxxm/3.1"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:aixm="http://www.aixm.aero/schema/5.1.1"
            xsi:schemaLocation="http://icao.int/iwxxm/3.1 http://schemas.wmo.int/iwxxm/3.1/iwxxm.xsd
http://www.aixm.aero/schema/5.1.1 http://www.aixm.aero/schema/5.1.1_profiles/AIXM_WX/5.1.1b/AIXM_Features.xsd"
            gml:id="uuid.636569b5-6b9b-4ed7-aa60-ad373a9e372b"
            reportStatus="NORMAL"
            permissibleUsage="OPERATIONAL">

            ... IWXXM METAR 3.1.0 content ...

        </iwxxm:METAR>
    </collect:meteorologicalInformation>
    <collect:meteorologicalInformation>
        <iwxxm:METAR
            xmlns:iwxxm="http://icao.int/iwxxm/3.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:aixm="http://www.aixm.aero/schema/5.1.1"
            xsi:schemaLocation="http://icao.int/iwxxm/3.0 http://schemas.wmo.int/iwxxm/3.0/iwxxm.xsd
http://www.aixm.aero/schema/5.1.1 http://www.aixm.aero/schema/5.1.1/AIXM_Features.xsd"
            gml:id="uuid.636569b5-6b9b-4ed7-aa60-ad373a9e372b"
            reportStatus="NORMAL"
            permissibleUsage="OPERATIONAL">

            ... IWXXM METAR 3.0.1 content ...

        </iwxxm:METAR>
    </collect:meteorologicalInformation>
    <collect:bulletinIdentifier>A_LAYU31YUSO101000_C_YUSO_20120810100000.xml</collect:bulletinIdentifier>
</collect:MeteorologicalBulletin>

Also, the Schematron example provided by Choy a few comments above handles the namespaces above:

<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
   <sch:title>Schematron validation</sch:title>
   <sch:ns prefix="xlink" uri="http://www.w3.org/1999/xlink"/>
   <sch:ns prefix="xsi" uri="http://www.w3.org/2001/XMLSchema-instance"/>
   <sch:ns prefix="gml" uri="http://www.opengis.net/gml/3.2"/>
   <sch:ns prefix="aixm" uri="http://www.aixm.aero/schema/5.1.1"/>
   <sch:ns prefix="metce" uri="http://def.wmo.int/metce/2013"/>
   <sch:ns prefix="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
   <sch:ns prefix="skos" uri="http://www.w3.org/2004/02/skos/core#"/>
   <sch:ns prefix="reg" uri="http://purl.org/linked-data/registry#"/>
   <sch:ns prefix="sf" uri="http://www.opengis.net/sampling/2.0"/>
   <sch:ns prefix="sams" uri="http://www.opengis.net/samplingSpatial/2.0"/>
   <sch:ns prefix="om" uri="http://www.opengis.net/om/2.0"/>
   <sch:ns prefix="opm" uri="http://def.wmo.int/opm/2013"/>
   <sch:ns prefix="collect" uri="http://def.wmo.int/collect/2014"/>
   <sch:pattern id="COLLECT.MB1">
      <sch:rule context="//collect:MeteorologicalBulletin">
         <sch:assert test="count(distinct-values(for $item in //collect:meteorologicalInformation/child::* return(local-name($item))))eq 1">COLLECT.MB1: All meteorologicalInformation instances in MeteorologicalBulletin must be of the same type</sch:assert>
      </sch:rule>
   </sch:pattern>
   <sch:ns prefix="iwxxm30" uri="http://icao.int/iwxxm/3.0"/>
   <sch:pattern id="iwxxm30.METAR_SPECI.AerodromeRunwayState-1">
      <sch:rule context="//iwxxm30:AerodromeRunwayState">
         <sch:assert test="( if( @allRunways = 'true' ) then( empty(iwxxm30:runway) ) else( true() ) )">iwxxm30.METAR_SPECI.AerodromeRunwayState-1: When all runways are being reported upon, no specific runway should be reported</sch:assert>
      </sch:rule>
   </sch:pattern>

    ...

   <sch:pattern id="iwxxm30.IWXXM.ExtensionAlwaysLast">
      <sch:rule context="//iwxxm30:extension">
         <sch:assert test="following-sibling::*[1][self::iwxxm30:extension] or not(following-sibling::*)">iwxxm30.IWXXM.ExtensionAlwaysLast: Extension elements should be the last elements in their parents</sch:assert>
      </sch:rule>
   </sch:pattern>

   <sch:ns prefix="iwxxm21" uri="http://icao.int/iwxxm/2.1"/>
   <sch:pattern id="iwxxm21.METAR_SPECI.ARS1">
      <sch:rule context="//iwxxm21:AerodromeRunwayState">
         <sch:assert test="(if(@allRunways eq 'true') then( empty(iwxxm21:runway) ) else true())">iwxxm21.METAR_SPECI.ARS1: When all runways are being reported upon, no specific Runway should be reported</sch:assert>
      </sch:rule>
   </sch:pattern>

    ...

   <sch:pattern id="iwxxm21.IWXXM.ExtensionAlwaysLast">
      <sch:rule context="//iwxxm21:extension">
         <sch:assert test="following-sibling::*[1][self::iwxxm21:extension] or not(following-sibling::*)">iwxxm21.IWXXM.ExtensionAlwaysLast: Extension elements should be the last elements in their parents</sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>

@ilkkarinne is probably right that at least the patch version is not included in the XSD or Schematron. Even for different versions of IWXXM, different versions of AIXM are required, as you can see in my example. You can "recommend" the required version of AIXM in the xsi:schemaLocation attribute, but it is not a mandatory attribute, and the consumer of the XML does not need to use it; it is up to them. I guess the reason is that you may want to use a catalog for offline validation.
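
As a side note on the COLLECT.MB1 rule in the Schematron above: it compares local names only, so reports of the same type from different IWXXM versions satisfy it. A rough Python equivalent, offered only as an illustration (the function name is hypothetical), could look like this:

```python
import xml.etree.ElementTree as ET

COLLECT = "http://def.wmo.int/collect/2014"


def satisfies_collect_mb1(xml_text):
    """True if all reports in the bulletin share one local element name."""
    root = ET.fromstring(xml_text)
    local_names = {
        report.tag.rsplit("}", 1)[-1]  # strip the '{namespace}' part, if any
        for info in root.iter(f"{{{COLLECT}}}meteorologicalInformation")
        for report in info
    }
    # Mirrors the Schematron test, which requires exactly one distinct name.
    return len(local_names) == 1
```

A bulletin mixing an IWXXM 3.0 METAR with an IWXXM 3.1 METAR passes, while one mixing a METAR with a TAF does not.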

moryakovdv commented 4 years ago

xmlns:iwxxm30="http://icao.int/iwxxm/3.0"

I can only say this is a good practice but not necessarily a best practice.

Hi all. Actually, we can separate the version numbers with underscores, e.g. iwxxm3_1_0. Oracle's best practice allows it, as far as I understand. But my main concern is the performance issues we may face if we mix several xmlns declarations for different versions in one schema. Does anybody have experience with this?

ilkkarinne commented 4 years ago

@jkorosi: the schemaLocation is not an identifier of a namespace and should not be used as one. I strongly prefer using different namespaces for different schemas even though the change would be backward compatible, unless there is a very strong reason to not do it. I have personally already struggled with the "5.1.1" and "5.1.1b" AIXM profiles in FMI's IWXXM/KNMI implementation library because of this very reason: both declare the same namespace http://www.aixm.aero/schema/5.1.1. In software applications this results in issues when the old schema version is used to parse contents created with the new version. As the namespace is the schema identifier and not the schemaLocation, systems cannot determine which version they should use.

@moryakovdv: yes, you can use underscores as part of the namespace prefix, but the document you referred to is not relevant to this as far as I can tell.
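
The point about the "5.1.1" and "5.1.1b" AIXM profiles can be made concrete with a small Python sketch of a schema catalog keyed by namespace, which is how many validators resolve schemas. The class is hypothetical and not taken from any particular library; the two schema URLs are the ones used in the collective example earlier in this thread.

```python
class NamespaceCatalog:
    """Minimal schema catalog: one schema document per namespace URI."""

    def __init__(self):
        self._schemas = {}

    def register(self, namespace, schema_location):
        existing = self._schemas.get(namespace)
        if existing is not None and existing != schema_location:
            # Two schema versions claim the same namespace: the catalog has
            # no way to tell which one a given document was written against.
            raise ValueError(
                f"{namespace} is already bound to {existing}; "
                f"cannot also bind it to {schema_location}"
            )
        self._schemas[namespace] = schema_location

    def resolve(self, namespace):
        return self._schemas[namespace]
```

Registering both AIXM schema locations under http://www.aixm.aero/schema/5.1.1 raises an error here, whereas two IWXXM versions with distinct namespaces register without conflict.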

blchoy commented 4 years ago

@jkorosi: For better illustration this is how collective should look like by my opinion

This is exactly our preferred way of creating a collective.

@ilkkarinne: I have personally already struggled with the "5.1.1" and "5.1.1b" AIXM profiles in FMI's IWXXM/KNMI implementation library...

This was followed up in #199. Our solution is to adopt the full AIXM schema, because similarly named AIXM elements in the full schema and the weather profile could differ, and this may cause confusion even if they are properly identified by different namespaces. This will be changed in IWXXM 3.0.1 very soon.

jkorosi commented 4 years ago

@blchoy: Yes, you are right. It is mentioned in the section Content of COLLECT documents. I had it in my mind that it was like this; I should have looked before asking. The examples are also created accordingly.

For some reason, I expected that it would be mentioned e.g. in FT2019-2 WMO-306 Vol I.3.docx. Maybe it should also be mentioned in IWXXM-3.0-Tutorial-collect.

blchoy commented 4 years ago

I have not heard any objection to the multiple IWXXM version schematron file in my previous post.

My final solution will be as follows:

For each IWXXM version there will be (i) an iwxxm.sch, which is to be used with reports of that particular version, with or without being encapsulated in COLLECT, and (ii) an iwxxm-multiversion.sch, which is to be used with reports of that particular version and all previous (official) versions, with or without being encapsulated in COLLECT.

The reason for keeping iwxxm.sch is that iwxxm-multiversion.sch will grow in size with each release, and producers who only care about the reports they are preparing can save some computing resources by using iwxxm.sch.
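
A consumer could choose between the two files by looking at the namespaces in the document. The Python sketch below shows one way to do that. The `<version>/iwxxm.sch` path layout and the function name are assumptions made for illustration, and the caller supplies the latest published version.

```python
import re
import xml.etree.ElementTree as ET

# Matches ElementTree tags such as '{http://icao.int/iwxxm/3.0}METAR'.
IWXXM_TAG = re.compile(r"^\{http://icao\.int/iwxxm/(\d+(?:\.\d+)*)\}")


def choose_schematron(xml_text, latest_version):
    """Pick the smaller single-version rule set whenever it is sufficient."""
    versions = set()
    for element in ET.fromstring(xml_text).iter():
        match = IWXXM_TAG.match(element.tag)
        if match:
            versions.add(match.group(1))
    if not versions:
        raise ValueError("no IWXXM elements found")
    if len(versions) == 1:
        # All reports share one version: that version's iwxxm.sch is enough.
        return f"{versions.pop()}/iwxxm.sch"
    # Mixed versions: fall back to the latest multi-version rule set.
    return f"{latest_version}/iwxxm-multiversion.sch"
```

For a collective containing only IWXXM 3.0 reports this returns `3.0/iwxxm.sch`; for one mixing 3.0 and 3.1 reports it returns the multi-version file of `latest_version`.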

The above will be made available in the upcoming IWXXM version 3.1-dev.