wmo-im / iwxxm

XML schema and Schematron for aviation weather data exchange
https://old.wmo.int/wiswiki/tiki-index.php%3Fpage=TT-AvXML

Missing local copies of the WMO code lists (RDFs) #223

Closed ilkkarinne closed 3 years ago

ilkkarinne commented 4 years ago

The Schematron rules that refer to local copies of the WMO code lists fail, because these copies have not been included in the /IWXXM/rule/ folder since tag v3.0.0RC2.

I found a possibly relevant discussion in #182, but could not find guidance on how these Schematron rules are supposed to work without the RDFs they refer to.

Example: https://github.com/wmo-im/iwxxm/blob/v3.0.0/IWXXM/rule/iwxxm.sch#L676
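To see which RDF files a given rule set expects, a sketch like the following could scan the Schematron file for XPath `document()` calls. It assumes (not confirmed in this thread) that the rules load code lists via `document('<name>.rdf')` with a file name relative to the `.sch` file; the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches document('...rdf') or document("...rdf") calls inside XPath expressions.
DOCUMENT_CALL = re.compile(r"""document\(\s*['"]([^'"]+\.rdf)['"]\s*\)""")


def referenced_rdf_files(sch_text: str) -> list[str]:
    """Return the distinct RDF file names referenced via document(), in order of first use."""
    seen: dict[str, None] = {}
    for name in DOCUMENT_CALL.findall(sch_text):
        seen.setdefault(name, None)
    return list(seen)


def missing_rdf_files(sch_path: Path) -> list[str]:
    """List referenced RDF files that are absent from the Schematron file's own directory."""
    text = sch_path.read_text(encoding="utf-8")
    rule_dir = sch_path.parent
    return [name for name in referenced_rdf_files(text) if not (rule_dir / name).is_file()]
```

For example, `missing_rdf_files(Path("IWXXM/rule/iwxxm.sch"))` would list the referenced RDF files that are not present in that folder.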

mgoberfield commented 4 years ago

The RDF files are to be in the same directory as the schematron file. As part of the CI process, this site uses the bin/codeListsToSchematron.py script to download the RDF files.
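The download step could look roughly like the sketch below. It is not `bin/codeListsToSchematron.py`, whose interface is not shown in this thread. Both the local file naming rule and the assumption that the registry returns RDF/XML for an `Accept: application/rdf+xml` request are illustrative assumptions.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_rdf_name(codelist_uri: str) -> str:
    """Hypothetical naming rule: 'http://codes.wmo.int/306/4678' -> 'codes.wmo.int-306-4678.rdf'."""
    parsed = urlparse(codelist_uri)
    parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
    return "-".join(parts) + ".rdf"


def download_codelist(codelist_uri: str, rule_dir: Path, timeout: float = 30.0) -> Path:
    """Fetch one code list and store it next to the Schematron file."""
    # Assumption: the registry honours content negotiation for RDF/XML.
    request = urllib.request.Request(codelist_uri, headers={"Accept": "application/rdf+xml"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    target = rule_dir / local_rdf_name(codelist_uri)
    target.write_bytes(data)
    return target
```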

ilkkarinne commented 4 years ago

Thanks for the information, and for the scripts you sent me by email, @mgoberfield. I'm OK with keeping this issue closed if you prefer, but I would like to record the discussion here for other interested parties.

From your comment above, @mgoberfield, I take it that there is an active CI process that builds the final IWXXM packages? So neither the GitHub repository tags nor the schemas available at https://schemas.wmo.int/iwxxm/ are in fact the final IWXXM release packages? If this is the case, could you please add a short note about the CI and release processes to the main GitHub README.md file? I'm sure I'm not the only one assuming that the full, final IWXXM 3.0.0 release content, including the schemas as well as fully functioning Schematron rules, would be available at https://github.com/wmo-im/iwxxm/tree/v3.0.0

Having snapshot copies of the WMO code list values as they stood at each IWXXM release, stored within the IWXXM release packages, would be highly useful for unambiguous business rule validation across different pieces of IWXXM processing software. I would guess that most IWXXM processing applications must rely on offline validation only, due to the performance and security restrictions of their operating environments. For interoperability, I feel it is important to have stable and authoritative code list snapshots that match the release snapshots of the Schematron rules. If the code list values are updated at some point, the snapshots would at least keep the list values and the rules in sync until the next IWXXM release.
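To illustrate what offline business rule validation against a code list snapshot involves, here is a minimal sketch that reads a local RDF file and checks an `xlink:href` value against the URIs of its `skos:member` entries. The RDF layout it accepts (members given either inline with `rdf:about` or by `rdf:resource`) is an assumption about the registry's output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
ABOUT = f"{{{RDF_NS}}}about"
RESOURCE = f"{{{RDF_NS}}}resource"


def load_member_uris(rdf_xml: str) -> frozenset[str]:
    """Collect the URIs of all skos:member entries in a code list RDF snapshot."""
    root = ET.fromstring(rdf_xml)
    uris = set()
    for member in root.iter(f"{{{SKOS_NS}}}member"):
        # A member may reference its concept by rdf:resource ...
        if RESOURCE in member.attrib:
            uris.add(member.attrib[RESOURCE])
        # ... or wrap the concept inline, identified by rdf:about.
        for concept in member:
            if ABOUT in concept.attrib:
                uris.add(concept.attrib[ABOUT])
    return frozenset(uris)


def is_valid_code(href: str, members: frozenset[str]) -> bool:
    """Offline membership check of an xlink:href value against the snapshot."""
    return href in members
```

Because it needs no network access, a check of this kind can run in restricted operational environments, provided the RDF snapshot is shipped with the software.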

If the CI is used to build these code list snapshots and store them alongside the Schematron files, I guess it would be trivial to use the CI to build the final, self-contained release packages too. Depending on your CI service, you may also be able to automate publishing these packages as GitHub Releases of this repo; it seems that at least Travis can do this.
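A self-contained release package of the kind proposed here could be assembled with a few lines of Python. The sketch below zips the given directories (for example the schemas, the Schematron rules and the RDF snapshots) and records a SHA-256 digest for every file in a `MANIFEST.json`, so that a vendor can later verify it holds an unmodified copy. The archive name, layout and manifest format are hypothetical, and publishing to GitHub Releases would be a separate CI step not shown here.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def build_release_package(source_dirs: list[Path], version: str, out_dir: Path) -> Path:
    """Zip the source directories into one package with a SHA-256 manifest.

    out_dir must not lie inside any source directory, or the archive would include itself.
    """
    archive = out_dir / f"iwxxm-{version}.zip"
    manifest = {"version": version, "files": {}}
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for src in source_dirs:
            # Sorted traversal gives a stable file order in the archive.
            for path in sorted(p for p in src.rglob("*") if p.is_file()):
                arcname = f"{src.name}/{path.relative_to(src).as_posix()}"
                data = path.read_bytes()
                manifest["files"][arcname] = hashlib.sha256(data).hexdigest()
                zf.writestr(arcname, data)
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2, sort_keys=True))
    return archive
```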

The FMI/KNMI IWXXM processing library has operationally used the IWXXM 2.1.1 Schematron rules, as well as the schemas, as part of its IWXXM validation process, and we would warmly welcome a fully self-contained, authoritative Schematron rule set for IWXXM 3.0 as well, to ensure compatibility with the specification.

blchoy commented 4 years ago

> Having snapshot copies of the WMO code lists values at the time of the IWXXM release versions stored within the IWXXM release packages would be highly useful for unambiguous business rule validation between different pieces of IWXXM processing software...

We did this in the past because a change of specifications (for example, one triggered by the publication of a new amendment to ICAO Annex 3) would induce corresponding changes in both the schema and the Codes Registry (i.e. the RDF files). But as the schemas become more stable, a situation may arise where the Codes Registry changes but the schemas do not. In this case, do you want us to issue a new version of IWXXM solely because the RDF files have changed? Or do you want us to replace the RDF files in schemas.wmo.int without changing the version? The team considered neither option desirable, and the latter in particular could cause confusion. That's why we decided to separate the RDF files from the schemas.

I admit that we need to provide more information on this approach, and we are planning to do so in the upcoming Amendment 79 to ICAO Annex 3. Views are most welcome.

ilkkarinne commented 4 years ago

Thanks for explaining the current situation @blchoy.

I guess what would make sense to me when the Codes Registry changes depends on the nature of the change. If the change directly affects how IWXXM messages can or cannot be crafted, then IMHO it should be brought into IWXXM as a version change, including the updated guidance, Schematron rules and the relevant updated code list values, even if no changes to the XML schemas are required. Note that this kind of release does not have to be made immediately when the Codes Registry change is published. Thus the offline RDF files, the Schematron rules and the IWXXM guidance may be out of sync with the online Codes Registry at a particular point in time, and would almost certainly be so for historical IWXXM release versions. However, as self-contained packages they would always be in sync internally, and would enable software vendors to support any IWXXM version as it was at the time of its release, including the business rule validation.

For any operational system the validation needs to be done offline anyway, so if the Schematron rules are expected to be followed, each vendor would have to create its own local copies of the code lists these rules depend on, and most likely include those copies in its software distribution. If no authoritative copies are available, these copies will differ from vendor to vendor depending on when they were copied from the Codes Registry, resulting in varying validation results between vendors even though the stated supported IWXXM versions are identical.
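The divergence described above is easy to make concrete. Given two vendors' snapshots of the same code list, represented here simply as sets of concept URIs (an illustrative assumption), the sketch below reports which codes differ and which `xlink:href` values in a message would receive different verdicts. The function names are hypothetical.

```python
from collections.abc import Iterable


def snapshot_diff(old: frozenset[str], new: frozenset[str]) -> dict[str, list[str]]:
    """Codes present in only one of two snapshots of the same code list."""
    return {"only_in_old": sorted(old - new), "only_in_new": sorted(new - old)}


def divergent_verdicts(hrefs: Iterable[str], vendor_a: frozenset[str], vendor_b: frozenset[str]) -> list[str]:
    """hrefs in a message that one vendor's snapshot accepts and the other's rejects."""
    return sorted(h for h in set(hrefs) if (h in vendor_a) != (h in vendor_b))
```

If vendor A copied a list before a code was added and vendor B copied it afterwards, a message using the new code passes B's validation and fails A's, even though both claim support for the same IWXXM version.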

The Codes Registry will also probably not be versioned in a way that would allow extracting the values of a particular code list at a given point in time (such as the release date of a particular IWXXM version), so it will be very difficult to know afterwards which code list values were authoritative at release time unless these values are included in the IWXXM release packages. I don't really see good options for validating old (archived) IWXXM messages against the validation rules in force at the time of their creation unless those rules are persisted.

To me the issue is comparable to inter-dependencies between software libraries: library A does not have to start using a new release of library B, which it depends on, immediately; it can instead make the necessary changes at its own pace and upgrade the B version only when ready. To make this work, each release of library B needs to be available as an immutable package.

Note that only the Codes Registry code lists actually used in IWXXM messages need to be stored as offline copies. These copies would never be the original source of this information, only a persistent snapshot of it.

mgoberfield commented 4 years ago

Actually the WMO Code Registry does have historical versioning of the Collections and Concepts. However, I've never attempted to create a snapshot. Maybe Mark H. (@marqh) knows how to do this.
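If the registry's history could be exported as membership records with validity dates, reconstructing a snapshot for a given IWXXM release date would reduce to a simple filter. The `MemberVersion` record below is a hypothetical data model, not the registry's actual API or export format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MemberVersion:
    """Hypothetical history record: uri was a member from valid_from until valid_to (exclusive)."""
    uri: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the membership is still current


def snapshot_as_of(history: list[MemberVersion], when: date) -> frozenset[str]:
    """Reconstruct the set of member URIs in force on the given date."""
    return frozenset(
        rec.uri
        for rec in history
        if rec.valid_from <= when and (rec.valid_to is None or when < rec.valid_to)
    )
```

This answers only "which codes were in force on date D"; timestamps alone do not say which edition of a code table that date corresponds to.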

blchoy commented 4 years ago

> Actually the WMO Code Registry does have historical versioning of the Collections and Concepts.

That only records when the entity has been updated. It has no reference to the version number of the code tables it represents.

blchoy commented 3 years ago

Closed as this will be followed up in https://github.com/wmo-im/iwxxm-codelists/issues/11