wmo-im / iwxxm-codelists

Code list management for WMO published content

Versioning of IWXXM Code Lists #11

Open blchoy opened 3 years ago

blchoy commented 3 years ago

The issue of versioning keeps coming up again and again. I can see that we may run into the same paradox we have with versioning of IWXXM: the tables will be used individually, but during a Fast Track (FT) update not all of them will be changed. I am opening this discussion to see whether we would like to adopt the same versioning policy as IWXXM or continue the existing practice used for other TDCFs.

Just a reminder to all that the WMO Codes Registry should also be able to record the version number of code tables, whether it is advanced individually for each table or collectively with an FT.

mgoberfield commented 3 years ago

Hi @blchoy, you don't mention this (and I don't know the existing practice for other TDCFs), but concepts in these containers can (and do) have historical information. So if a particular concept's definition changes, the previous definitions are preserved. If a data producer wants to use the older definition of a concept, they can still refer to it via a slightly different URL; that's the price they'll have to pay, but it can be done. Climatologists and researchers faced with URLs that don't specify a version can consult the issuance time of the product to determine which concept definition to use.
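The lookup described above (picking the definition that was valid when a product was issued) can be sketched in Python. This is a minimal sketch, assuming each concept keeps a list of historical definitions with a start date and an optional end date, treated here as half-open intervals; the `ConceptVersion` type, the `definition_at` function, and the example definitions and dates are hypothetical and do not reflect the WMO Codes Registry's actual data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ConceptVersion:
    """One historical definition of a code-list concept (hypothetical model)."""
    definition: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still valid (open-ended)


def definition_at(history: list[ConceptVersion], issued: date) -> Optional[str]:
    """Return the definition that was valid on the product's issuance date."""
    for version in history:
        # Intervals are treated as half-open: [valid_from, valid_to)
        started = version.valid_from <= issued
        not_ended = version.valid_to is None or issued < version.valid_to
        if started and not_ended:
            return version.definition
    return None  # no definition was valid on that date


# Hypothetical history: one definition superseded on 2010-01-01.
history = [
    ConceptVersion("old definition", date(2000, 1, 1), date(2010, 1, 1)),
    ConceptVersion("current definition", date(2010, 1, 1)),
]
print(definition_at(history, date(2005, 6, 1)))  # old definition
print(definition_at(history, date(2020, 6, 1)))  # current definition
```

Treating the end date as exclusive means a product issued exactly on the changeover day resolves to the new definition; that is a design choice of this sketch, not a registry rule.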

mgoberfield commented 3 years ago

(image attachment not reproduced)

mgoberfield commented 3 years ago

While the history/archival feature of the Code Registry is essential, I didn't use the "valid from/valid to" time snapshots that Epimorphics provides (these are the timestamps recording when a particular entry was added to its internal database). Instead, for the NWS Code Registry, which uses the same software, I added a new field, "valid", to the 'Definition' table to indicate when the Concept was, well, valid. Unfortunately, the NWS Code Registry is offline at the moment, so I cannot demonstrate it to you.

But this is how it looks in a CSV file, for instance. The valid parameter and its values are at the end of each line and are in the "dct" namespace:

@id,@notation,@status,dct:description,rdf:type,rdfs:label,reg:notation,dct:valid
<https://codes.nws.noaa.gov/NWSI-10-813/AmendableTAFParameter/CEILING>,CEILING,stable,Amendments based on overcast and/or broken cloud layers may be issued,skos:Concept,Ceiling,CIG,2007-11-19/-
<https://codes.nws.noaa.gov/NWSI-10-813/AmendableTAFParameter/NONE>,NONE,stable,No TAF element will be amended.,skos:Concept,None,NONE,2007-11-19/-
<https://codes.nws.noaa.gov/NWSI-10-813/AmendableTAFParameter/VISIBILITY>,VISIBILITY,stable,Amendments based on prevailing horizontal visibility may be issued,skos:Concept,Visibility,VIS,2007-11-19/-
<https://codes.nws.noaa.gov/NWSI-10-813/AmendableTAFParameter/WEATHER>,WEATHER,stable,Amendments based on precipitation and/or obstruction-to-vision may be issued,skos:Concept,Weather,WX,2007-11-19/-
<https://codes.nws.noaa.gov/NWSI-10-813/AmendableTAFParameter/WIND>,WIND,stable,Amendments based on changes to near surface wind vector may be issued,skos:Concept,Wind,WIND,2007-11-19/-

I realize that this may be a little confusing, given there are two different valid times, but I think people can figure it out. 😄 Does that solve the problem of versioning for code lists? Or have I, as usual, misunderstood the problem entirely?
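To show how a consumer might read the "valid" column, here is a minimal Python sketch that parses rows shaped like the CSV above and checks whether a concept was valid on a given date. It assumes the `start/end` form seen in `dct:valid`, with `-` meaning an open end, and an end-exclusive interval; both conventions are inferred from the example rows, and the helper names `parse_valid` and `valid_on` are hypothetical.

```python
import csv
import io
from datetime import date
from typing import Optional

# Header plus two rows copied from the example above.
CSV_TEXT = """@id,@notation,@status,dct:description,rdf:type,rdfs:label,reg:notation,dct:valid
<https://codes.nws.noaa.gov/NWSI-10-813/AmendableTAFParameter/CEILING>,CEILING,stable,Amendments based on overcast and/or broken cloud layers may be issued,skos:Concept,Ceiling,CIG,2007-11-19/-
<https://codes.nws.noaa.gov/NWSI-10-813/AmendableTAFParameter/WIND>,WIND,stable,Amendments based on changes to near surface wind vector may be issued,skos:Concept,Wind,WIND,2007-11-19/-
"""


def parse_valid(value: str) -> tuple[date, Optional[date]]:
    """Split a 'start/end' value; '-' as the end means open-ended (assumed convention)."""
    start_text, end_text = value.split("/", 1)
    start = date.fromisoformat(start_text)
    end = None if end_text == "-" else date.fromisoformat(end_text)
    return start, end


def valid_on(value: str, day: date) -> bool:
    """True if `day` falls in [start, end); end-exclusive is also an assumption."""
    start, end = parse_valid(value)
    return start <= day and (end is None or day < end)


for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    # Prints the notation, then validity in 2006 and in 2021.
    print(row["@notation"],
          valid_on(row["dct:valid"], date(2006, 1, 1)),
          valid_on(row["dct:valid"], date(2021, 1, 1)))
```

Keeping the interval as a single `start/end` string mirrors the CSV layout; a registry could equally store two separate date properties.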

blchoy commented 3 years ago

Instead, for the NWS Code Registry, which uses the same software, I added a new field, "valid", to the 'Definition' table to indicate when the Concept

Thanks @mgoberfield, I think this is a very practical solution.

If a data producer wants to use the older definition of a concept, they can still refer to it via a slightly different URL; that's the price they'll have to pay, but it can be done.

You are right, and this is a potential missing link for those code list entries mentioned in an IWXXM instance. The following is extracted from metar-a3-1.xml:

<iwxxm:presentWeather xlink:href="http://codes.wmo.int/306/4678/DZ"/>

There is no indication of which version of the code table it was based on; you will need to check the creation date of the instance and assume the producer used the right table version.

I am not saying this is a big deal; it may be just me who thinks this could be an issue. A slight change to the URL, as you have mentioned, could be a solution; in fact, we already use namespace declarations to indicate the applicable XML schema versions of individual XML elements:

<iwxxm:METAR xmlns:iwxxm="http://icao.int/iwxxm/3.0" ...>
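The asymmetry described above can be made concrete with a minimal Python sketch using the standard library's `xml.etree.ElementTree`: the schema version can be read from the namespace URI of the root element, while the code-list reference carries no version at all. The instance is a trimmed, hypothetical fragment (not a schema-valid METAR), and reading the version from the last path segment of the namespace assumes the `http://icao.int/iwxxm/3.0` pattern shown above.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, heavily trimmed instance: in a real METAR, presentWeather is nested
# inside the observation elements, and many required elements are omitted here.
DOC = """<iwxxm:METAR xmlns:iwxxm="http://icao.int/iwxxm/3.0"
             xmlns:xlink="http://www.w3.org/1999/xlink">
  <iwxxm:presentWeather xlink:href="http://codes.wmo.int/306/4678/DZ"/>
</iwxxm:METAR>"""

root = ET.fromstring(DOC)

# ElementTree exposes the namespace inside the tag as "{namespace}localname".
namespace = root.tag[1:].split("}", 1)[0]
schema_version = namespace.rsplit("/", 1)[-1]

# Collect every code-list reference; none of them carries a version.
code_refs = [el.get(XLINK_HREF) for el in root.iter() if el.get(XLINK_HREF)]

print("schema version:", schema_version)   # 3.0
print("code list references:", code_refs)  # ['http://codes.wmo.int/306/4678/DZ']
```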

I remember from a previous discussion that a version of IWXXM complying with a certain set of requirements (including Annex 3 + PANS-MET amendments) should refer not only to the version of IWXXM being used but also to the associated code tables. Now that TT-AvData is going to take care of its code tables, and following our discussion of IWXXM versioning, maybe we could also include the version of the IWXXM code tables there, in view of the possibility that the code tables change while the schemas do not?

| IWXXM Version | Compliance | Reason for Release | METAR/SPECI | TAF | SIGMET | AIRMET | Tropical Cyclone Advisory | Volcanic Ash Advisory | Space Weather Advisory | WAFS SIGWX Forecast | IWXXM Code Table |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2032-2 | Amendment 84 | New Requirement | 3.1 | 3.1 | 4.1 | 4.1 | 4.0 | 4.0 | 3.0 | 1.0 | 2032-2 |
| 2033-1 | Amendment 84 | Urgent bug fix | 3.1 | 3.1 | 4.2 | 4.2 | 4.0 | 4.0 | 3.0 | 1.0 | 2032-2 |
| 2033-2 | Amendment 84 | Urgent Code Table fix | 3.1 | 3.1 | 4.2 | 4.2 | 4.0 | 4.0 | 3.0 | 1.0 | 2033-2 |

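Such a table could also be kept in machine-readable form. The minimal Python sketch below transcribes the three illustrative rows and finds releases where the schemas are unchanged but the code table advanced, which is the case that motivates the extra column. The `Release` type and `code_table_only_releases` function are hypothetical, and the values are the example ones from the table, not published releases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One row of the proposed release table (field names are hypothetical)."""
    version: str
    reason: str
    # Order: METAR/SPECI, TAF, SIGMET, AIRMET, TCA, VAA, SWXA, WAFS SIGWX
    schemas: tuple[str, ...]
    code_table: str


# Rows transcribed from the illustrative table above.
RELEASES = [
    Release("2032-2", "New Requirement",
            ("3.1", "3.1", "4.1", "4.1", "4.0", "4.0", "3.0", "1.0"), "2032-2"),
    Release("2033-1", "Urgent bug fix",
            ("3.1", "3.1", "4.2", "4.2", "4.0", "4.0", "3.0", "1.0"), "2032-2"),
    Release("2033-2", "Urgent Code Table fix",
            ("3.1", "3.1", "4.2", "4.2", "4.0", "4.0", "3.0", "1.0"), "2033-2"),
]


def code_table_only_releases(releases: list[Release]) -> list[str]:
    """Releases whose schemas match the previous release but whose code table differs."""
    return [
        cur.version
        for prev, cur in zip(releases, releases[1:])
        if cur.schemas == prev.schemas and cur.code_table != prev.code_table
    ]


print(code_table_only_releases(RELEASES))  # ['2033-2']
```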
mgoberfield commented 3 years ago

Thank you for your comment, @blchoy. It's late here, but I would like to explore this a bit more and work through a use case in which a concept's definition changes. The introduction of a new concept (or new table) isn't a problem (see SpaceWeather), and I don't think the retirement of a concept is a problem either (see RunwayState). It should be clear that retired/obsolete concepts (and tables) are never removed from the code registry, as historical/climatological archives of IWXXM messages will still refer to them.

mgoberfield commented 3 years ago

@blchoy you wrote:

You are right, and this is a potential missing link for those code list entries mentioned in an IWXXM instance. The following is extracted from metar-a3-1.xml:

There is no indication of which version of the code table it was based on; you will need to check the creation date of the instance and assume the producer used the right table version.

Yes, and I did mention that in my original post. This is also true of TAC messages; it's not unique to XML.

An example of a concept's definition changing is 'SCT', because of the introduction of 'FEW' to the METAR/SPECI code forms. Researchers looking at archived METAR data will need to be aware that in the past 'SCT' was defined slightly differently than it is today and take that into account. As long as past concepts' definitions, along with their valid times, are available, that should be sufficient to accurately reconstruct the product's meaning at the time it was issued. The WMO Code Registry's history information makes it easier to determine whether there were any changes to the definitions, instead of having to look through old, musty paper references or PDFs.

Yes. It is incumbent on the data producers to make sure everything is accurately represented and that the correct URLs are used. I think that's a reasonable expectation. If they want to stick with a prior definition of a concept, the URL in their product has to reference the older version explicitly.

So for FT releases, then, are these just tagged "snapshots-at-a-specific-time" of the schemas and Code Registry contents in RDF format?

blchoy commented 3 years ago

So for FT releases, then, are these just tagged "snapshots-at-a-specific-time" of the schemas and Code Registry contents in RDF format?

Recalling Mark H's decision to remove the RDFs from schemas.wmo.int, I think that makes sense, as we can now see an increasing possibility that the code tables will advance while the schemas will not. At the same time, though, there is also a need to retrieve RDF snapshots with a clear indication of which versions of the schemas and code tables are valid for a given period of time.

Maybe we want to:

  1. store the RDF snapshots separately, so that they can be referenced in the table in my previous post? Or
  2. keep them on schemas.wmo.int, but include all RDF snapshots valid for a given version of IWXXM? That means we would need to add to schemas.wmo.int as soon as a new version of the code list has been published:
    - iwxxm/9.0/rule
      - /iwxxm.sch
      - /codelist-2036-1.zip
      - /codelist-2036-2.zip
      - /codelist-2037-1.zip

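Under option 2, a consumer would need to pick the right snapshot for a given code-list version. A minimal Python sketch follows, assuming the `codelist-<year>-<edition>.zip` naming shown in the layout above and that tags sort by (year, edition); the function names are hypothetical and nothing here reflects an agreed convention.

```python
import re
from typing import Optional

# Hypothetical naming convention taken from the layout: codelist-<year>-<edition>.zip
SNAPSHOT_NAME = re.compile(r"codelist-(\d{4})-(\d+)\.zip$")


def parse_tag(tag: str) -> tuple[int, int]:
    """Turn a '2036-2' style tag into a sortable (year, edition) pair."""
    year, edition = tag.split("-")
    return int(year), int(edition)


def snapshot_for(files: list[str], release_tag: str) -> Optional[str]:
    """Return the newest snapshot published at or before `release_tag`."""
    target = parse_tag(release_tag)
    candidates = []
    for name in files:
        match = SNAPSHOT_NAME.search(name)
        if match:  # skip non-snapshot files such as iwxxm.sch
            key = (int(match.group(1)), int(match.group(2)))
            if key <= target:
                candidates.append((key, name))
    return max(candidates)[1] if candidates else None


files = [
    "iwxxm/9.0/rule/iwxxm.sch",
    "iwxxm/9.0/rule/codelist-2036-1.zip",
    "iwxxm/9.0/rule/codelist-2036-2.zip",
    "iwxxm/9.0/rule/codelist-2037-1.zip",
]
print(snapshot_for(files, "2036-2"))  # iwxxm/9.0/rule/codelist-2036-2.zip
print(snapshot_for(files, "2035-1"))  # None
```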
blchoy commented 3 years ago

An example of a concept's definition changing is 'SCT', because of the introduction of 'FEW' to the METAR/SPECI code forms. Researchers looking at archived METAR data will need to be aware that in the past 'SCT' was defined slightly differently than it is today and take that into account.

To remove any possible ambiguity, we will need to put code table version information either (i) individually in each URL or (ii) somewhere global to the document (cf. namespace prefix declarations). Otherwise we will not be able to handle cases like "we issued a TAF before the applicable date of a new code table, but we need to issue a correction/amendment after the changeover". Apart from the date and time of issuance, we need a bit of intelligence to understand which code table such a report refers to (though such cases are rare, I have to say).
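The correction/amendment case can be illustrated with a minimal Python sketch of option (ii): if the document declares its code table version, that declaration wins; otherwise the version has to be inferred from the issuance time, which picks the new table for a correction issued after the changeover even if the report was prepared with the old one. The changeover date, version tags, and function names are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical changeovers, sorted by date: (applicable from, new code table version).
CHANGEOVERS = [
    (datetime(2036, 11, 1, tzinfo=timezone.utc), "2036-2"),
]
BASELINE = "2036-1"  # hypothetical version in force before the first changeover


def version_by_time(issued: datetime) -> str:
    """Infer the code table version purely from the issuance time."""
    version = BASELINE
    for applicable_from, new_version in CHANGEOVERS:
        if issued >= applicable_from:
            version = new_version
    return version


def resolve_version(issued: datetime, declared: Optional[str] = None) -> str:
    """Prefer a version declared in the document (option ii); else fall back to time."""
    return declared if declared is not None else version_by_time(issued)


# A correction issued after the changeover for a TAF prepared with the old table:
correction_time = datetime(2036, 11, 1, 3, 0, tzinfo=timezone.utc)
print(resolve_version(correction_time))            # 2036-2 (inferred, possibly wrong)
print(resolve_version(correction_time, "2036-1"))  # 2036-1 (declared by producer)
```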

I would be glad if @marqh could shed some light on the possibility of adding new features to the Codes Registry (or maybe you have some better ideas) to cater for the above.