wmo-im / wis2-topic-hierarchy

https://wmo-im.github.io/wis2-topic-hierarchy
Apache License 2.0

merge country and centre-id #58

Closed tomkralidis closed 10 months ago

tomkralidis commented 11 months ago

As discussed at TT-WISMD 2023-09 face-to-face, as well as W2AT 2023-09-18

Proposal for adapting WTH to merge country and centre-id

Overview

WTH levels 4/5 describe country and centre-id as follows:

Issues with country

Proposal

Remove country and define the centre-id on reverse hostname notation (starting with TLD) into a single compound level. In other words, the citation authority based on the Internet domain name of the issuing centre.

Examples:

Benefits

Implications

Change management

tomkralidis commented 11 months ago

Additional notes following discussion at W2AT:

golfvert commented 11 months ago

Extract from WMO 386 :

[Image: screenshot of the WMO 386 extract]

It looks like fr-meteofrance-toulouse or xx-eumetsat-darmstadt

There must be a list of already approved "organizations". @efucile, are you aware of one? Then, for "the production centre" (toulouse or darmstadt above), we may want to give a different production centre name per NC/DCPC operated by the organization.

josusky commented 11 months ago

If there is a list of organizations / production centres, we could refer to it. Otherwise I would be careful directly referencing publication 386 as that one is going to be obsolete. When it concerns dots vs dashes, well, dots are more "natural", but sure we can get used to dashes.

efucile commented 11 months ago

I don't know of such a list. Referring to 386 is not under discussion as it will be retired. We could adapt the text and insert it in the WIS Guide. Are we sure we want to go for the 2-letter code? I think that we have already decided to use the 3-letter code.

golfvert commented 11 months ago

The idea is to reuse some of the wording of WMO 386, not to refer to WMO 386. And if we merge country and centre-id, then going to the 2-letter code has some "logic", as the merged country-centre-id would look like a reverse DNS name.

efucile commented 11 months ago

The reverse DNS name is familiar to many people. I would go for it and avoid "-".

Is the new country-centre-id a unique identifier for a WIS2 node? Can we have two nodes with the same country-centre-id?

golfvert commented 11 months ago

We want to be on the safe side and avoid "." in the topic hierarchy (it may create issues with some MQP protocols). And, as GTS people were really wise in defining a file naming convention that is almost like reverse DNS, we propose to stick to it. Then, as with DNS, we can have multiple names in the same domain. So, with this kind of reverse notation, we could have something like (using Météo-France as an example):

  • fr-meteofrance-vaac for our duty on volcanic ash
  • fr-meteofrance-nwp for our NWP products
  • ...
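As a rough illustration of the dash-separated reverse notation described above, the following sketch derives a candidate identifier from a hostname. The function name `reverse_dns_centre_id` and its `suffix` parameter are hypothetical, and the hostname is only an example; nothing here is part of the specification.

```python
def reverse_dns_centre_id(hostname, suffix=None):
    """Build a dash-separated, reverse-DNS-style identifier (illustrative only)."""
    # Lowercase, drop any trailing dot, then reverse the label order (TLD first)
    labels = hostname.lower().rstrip('.').split('.')
    parts = list(reversed(labels))
    if suffix:
        # Optional extra label, e.g. to distinguish several nodes of one organization
        parts.append(suffix)
    return '-'.join(parts)


print(reverse_dns_centre_id('meteofrance.fr', 'vaac'))  # fr-meteofrance-vaac
print(reverse_dns_centre_id('meteofrance.fr', 'nwp'))   # fr-meteofrance-nwp
```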

There are two benefits in merging country and centre-id.

josusky commented 11 months ago

There are two benefits in merging country and centre-id.

  • a minor one (for me) by removing one level in the topic hierarchy
  • an important one: being able to assess, without any ambiguity, whether a topic is valid. With country and centre-id at two separate levels, this was not easy to do.

I agree completely.

SimonElliottEUM commented 11 months ago

I support merging the country and centre-id by reusing the "location indicator" developed for the General File Naming Convention in 386. In addition to the good reasons given above, we can see that for many years this has proved to be an effective mechanism (especially once we sorted out the niggle with uppercase characters in 3166!). I don't believe that either the organization or the production centre comes from a controlled list, but using the DCPC is a logical approach.

golfvert commented 11 months ago

The purpose of the controlled list is to limit/avoid the creativity in naming the centre-id :) and to get a consistent approach. See https://github.com/wmo-im/wis2-topic-hierarchy/issues/51 Typically, at the moment, we have no rule, and we have for example country_wis2node as centre-id. Something like fr-france_wis2node should be avoided.

sebvi commented 11 months ago

Do I understand correctly that if you go ahead and merge the country and centre-id levels, it will require a version bump to "b"? If so, it should be mentioned in the top-level proposal.

tomkralidis commented 11 months ago

Good point. That is debatable given the current state/phase of the specifications.

golfvert commented 11 months ago

At the moment, being in the pilot phase, I would stick to "a" for the version of the topic hierarchy. Having some "agility" in the pilot phase is needed. Also considering that Global Caches are subscribing to origin/a/wis2/# it won't break anything there.
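To make the point about the Global Cache subscription concrete, here is a minimal sketch of MQTT-style topic filter matching. The helper `topic_matches` is hypothetical (it ignores special cases such as topics starting with `$`), and the two topic strings are made-up examples of a two-level and a merged layout, not topics taken from the specification.

```python
def topic_matches(subscription, topic):
    """Return True if an MQTT-style topic filter matches a topic (simplified)."""
    sub_levels = subscription.split('/')
    topic_levels = topic.split('/')
    for i, level in enumerate(sub_levels):
        if level == '#':
            # Multi-level wildcard: matches everything from this level down
            return True
        if i >= len(topic_levels):
            return False
        if level != '+' and level != topic_levels[i]:
            return False
    # Without a trailing '#', both must have the same number of levels
    return len(sub_levels) == len(topic_levels)


subscription = 'origin/a/wis2/#'
# Hypothetical topics: separate country/centre-id levels vs. a merged level
two_levels = 'origin/a/wis2/fra/meteofrance/data/core/weather'
merged = 'origin/a/wis2/fr-meteofrance/data/core/weather'

print(topic_matches(subscription, two_levels))  # True
print(topic_matches(subscription, merged))      # True
```

Because `#` matches any number of trailing levels, the subscription is indifferent to whether country and centre-id occupy one level or two.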

tomkralidis commented 11 months ago

TT-WISMD 2023-10-06:

Proposal being: xx-xxxxx

where:

josusky commented 11 months ago

Using reverse DNS notation has the benefit of guaranteeing uniqueness. But if a centre wants to use a slightly modified version of it, e.g.:

  • us-noaa-... instead of gov-noaa-...; or
  • uk-metoffice-... instead of uk-gov-metoffice-...

there is no reason to disagree. There will still be a registration process for centre IDs to ensure uniqueness. Most importantly, in my opinion, we should not force international organizations to use the prefix xx- or xxg- instead of the well-established and understandable int- or org-.

tomkralidis commented 11 months ago

TT W2AT 2023-10-16:

TT-WISMD will provide final proposal on 2023-10-23

6a6d74 commented 11 months ago

Thanks @tomkralidis ... and confirming the ET-W2AT position. We noted the need to standardise on either option (a) or (b) for Token 1 in your comment above, rather than allow a user to choose from those options.

I think the proposal we have is:

<token1>-<token2>

where:

<token1> = TLD, as defined by IANA here

and

<token2> = centre name - which may be a - (dash) separated set of labels

Commonly, this may resemble the reverse DNS notation of the WIS Centre. Also note that larger organisations may host multiple WIS2 nodes, each of which can be given an alternative designation.

Examples:

fr-meteo-vaac fr-meteo-nmc

etc.

More examples would be useful.

We need to define the permitted set of characters to use in <token2>, for example . is not allowed (as described above)
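A possible shape for such a check is sketched below. The permitted character set is an assumption made for illustration (lowercase ASCII letters for token 1; lowercase letters and digits for each dash-separated label of token 2), and `is_plausible_centre_id` is a hypothetical helper. It does not verify that token 1 is an actual TLD in the IANA list.

```python
import re

# Assumed character set (not from the specification):
#   token 1: two or more lowercase ASCII letters
#   token 2: one or more dash-separated labels of lowercase letters and digits
CENTRE_ID_PATTERN = re.compile(r'[a-z]{2,}-[a-z0-9]+(?:-[a-z0-9]+)*')


def is_plausible_centre_id(centre_id):
    """Check the overall <token1>-<token2> shape only (illustrative)."""
    return CENTRE_ID_PATTERN.fullmatch(centre_id) is not None


print(is_plausible_centre_id('fr-meteo-vaac'))       # True
print(is_plausible_centre_id('fr.meteo.vaac'))       # False: dots are not allowed
print(is_plausible_centre_id('fr-france_wis2node'))  # False: underscore
```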

tomkralidis commented 11 months ago

So in implementation one would, say:

centre_id = 'fr-meteo-vaac'
# split only on the FIRST occurrence of dash
print(centre_id.split('-', 1))
# prints: ['fr', 'meteo-vaac']

golfvert commented 11 months ago

@tomkralidis: In which context would there be a need to split that "string" ?

@6a6d74 In terms of examples, we have to address what can be called exceptions, such as weather products from noaa.org (so by NWS). What is the centre-id?

  • org-noaa-nws-something (something being metar, synop, or...)
  • org-noaa-nws (no "something" allowed)
  • us-noaa-nws
  • us-nws-something

What about organisations with a "gov" in their domain name? My proposal (same as Jan): just drop it. So uk-metoffice-something?

If we have some examples with the odd cases, then, we are good (that is consistent and unambiguous). Agreed ?

tomkralidis commented 11 months ago

@golfvert : let's say one wanted to enumerate or group all active topics based on TLD. This is an unvalidated use case at the moment, but it will likely surface.

Regarding the NOAA use case, I would say that NOAA chooses what that is, as long as it's based on a TLD.
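For illustration only, grouping by TLD could look something like the sketch below. The helper name `group_by_tld` is hypothetical and the centre-ids are examples taken from this thread, not registered identifiers.

```python
from collections import defaultdict


def group_by_tld(centre_ids):
    """Group centre names by their leading TLD token (illustrative only)."""
    groups = defaultdict(list)
    for centre_id in centre_ids:
        # partition() splits on the FIRST dash only, like split('-', 1)
        tld, _, centre_name = centre_id.partition('-')
        groups[tld].append(centre_name)
    return dict(groups)


centre_ids = ['fr-meteo-vaac', 'fr-meteo-nmc', 'us-noaa-nws']
print(group_by_tld(centre_ids))
# {'fr': ['meteo-vaac', 'meteo-nmc'], 'us': ['noaa-nws']}
```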

golfvert commented 11 months ago

OK. So, you want to identify all French products? Maybe...

Anna mentioned that having something consistent would be good. So, whether to keep "gov" or not (which can be "gob" in other languages) should be a clear recommendation (a MUST?) to be consistent. Same for USA centres: shall we "move" them to us-? Or will each one choose?

SimonElliottEUM commented 11 months ago

Using reverse DNS notation has the benefit of guaranteeing uniqueness. But if a centre wants to use a slightly modified version of it, e.g.:

  • us-noaa-... instead of gov-noaa-...; or
  • uk-metoffice-... instead of uk-gov-metoffice-...

there is no reason to disagree. There will still be a registration process for centre IDs to ensure uniqueness. Most importantly, in my opinion, we should not force international organizations to use the prefix xx- or xxg- instead of the well-established and understandable int- or org-.

Not forcing international organizations into "xx" is a good idea. We have used "XX" from the General File Naming Convention in 386 for a long time (XX-EUMETSAT-Darmstadt) and "xxf" was not really an improvement. "int" would be much better.

6a6d74 commented 11 months ago

@tomkralidis, @golfvert - can we conclude the discussion on this and update the proposal outlined at the top of this thread? I think the only outstanding issue is a recommendation on the use of "gov" (or "gob").

tomkralidis commented 11 months ago

TT-WISMD 2023-10-23:

Proposal:

golfvert commented 11 months ago

And, I think :

tomkralidis commented 11 months ago

That is actually an expanded token 2 (which itself can include dashes). Regardless, will add text to the specification to articulate SHALL/SHOULD/MAY.

tomkralidis commented 11 months ago

PR in #64

kaiwirt commented 10 months ago

The examples in the topic hierarchy document use the iso3c country code (deu-dwd) instead of the TLD (de-dwd)

tomkralidis commented 10 months ago

Good catch @kaiwirt; fixed in PR #58.

tomkralidis commented 10 months ago

58 merged.