wmo-im / wcmp2

WMO Core Metadata Profile 2
https://wmo-im.github.io/wcmp2

Clarify D in req/core/identifier #182

Open amilan17 opened 9 months ago

amilan17 commented 9 months ago

D: "The id property shall include a local identifier as defined by the data publisher. The local identifier shall not have spaces or special or accented characters."

The question is: what counts as a "special" character?

amilan17 commented 9 months ago

@tomkralidis

tomkralidis commented 9 months ago

Perhaps we can further qualify with:

cc @josusky

josusky commented 9 months ago

This is too restrictive. The regular expression that you have provided is correct only for the "namespace identifier" (NID) part, but in our case the NID is fixed to wmo. The rest of the URN is the "Namespace Specific String" (NSS), and its validation is more permissive. The original description is in https://www.rfc-editor.org/rfc/rfc2141.html (section 2.2) and is slightly modified (extended) by the newer RFC (https://www.rfc-editor.org/rfc/rfc8141). An example of a valid URN is: urn:example:a123,z456?+abc
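To make the NID/NSS distinction concrete, here is a minimal sketch of a URN syntax check that follows the RFC 8141 grammar as I read it. The function name `is_urn` and the pattern names are hypothetical, and this is not the validation used by WCMP2 or any WIS2 component.

```python
import re

# pchar from RFC 3986: unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# NID: 2 to 32 characters, letters/digits/hyphen, no hyphen at either end
NID = r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]"
# NSS: starts with a pchar, may then also contain "/"
NSS = rf"{PCHAR}(?:{PCHAR}|/)*"
# Optional r-component ("?+...") and q-component ("?=...")
RQ = rf"(?:\?\+{PCHAR}(?:{PCHAR}|[/?])*)?(?:\?={PCHAR}(?:{PCHAR}|[/?])*)?"
# Optional f-component ("#...")
FRAG = rf"(?:#(?:{PCHAR}|[/?])*)?"

URN = re.compile(rf"urn:{NID}:{NSS}{RQ}{FRAG}", re.IGNORECASE)


def is_urn(value: str) -> bool:
    """Return True if value matches the URN syntax sketched above."""
    return URN.fullmatch(value) is not None


print(is_urn("urn:example:a123,z456?+abc"))  # example from this thread
print(is_urn("urn:example:a.b/c"))           # dot and slash in the NSS
print(is_urn("urn:example:a b"))             # space is not a pchar
```

Under this grammar the dot, the slash and the tilde are all accepted in the NSS, while a space or a `]` is rejected unless percent-encoded.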

josusky commented 9 months ago

I am not dead set against a rule that is stricter than the actual URN specification. I looked up the specification because I spotted the innocent dot (.) in Tom's list - that "lifted me off the chair" :-) I can hardly imagine anyone putting ~ or ] into a metadata ID, but a dot (.) or slash (/) seems quite OK to me.

tomkralidis commented 9 months ago

Having a slash (/) in the ID introduces URLs like the following in the GDC:

https://example.org/collections/foo/items/foo%2Fbar

While we can relax the regex set mentioned previously, IDs containing a slash, as in the URL above, would be error-prone.
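A small sketch of where the slash becomes awkward, assuming an items path laid out like the example URL above. The helper `item_url` and its parameters are hypothetical and not part of any GDC implementation.

```python
from urllib.parse import quote, unquote


def item_url(base: str, collection: str, record_id: str) -> str:
    """Build an items URL; safe="" forces "/" to be percent-encoded."""
    return (
        f"{base}/collections/{quote(collection, safe='')}"
        f"/items/{quote(record_id, safe='')}"
    )


url = item_url("https://example.org", "foo", "foo/bar")
print(url)

# By default quote() leaves "/" untouched, so the ID would add a path segment.
print(quote("foo/bar"))

# Any component that decodes the path before splitting it sees two segments
# ("foo" and "bar") where the publisher meant a single ID.
path = url.removeprefix("https://example.org")
print(unquote(path).split("/"))
```

The ID only survives if every client, proxy and server on the path keeps `%2F` encoded until after routing, which is the error-prone part.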

amilan17 commented 4 weeks ago

The definition as approved in PR #183: "The id property SHALL include a local identifier as defined by the data publisher. The local identifier SHALL NOT have spaces or accented characters."
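One possible reading of the approved wording, as a rough sketch: treat any whitespace as a "space" and any character that decomposes into a base letter plus a combining mark as "accented". Both interpretations and the function name `violates_id_rule` are assumptions, not something the requirement text defines.

```python
import unicodedata


def violates_id_rule(local_id: str) -> bool:
    """Return True if local_id has whitespace or an accented character."""
    if any(ch.isspace() for ch in local_id):
        return True
    # NFD splits e.g. "é" into "e" + U+0301 (a combining mark).
    decomposed = unicodedata.normalize("NFD", local_id)
    return any(unicodedata.combining(ch) for ch in decomposed)


print(violates_id_rule("observations.swob"))
print(violates_id_rule("surface obs"))
print(violates_id_rule("météo"))
```

Note that this reading does not catch letters such as "ø", which have no canonical decomposition, so on its own it is weaker than restricting the ID to a fixed character set.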

tomkralidis commented 4 weeks ago

TT-WISMD 2024-10-22:

josusky commented 3 weeks ago

Specifying a character set that excludes accented characters and other things that can complicate the usage of this identifier is a good idea. IRA T.50 is an appropriate choice. Apart from that (and the space), did you discuss any further restrictions during TT-WISMD 2024-10-22?
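For illustration, a minimal sketch of a character-set check along these lines. It assumes the IRA T.50 graphic characters correspond to the 7-bit code positions 0x21 to 0x7E (space is 0x20 and is excluded here). The function name `is_t50_graphic` is hypothetical.

```python
def is_t50_graphic(local_id: str) -> bool:
    """Return True if local_id is non-empty and uses only 0x21-0x7E."""
    return bool(local_id) and all(0x21 <= ord(ch) <= 0x7E for ch in local_id)


print(is_t50_graphic("observations.swob"))
print(is_t50_graphic("surface obs"))
print(is_t50_graphic("météo"))
```

A check like this rules out spaces, control characters and accented letters in one step, but it still admits characters such as `/`, `~` and `]`, so any further restrictions would need to be listed separately.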