inspire-eu-rdf / inspire-rdf-guidelines

INSPIRE data in RDF

http://inspire-eu-rdf.github.io/inspire-rdf-guidelines/


Encoding of geographical names #28

Open jechterhoff opened 7 years ago

jechterhoff commented 7 years ago

Description

The INSPIRE data type "GeographicalName" supports the provision of multiple spellings for a name, a link to an audio file for pronunciation, and more. Applications often need the name in just one spelling, potentially with an indication of the language. In such cases, "GeographicalName" can be mapped to a simple rdfs:Literal.

On the other hand, if complex information is available for a geographical name, instead of just a simple label, then encoding the name as an individual resource can be useful:

The SKOS properties prefLabel and altLabel can be used to provide labels for the name in multiple languages, while also distinguishing preferred from alternative labels.
Comparison of geographical names: if two spatial objects have "name" predicates with the same URI, then both spatial objects have the same geographic name. Resource equality can be asserted through a simple comparison of resource identifiers (the URIs). This appears to be a use case in the hydrology domain (see definition and description of property "geographicalName" in the spatial object type "HydroObject").
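
A minimal sketch of the "name as an individual resource" case, in Turtle. The ex: and exv: namespaces, the property exv:geographicalName, and the class exv:GeographicalName are hypothetical placeholders for illustration, not terms from the INSPIRE vocabularies.

```turtle
@prefix ex:   <http://example.com/data/> .
@prefix exv:  <http://example.com/vocab#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# The name is a resource of its own and carries preferred and
# alternative labels in several languages (hypothetical class).
ex:name-danube a exv:GeographicalName ;
    skos:prefLabel "Donau"@de , "Danube"@en ;
    skos:altLabel  "Danube River"@en .

# Two spatial objects refer to the same name resource, so comparing
# the object URIs is enough to see that they share a name.
ex:watercourse-1 exv:geographicalName ex:name-danube .
ex:watercourse-2 exv:geographicalName ex:name-danube .
```

With literals, the same comparison would have to match strings and language tags instead of a single identifier.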

Discussion Item

Should a property with geographical name as value type be encoded with a literal or with a class as range? Are there suitable alternatives?

jechterhoff commented 7 years ago

One solution could be to use a class as range for properties with geographic name as value type. That would serve the case where complex information is available.

For simple cases, when encoding instance data, and where a geographical name shall directly (as a literal) provide a label for an individual resource, that label could be encoded using rdfs:label or locn:geographicName.
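
The simple case might look like the following in instance data. This is only a sketch: the ex: namespace and the resource ex:mount-fuji are made up for illustration.

```turtle
@prefix ex:   <http://example.com/data/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix locn: <http://www.w3.org/ns/locn#> .

# The name is attached directly to the feature as a language-tagged literal.
ex:mount-fuji
    rdfs:label          "Mount Fuji"@en ;
    locn:geographicName "富士山"@ja .
```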

So essentially, we could support both cases.

DieterDePaepe commented 7 years ago

There are indeed two valid options: using a class as value for locn:geographicName or using a literal value. You could also define the property to allow both, but this would cause problems for tools, so I'd avoid that.

Note that the current spec seems to suggest another option (taken from locn):

"For INSPIRE-conformant data, provide the metadata for the geographic name using a skos:Concept as a datatype."

I hope this is just a poor choice of words, since a datatype in RDF terms is something like "xsd:date". You could use a skos:Concept as a datatype and add extra information to it, but that would be a horrible way to add data to a Literal.
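
To illustrate what "datatype" means in RDF terms, here is a small Turtle sketch. The exv: properties and the date value are invented for the example.

```turtle
@prefix ex:  <http://example.com/data/> .
@prefix exv: <http://example.com/vocab#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# A typed literal: the datatype IRI follows the ^^ marker.
ex:mount-fuji exv:lastSurveyed "2015-06-01"^^xsd:date .

# A language-tagged literal: its datatype is implicitly rdf:langString.
ex:mount-fuji exv:name "Mount Fuji"@en .
```

A class such as skos:Concept normally appears as the rdfs:range of a property whose values are resources, not after the ^^ of a literal.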

So, considering the two options: I suggest sticking to Literals, since most kinds of "complicated" data will in fact be information about the named entity.

For example:

    :MountFuji skos:prefLabel "Mount Fuji"@en ;
        skos:prefLabel "Fuji-san"@ja ;
        skos:altLabel "Mount Fuji"@ja ;
        x:pronouncedAs <http://example.com/audio/123.mp3> .

rather than

    :MountFuji :geographicName :FujiName .
    :FujiName skos:prefLabel "Mount Fuji"@en ;  # Debatable whether this is correct. The correct label would be "the geographic name of Mount Fuji".
        skos:prefLabel "Fuji-san"@ja ;
        skos:altLabel "Mount Fuji"@ja ;
        x:pronouncedAs <http://example.com/audio/123.mp3> .  # Could work

jechterhoff commented 7 years ago

Thanks for your feedback. Our current approach is as follows:

1. If it is known that, for a specific INSPIRE application schema, GeographicalNames in actual data are always simple (i.e. no additional link to audio files etc.), or that just the simple name is needed, then the UML type "GeographicalName" can be mapped to rdfs:Literal (for further details, see here). In other words, the range of a property with value type GeographicalName would be rdfs:Literal. If the indication of language is needed, rdf:langString can be used instead of rdfs:Literal, since rdf:langString is a subclass of rdfs:Literal.
2. If, on the other hand, encoding of all the properties of a GeographicalName is desired, then we would use a corresponding OWL class (aligned with skos:Concept) as range. The current draft RDF encoding of the INSPIRE "Geographical Names" application schema, including the class GeographicalName, is available here. A property from another INSPIRE application schema that has value type GeographicalName would then use that OWL class as range. In addition, the property would be aligned with locn:geographicName to facilitate interoperability and searches (for further details, see here).
3. It is always possible to add further labels to a feature encoded in RDF. The difference from doing so is that both ways of encoding a property with value type GeographicalName keep the semantics of that property (i.e. the documentation of the property contained in the UML model).
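
The two range choices described above could be sketched in an ontology roughly as follows. Everything in the exv: namespace is hypothetical, and expressing the alignments with rdfs:subClassOf and rdfs:subPropertyOf is an assumption made for this sketch, not necessarily how the draft encoding does it.

```turtle
@prefix exv:  <http://example.com/vocab#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix locn: <http://www.w3.org/ns/locn#> .

# Choice 1: only simple names are needed, so the range is a literal.
# (rdf:langString could be used instead if a language tag is required.)
exv:simpleName a owl:DatatypeProperty ;
    rdfs:range rdfs:Literal .

# Choice 2: all GeographicalName properties are to be encoded,
# so the range is a class aligned with skos:Concept.
exv:GeographicalName a owl:Class ;
    rdfs:subClassOf skos:Concept .

exv:complexName a owl:ObjectProperty ;
    rdfs:range exv:GeographicalName ;
    rdfs:subPropertyOf locn:geographicName .
```

A converter would pick one of the two choices per application schema, so a given property ends up with exactly one kind of range.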

DieterDePaepe commented 7 years ago

In my experience, letting a property accept both literals and classes as its range will pose problems for tools using it. I believe reasoners require the ObjectProperty/DatatypeProperty distinction in order to work.
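
As a sketch of the concern: in OWL 2 DL a property cannot be both an owl:ObjectProperty and an owl:DatatypeProperty, so mixed usage like the first two triples below is the pattern to avoid. All ex: and exv: names are hypothetical.

```turtle
@prefix ex:  <http://example.com/data/> .
@prefix exv: <http://example.com/vocab#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Mixed usage of one property: consumers must inspect every value
# to find out whether it is a literal or a resource.
ex:mount-fuji exv:name "Mount Fuji"@en .
ex:mount-etna exv:name ex:name-etna .

# Two separate properties keep each one in a single OWL category.
exv:nameLabel    a owl:DatatypeProperty .
exv:nameResource a owl:ObjectProperty .
```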

Looking at the link in your second point does make some things clear. The concept isn't being used as a datatype (a specific term in RDF), but as range. That does make me wonder what the point is of making it a subclass of skos:Concept. Concepts are typically used as codelists to be reused. I don't see any point in having GeographicalName be a concept.

jechterhoff commented 7 years ago

Indeed, a mix of literal and class as range of a property is not desirable. The intent described in the first point of my previous comment is that someone who converts a given INSPIRE application schema into an ontology has a choice: If GeographicalNames in actual data for that application schema are always simple or if just the simple name is needed, then the UML type "GeographicalName" can be mapped to rdfs:Literal, meaning that all properties in the application schema with value type GeographicalName would have rdfs:Literal as range. Otherwise, use a class (e.g. the one created by encoding the "Geographical Names" application schema) as range of such properties.

The intent for aligning class gn:GeographicName with skos:Concept is to make it compatible with the ISA Programme Location Core Vocabulary (LOCN), more specifically the property locn:geographicName (see #29). The definition of a SKOS concept (a "unit of thought") is quite broad, even when considering the actual SKOS ontology. A geographical name is applied to a place or spatial object by some community, different communities can assign different names, and a community might even assign multiple names, to be used in different contexts. Geographical names can be grouped in SKOS schemes and collections - whatever a community needs. So when SKOS is used for structuring named terms (codelists are one example), I don't see any harm in assuming that that can be done for geographical names as well, especially under the open world assumption.
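
Grouping geographical names with SKOS might look like the sketch below. The gn: namespace IRI is a placeholder (the real one is not given in this thread), and the ex: resources are invented for illustration.

```turtle
@prefix ex:   <http://example.com/data/> .
@prefix gn:   <http://example.com/gn#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# The alignment discussed above, written as a subclass axiom.
gn:GeographicName rdfs:subClassOf skos:Concept .

# A scheme that a community uses to organise its river names.
ex:river-names a skos:ConceptScheme ;
    skos:prefLabel "River names"@en .

ex:name-danube a gn:GeographicName ;
    skos:inScheme ex:river-names ;
    skos:prefLabel "Donau"@de , "Danube"@en .

ex:name-rhine a gn:GeographicName ;
    skos:inScheme ex:river-names ;
    skos:prefLabel "Rhein"@de , "Rhine"@en .

# A collection groups names for one particular context.
ex:central-european-rivers a skos:Collection ;
    skos:member ex:name-danube , ex:name-rhine .
```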