inspire-eu-rdf / inspire-rdf-guidelines

INSPIRE data in RDF
http://inspire-eu-rdf.github.io/inspire-rdf-guidelines/

Address: Names can be simplified #35

Closed DieterDePaepe closed 7 years ago

DieterDePaepe commented 7 years ago

I believe the address ontology is overly complex with regard to names. As an example:

1) Starting from an ad:Address, we follow ad:Address.component to find an ad:ThoroughfareName.
2) From this, we follow ad:ThoroughfareName.name to get an ad:ThoroughfareNameValue.
3) From this, we follow ad:ThoroughfareNameValue.name to get an ad:GeographicalName.
4) This GeographicalName can, according to the specs, be an rdfs:Literal or a complex type.
5) In case a complex type is used, the skos:prefLabel contains the value.
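
As a rough illustration, the chain in steps 1 to 5 might look like this in Turtle. The class and property names are copied from the steps as written; the ex: resources and the ad: and ex: prefix IRIs are placeholders assumed for this sketch, not the published namespaces.

```turtle
# ad: and ex: are placeholder IRIs (assumptions for illustration only).
@prefix ad:   <http://example.org/ad#> .
@prefix ex:   <http://example.org/data/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Step 1: the address points to a thoroughfare name component.
ex:Address1 a ad:Address ;
  ad:Address.component ex:Name1 .

# Step 2: the thoroughfare name points to a name value.
ex:Name1 a ad:ThoroughfareName ;
  ad:ThoroughfareName.name ex:NameValue1 .

# Step 3: the name value points to a geographical name.
ex:NameValue1 a ad:ThoroughfareNameValue ;
  ad:ThoroughfareNameValue.name ex:GeoName1 .

# Steps 4 and 5: with the complex encoding, the value sits in skos:prefLabel.
ex:GeoName1 a ad:GeographicalName ;
  skos:prefLabel "rue de la Paix"@fr .
```

In this sketch, reading the street name takes four property hops from the address.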

I believe 3, 4 and 5 can be merged together without loss of expressiveness.

ex:Name1 a ad:ThoroughfareName;
  ad:ThoroughfareName.transportLink ex:Link1, ex:Link2;
  ad:ThoroughfareName.name ex:NameValue1.

ex:NameValue1 a ad:ThoroughfareNameValue;
  ad:ThoroughfareName.namePart ex:Part1, ex:Part2, ex:Part3; # from ad:ThoroughfareNameValue.nameParts
  skos:prefLabel "rue de la Paix"@fr; # from GeographicalName
  skos:altLabel "rue dl Paix"@fr.

ex:Part1 a ad:PartOfName; # Could also be done using properties on
  ad:PartOfName.part "rue";
  ad:PartOfName.type <http://inspire.ec.europa.eu/codelist/PartTypeValue/type>.

Note that 2 must remain separate. This is because the names (NameValues) might consist of different parts in different languages:

jechterhoff commented 7 years ago

This is related to #28 "Encoding of geographical names".

Summary first:

Now the TL;DR:

Regarding step 4: The guidelines state that for a given INSPIRE application schema, properties with type GeographicalName can be represented as a property with rdfs:range either being an rdfs:Literal (if it is known that no RDF applications need the complex information that the conceptual model of GeographicalName supports) or a complex type. At the moment it is not both. However, the use of rdfs:label to convey names would probably support that.

In the draft ontology of the Address application schema, the range of ThoroughfareNameValue.name is the complex type gn:GeographicalName ('ad:GeographicalName' was a bug in the draft ontology). The assumption is that complex information for a GeographicalName, like nameStatus and pronunciation, can be relevant as well.

The guidelines may be misleading regarding the use of skos:prefLabel and skos:altLabel when representing geographical names. A gn:GeographicalName is usually bound to a particular language (see the definition and description of GeographicalName.language in the conceptual schema). skos:prefLabel and skos:altLabel, when used as properties of a gn:GeographicalName, should probably be given in the language stated by that geographical name.

However, skos:prefLabel and skos:altLabel can be used on any resource to label it, since the domain of these properties is undefined. The use of pref- and altLabel in ex:NameValue1 in your example would thus be allowed. If the INSPIRE type contained multiple properties with type GeographicalName (like AddressRepresentation), then mapping these properties to RDFS labels would be ambiguous (unless the properties were still represented, but as subPropertyOf rdfs:label).

An RDF application that supports ThoroughfareNameValue.name with range gn:GeographicalName would look at the gn:GeographicalName values of ThoroughfareNameValue.name to determine the information it needs. More likely, it will filter the ThoroughfareNameValue resources that are linked by ThoroughfareName.name to identify the relevant ones (e.g. only use geographical names with nameStatus 'official' or 'standardized' [which skos:prefLabel and skos:altLabel would not support], and in a specific language).

Basically, what your example suggests is that a property with type GeographicalName can be represented by rdfs:labels (including skos:prefLabel and skos:altLabel). This is similar to a suggestion from the SmartOpenData project - see https://www.w3.org/2015/03/inspire/ (chapter "The GCM & Geographical Names"). The draft INSPIRE RDF guidelines currently leave a choice: encode properties with type GeographicalName in a particular application schema either with range rdfs:Literal (keeping the semantics of the property) or with a complex type (gn:GeographicalName). In your example, if the simple encoding were used, ex:NameValue1 would have ad:ThoroughfareName.name "rue de la Paix"@fr. In that case, ad:ThoroughfareName.name could be defined as a subproperty of rdfs:label.
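
A minimal Turtle sketch of that simple encoding, to make the idea concrete. The property name and the literal come from the paragraph above; the ad: and ex: prefix IRIs are placeholders assumed for illustration.

```turtle
# ad: and ex: are placeholder IRIs (assumptions for illustration only).
@prefix ad:   <http://example.org/ad#> .
@prefix ex:   <http://example.org/data/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Schema level: the property keeps its own semantics, has a literal range,
# and is additionally declared a subproperty of rdfs:label.
ad:ThoroughfareName.name rdfs:subPropertyOf rdfs:label ;
  rdfs:range rdfs:Literal .

# Instance level: the name is given directly as a language-tagged literal.
ex:NameValue1 ad:ThoroughfareName.name "rue de la Paix"@fr .
```

Under RDFS entailment, a client that only understands rdfs:label could then still find the name, because the subproperty declaration implies ex:NameValue1 rdfs:label "rue de la Paix"@fr.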

DieterDePaepe commented 7 years ago

Not sure we're talking about the same thing here (in particular, I didn't quite get your last paragraph).

The point I'm making is that I see no need for both ad:ThoroughfareNameValue and GeographicalName. In my example in the opening post, I dropped GeographicalName, but I could as well have dropped ThoroughfareNameValue instead. I'm not talking about the simple or complex representation of a GeographicalName.

My question: for what use cases is ThoroughfareNameValue needed if we were to change the following:

ad:ThoroughfareName.name rdfs:range gn:GeographicalName.
ad:ThoroughfareName.namePart rdfs:domain gn:GeographicalName.

jechterhoff commented 7 years ago

In your initial example you pointed out that step 2 would have to be kept as-is since the name values might consist of different parts in different languages. Therefore I assumed that you were ok with the structure of ThoroughfareNameValue and that the complexity of GeographicalName was the issue.

Your question can be applied to the INSPIRE conceptual model: Why was the nameParts property not modelled on the data type GeographicalName? Because in that case, ThoroughfareNameValue would indeed not have been needed.

I was not involved in the design of the INSPIRE Addresses schema, so I can only make an assumption: nameParts belongs to ThoroughfareNameValue because the model of that property (and its type PartOfName, which itself holds the name of a specific part, but, more importantly for this discussion, also the type of that part) specifically supports the subdivision of a thoroughfare name into parts. The PartTypeValue allows a data provider to define the type of each part of a thoroughfare name (type, name prefix, name, qualifier). However, because the nameParts is specific to thoroughfare names, I assume that that's the reason why nameParts does not belong to the more general GeographicalName. Also keep in mind that a GeographicalName is language specific, and so would be the subdivision into parts, meaning that a ThoroughfareNameValue provides a language specific name and optionally its subdivision into parts.
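
To illustrate the subdivision described above, here is a hedged Turtle sketch of one French name value split into typed parts. The property names follow the opening post; the ad: and ex: prefix IRIs are placeholders, and the codelist IRIs for the name prefix and name parts are assumed by analogy with the .../PartTypeValue/type IRI used there.

```turtle
# ad: and ex: are placeholder IRIs (assumptions for illustration only).
@prefix ad: <http://example.org/ad#> .
@prefix ex: <http://example.org/data/> .

# One language-specific name value and its optional subdivision into parts.
ex:NameValue1 a ad:ThoroughfareNameValue ;
  ad:ThoroughfareNameValue.nameParts ex:Part1, ex:Part2, ex:Part3 .

# "rue" is the thoroughfare type.
ex:Part1 a ad:PartOfName ;
  ad:PartOfName.part "rue" ;
  ad:PartOfName.type <http://inspire.ec.europa.eu/codelist/PartTypeValue/type> .

# "de la" is the name prefix (codelist IRI assumed from the pattern above).
ex:Part2 a ad:PartOfName ;
  ad:PartOfName.part "de la" ;
  ad:PartOfName.type <http://inspire.ec.europa.eu/codelist/PartTypeValue/namePrefix> .

# "Paix" is the name itself (codelist IRI assumed from the pattern above).
ex:Part3 a ad:PartOfName ;
  ad:PartOfName.part "Paix" ;
  ad:PartOfName.type <http://inspire.ec.europa.eu/codelist/PartTypeValue/name> .
```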

As to the use cases that require the name parts of a thoroughfare name, I could only guess (maybe improved searching).

Back to the RDF representation of INSPIRE data: If RDF applications will never need the name parts provided by a ThoroughfareNameValue, then we could change the type of ThoroughfareName.name to GeographicalName.

Unfortunately, ThoroughfareNameValue is not modelled as a subtype of GeographicalName. This would avoid at least the indirection introduced by the current ThoroughfareNameValue.name.

DieterDePaepe commented 7 years ago

> However, because the nameParts is specific to thoroughfare names, I assume that that's the reason why nameParts does not belong to the more general GeographicalName.

If that is the only issue, you could let ThoroughfareNameValue be a subtype of GeographicalName.

> Also keep in mind that a GeographicalName is language specific, and so would be the subdivision into parts, meaning that a ThoroughfareNameValue provides a language specific name and optionally its subdivision into parts.

I know, just like a GeographicalName is also intended to keep different languages apart. So again, overlap between those two.

> Back to the RDF representation of INSPIRE data: If RDF applications will never need the name parts provided by a ThoroughfareNameValue, then we could change the type of ThoroughfareName.name to GeographicalName.

That's an option, but I don't think you can be sure that an application will never need it. Isn't this RDF model supposed to be a valid implementation of the INSPIRE specification, which demands that it is possible to represent? (Honest question, I assume it is.)

> Unfortunately, ThoroughfareNameValue is not modelled as a subtype of GeographicalName. This would avoid at least the indirection introduced by the current ThoroughfareNameValue.name.

What prevents you from doing so? If this RDF model has to be a 1:1 conversion of the conceptual INSPIRE model, without trying to take advantage of the modelling features of RDF, there is no point me spending time here to suggest improvements. ;)

jechterhoff commented 7 years ago

> Isn't this RDF model supposed to be a valid implementation of the INSPIRE specification, which demands that it is possible to represent? (Honest question, I assume it is.)

Of course the idea is to create guidelines that support the RDF representation of INSPIRE schemas.

> > Unfortunately, ThoroughfareNameValue is not modelled as a subtype of GeographicalName. This would avoid at least the indirection introduced by the current ThoroughfareNameValue.name.
>
> What prevents you from doing so? If this RDF model has to be a 1:1 conversion of the conceptual INSPIRE model, without trying to take advantage of the modelling features of RDF, there is no point me spending time here to suggest improvements. ;)

Nobody said that this needs to be a 1:1 conversion. As a matter of fact, the draft ontologies are already different (for example, multiplicity and the stereotype <<voidable>> are not converted). I was just describing the current situation in the conceptual model.

We do want to create guidelines that define a useful RDF encoding of INSPIRE data. Therefore, your input is much appreciated. These discussions help us identify conversion patterns.

So, I think we reached a conclusion here: Make ThoroughfareNameValue a subClassOf GeographicalName and omit ThoroughfareNameValue.name.
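
A sketch of what this conclusion could look like in Turtle. This is only an illustration: the ad:, gn: and ex: prefix IRIs are placeholders, the use of skos:prefLabel for the spelling is carried over from the opening post, and the exact terms in the revised vocabulary are not shown in this thread.

```turtle
# ad:, gn: and ex: are placeholder IRIs (assumptions for illustration only).
@prefix ad:   <http://example.org/ad#> .
@prefix gn:   <http://example.org/gn#> .
@prefix ex:   <http://example.org/data/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Schema level: every thoroughfare name value is itself a geographical name,
# so the ThoroughfareNameValue.name indirection is no longer needed.
ad:ThoroughfareNameValue rdfs:subClassOf gn:GeographicalName .

# Instance level: the thoroughfare name links straight to the name value.
ex:Name1 a ad:ThoroughfareName ;
  ad:ThoroughfareName.name ex:NameValue1 .

# The name value carries the label and, optionally, its name parts.
ex:NameValue1 a ad:ThoroughfareNameValue ;
  skos:prefLabel "rue de la Paix"@fr ;
  ad:ThoroughfareNameValue.nameParts ex:Part1, ex:Part2, ex:Part3 .
```

Compared with the original chain, one resource and one property hop disappear, while the name parts remain representable.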

jechterhoff commented 7 years ago

The result of this discussion has been implemented in the revision of the vocabulary for the INSPIRE Addresses schema.

This issue can be re-opened in the future, if necessary.