karmaresearch / vlog

Apache License 2.0

Language-tagged strings are confused with xsd:Strings #54

Closed mkroetzsch closed 4 years ago

mkroetzsch commented 4 years ago

VLog internally seems to confuse language-tagged strings (http://www.w3.org/1999/02/22-rdf-syntax-ns#langString) with plain RDF strings (xsd:string). For example, this can be seen when using SPARQL query sources. Here are three SPARQL query patterns to illustrate the issue (to be executed against https://query.wikidata.org/):

1. `?subject wdt:P31 wd:Q39715 ; rdfs:label ?label`
2. `?subject wdt:P31 wd:Q39715 ; rdfs:label "Leuchtturm Moritzburg"`
3. `?subject wdt:P31 wd:Q39715 ; rdfs:label "Leuchtturm Moritzburg"@de`

The first query has a result where ?subject is `http://www.wikidata.org/entity/Q1821440` and the object is `"Leuchtturm Moritzburg"@de`. Accordingly, the second query has no results, and the third query has exactly one.
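The distinction these patterns rely on can be sketched in a few lines of Python. The `Literal` class and the `match` helper below are hypothetical stand-ins for an RDF term model, not VLog's internal representation. They only illustrate that a language-tagged literal and a plain xsd:string literal with the same lexical form are different RDF terms, so an exact-match pattern for one does not match the other.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF literal: lexical form, datatype IRI, optional language tag."""
    lexical: str
    datatype: str = XSD_STRING
    lang: Optional[str] = None


def lang_literal(lexical: str, lang: str) -> Literal:
    # Language tags compare case-insensitively, so normalise to lowercase.
    return Literal(lexical, RDF_LANGSTRING, lang.lower())


plain = Literal("Leuchtturm Moritzburg")                # "Leuchtturm Moritzburg"
tagged = lang_literal("Leuchtturm Moritzburg", "de")    # "Leuchtturm Moritzburg"@de

# The single label triple that the first pattern finds for this entity.
stored = {("http://www.wikidata.org/entity/Q1821440", tagged)}


def match(label: Literal) -> list:
    """Return the subjects whose label is exactly the given RDF term."""
    return [s for (s, o) in stored if o == label]


print(match(plain))   # mirrors the plain-string pattern
print(match(tagged))  # mirrors the @de pattern
```

Because equality here compares lexical form, datatype, and language tag together, the plain literal finds nothing while the tagged literal finds the one stored subject, which mirrors the behaviour of the SPARQL endpoint described above.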

However, a query for facts imported from this source returns the pair <http://www.wikidata.org/entity/Q1821440>, "Leuchtturm Moritzburg" instead. Nevertheless, a query for pairs of the form ?X, "Leuchtturm Moritzburg" returns no results (since VLog runs the second query above in this case). If VLog is forced to import the query results for further reasoning (by adding more rules), then the query for ?X, "Leuchtturm Moritzburg" suddenly has a result. This seems to show that the language-tagged string "Leuchtturm Moritzburg"@de was transformed into the xsd:string "Leuchtturm Moritzburg" on import.
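The suspected failure mode can be reproduced in a small simulation. This is only a guess at what happens on import, not VLog's actual code: all names (`to_text_lossy`, `to_text_faithful`, `import_facts`, `subjects_with_label`) are hypothetical. The sketch assumes that imported literals are stored as text and compares an encoding that drops the language tag with one that keeps it.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None  # None stands for a plain xsd:string


def to_text_lossy(lit: Literal) -> str:
    # Assumed faulty encoding: the language tag is silently dropped.
    return '"' + lit.lexical + '"'


def to_text_faithful(lit: Literal) -> str:
    # Encoding that keeps the language tag as part of the stored term.
    suffix = "@" + lit.lang if lit.lang else ""
    return '"' + lit.lexical + '"' + suffix


# The one row the SPARQL source returns for the unrestricted label pattern.
source = [
    ("http://www.wikidata.org/entity/Q1821440",
     Literal("Leuchtturm Moritzburg", "de")),
]


def import_facts(rows, encode):
    """Materialise source rows as (subject, encoded label) facts."""
    return {(s, encode(o)) for s, o in rows}


def subjects_with_label(facts, label_text):
    """Answer the query ?X, label_text over the materialised facts."""
    return sorted(s for s, o in facts if o == label_text)


lossy = import_facts(source, to_text_lossy)
faithful = import_facts(source, to_text_faithful)
plain_query = to_text_faithful(Literal("Leuchtturm Moritzburg"))

print(subjects_with_label(lossy, plain_query))     # spurious match after lossy import
print(subjects_with_label(faithful, plain_query))  # no match when the tag is kept
```

With the lossy encoding, the plain-string query matches the imported fact even though the source itself has no such triple. With the faithful encoding, only a query for `"Leuchtturm Moritzburg"@de` would match.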

It is possible that this problem also occurs when loading data from RDF files (e.g., it could lead to wrong joins), but I have not tested this. The problem matters because language-tagged labels are widely used in RDF.