innoq / iqvoc

iQvoc - A SKOS(-XL) Vocabulary Management System for the Semantic Web
http://iqvoc.net/

SKOS importer for SKOS-XL: auto-publishing issue #347

Open mgbeyer opened 9 years ago

mgbeyer commented 9 years ago

Suppose you have a scenario depicted by the N-Triples example below:

<http://lod.gesis.org/thesoz/concept_10099999> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://lod.gesis.org/thesoz/concept_10099999> <http://www.w3.org/2004/02/skos/core#inScheme> <http://lod.gesis.org/thesoz/> .
<http://lod.gesis.org/thesoz/concept_10099999> <http://www.w3.org/2008/05/skos-xl#prefLabel> <http://lod.gesis.org/thesoz/term_10099999_de> .
<http://lod.gesis.org/thesoz/concept_10099999> <http://www.w3.org/2008/05/skos-xl#prefLabel> <http://lod.gesis.org/thesoz/term_10099999_en> .
<http://lod.gesis.org/thesoz/concept_10099999> <http://www.w3.org/2008/05/skos-xl#prefLabel> <http://lod.gesis.org/thesoz/term_10099999_fr> .
<http://lod.gesis.org/thesoz/term_10099999_de> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2008/05/skos-xl#Label> .
<http://lod.gesis.org/thesoz/term_10099999_de> <http://www.w3.org/2008/05/skos-xl#literalForm> "hallo"@de .
<http://lod.gesis.org/thesoz/term_10099999_en> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2008/05/skos-xl#Label> .
<http://lod.gesis.org/thesoz/term_10099999_en> <http://www.w3.org/2008/05/skos-xl#literalForm> "hello"@en .
<http://lod.gesis.org/thesoz/term_10099999_fr> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2008/05/skos-xl#Label> .
<http://lod.gesis.org/thesoz/term_10099999_fr> <http://www.w3.org/2008/05/skos-xl#literalForm> "bla ble blu"@fr .
<http://lod.gesis.org/thesoz/concept_10099998> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://lod.gesis.org/thesoz/concept_10099998> <http://www.w3.org/2004/02/skos/core#inScheme> <http://lod.gesis.org/thesoz/> .
<http://lod.gesis.org/thesoz/concept_10099998> <http://www.w3.org/2008/05/skos-xl#prefLabel> <http://lod.gesis.org/thesoz/term_10099998_de> .
<http://lod.gesis.org/thesoz/concept_10099998> <http://www.w3.org/2008/05/skos-xl#prefLabel> <http://lod.gesis.org/thesoz/term_10099998_en> .
<http://lod.gesis.org/thesoz/concept_10099998> <http://www.w3.org/2008/05/skos-xl#prefLabel> <http://lod.gesis.org/thesoz/term_10099998_fr> .
<http://lod.gesis.org/thesoz/term_10099998_de> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2008/05/skos-xl#Label> .
<http://lod.gesis.org/thesoz/term_10099998_de> <http://www.w3.org/2008/05/skos-xl#literalForm> "dummy"@de .
<http://lod.gesis.org/thesoz/term_10099998_en> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2008/05/skos-xl#Label> .
<http://lod.gesis.org/thesoz/term_10099998_en> <http://www.w3.org/2008/05/skos-xl#literalForm> "dummy"@en .
<http://lod.gesis.org/thesoz/term_10099998_fr> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2008/05/skos-xl#Label> .
<http://lod.gesis.org/thesoz/term_10099998_fr> <http://www.w3.org/2008/05/skos-xl#literalForm> "bla ble blu"@fr .

So there are two different concepts, and each one references several skosxl:Label instances (each with its own skosxl:literalForm) via skosxl:prefLabel, one per language version of a term. The corresponding RDF/XML makes the hierarchy more obvious:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cc="http://creativecommons.org/ns#" xmlns:dc="http://purl.org/dc/terms/" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:void="http://rdfs.org/ns/void#" xmlns:skosxl="http://www.w3.org/2008/05/skos-xl#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:thesoz="http://lod.gesis.org/thesoz/ext/" xmlns:prv="http://purl.org/net/provenance/ns#" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xml:base="http://lod.gesis.org/thesoz/">
   <rdf:Description rdf:about="concept_10099999">
      <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
   </rdf:Description>
   <rdf:Description rdf:about="concept_10099999">
      <skos:inScheme rdf:resource="http://lod.gesis.org/thesoz/"/>
   </rdf:Description>
   <rdf:Description rdf:about="concept_10099999">
      <skosxl:prefLabel rdf:resource="term_10099999_de"/>
      <skosxl:prefLabel rdf:resource="term_10099999_en"/>
      <skosxl:prefLabel rdf:resource="term_10099999_fr"/>
   </rdf:Description>
   <rdf:Description rdf:about="term_10099999_de">
      <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
      <skosxl:literalForm xml:lang="de">hallo</skosxl:literalForm>
   </rdf:Description>
   <rdf:Description rdf:about="term_10099999_en">
      <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
      <skosxl:literalForm xml:lang="en">hello</skosxl:literalForm>
   </rdf:Description>
   <rdf:Description rdf:about="term_10099999_fr">
      <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
      <skosxl:literalForm xml:lang="fr">bla ble blu</skosxl:literalForm>
   </rdf:Description>
   <rdf:Description rdf:about="concept_10099998">
      <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
   </rdf:Description>
   <rdf:Description rdf:about="concept_10099998">
      <skos:inScheme rdf:resource="http://lod.gesis.org/thesoz/"/>
   </rdf:Description>
   <rdf:Description rdf:about="concept_10099998">
      <skosxl:prefLabel rdf:resource="term_10099998_de"/>
      <skosxl:prefLabel rdf:resource="term_10099998_en"/>
      <skosxl:prefLabel rdf:resource="term_10099998_fr"/>
   </rdf:Description>
   <rdf:Description rdf:about="term_10099998_de">
      <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
      <skosxl:literalForm xml:lang="de">dummy</skosxl:literalForm>
   </rdf:Description>
   <rdf:Description rdf:about="term_10099998_en">
      <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
      <skosxl:literalForm xml:lang="en">dummy</skosxl:literalForm>
   </rdf:Description>
   <rdf:Description rdf:about="term_10099998_fr">
      <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
      <skosxl:literalForm xml:lang="fr">bla ble blu</skosxl:literalForm>
   </rdf:Description>
</rdf:RDF>

Now there are two literalForm triples, belonging to two different labels with different origins, that carry the same value in the same language (which can certainly happen in real life):

<http://lod.gesis.org/thesoz/term_10099998_fr> <http://www.w3.org/2008/05/skos-xl#literalForm> "bla ble blu"@fr .
<http://lod.gesis.org/thesoz/term_10099999_fr> <http://www.w3.org/2008/05/skos-xl#literalForm> "bla ble blu"@fr .
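
To spot such collisions in a larger export before importing, something like the following could help. It is only a minimal sketch in plain Ruby: `shared_literals` and the `TRIPLE` pattern are hypothetical names of mine, not part of iQvoc, and the regex only handles simple language-tagged literals without escaped quotes.

```ruby
# Hypothetical helper, not part of iQvoc: group skos-xl literalForm triples
# by (value, language) and keep the literals used by more than one label.
LITERAL_FORM = "http://www.w3.org/2008/05/skos-xl#literalForm"

# Simplified N-Triples pattern: <subject> <predicate> "value"@lang .
# Assumption: no escaped quotes inside the literal.
TRIPLE = /\A<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"@([A-Za-z-]+)\s*\.\s*\z/

def shared_literals(ntriples)
  groups = Hash.new { |hash, key| hash[key] = [] }
  ntriples.each_line do |line|
    match = TRIPLE.match(line.strip)
    next unless match && match[2] == LITERAL_FORM
    # Key: [value, language]; entry: URI of the label carrying that literal.
    groups[[match[3], match[4]]] << match[1]
  end
  groups.select { |_key, subjects| subjects.uniq.size > 1 }
end

sample = <<~NT
  <http://lod.gesis.org/thesoz/term_10099998_fr> <http://www.w3.org/2008/05/skos-xl#literalForm> "bla ble blu"@fr .
  <http://lod.gesis.org/thesoz/term_10099999_fr> <http://www.w3.org/2008/05/skos-xl#literalForm> "bla ble blu"@fr .
  <http://lod.gesis.org/thesoz/term_10099999_en> <http://www.w3.org/2008/05/skos-xl#literalForm> "hello"@en .
NT

shared_literals(sample).each do |(value, lang), subjects|
  puts "#{value.inspect}@#{lang} is used by #{subjects.size} labels"
end
```

Note that "dummy"@de and "dummy"@en from the full example would not be reported, because they differ in language.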

The problem: the importer won't auto-publish the duplicate and reports something like "Publishing failed, subject xyz invalid, value has already been taken". This is enforced deliberately by a uniqueness restriction in the validation part of the corresponding model (see app\models\label\skosxl\validations.rb, around line 13): validates :value, uniqueness: { scope: [:language, :rev] }, if: :validatable_for_publishing?
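
To illustrate how I read that rule, here is a rough plain-Ruby stand-in. It is an assumption-laden sketch, not the actual iQvoc code: the `Label` struct, `value_taken?`, and the `rev` value of 1 are made up for illustration, and the real check is done by ActiveRecord against the database.

```ruby
# Plain-Ruby stand-in for the label model (hypothetical; the real model is
# an ActiveRecord class and has more attributes than these four).
Label = Struct.new(:origin, :value, :language, :rev)

# Mimics `validates :value, uniqueness: { scope: [:language, :rev] }`:
# a candidate is rejected when another stored label has the same value,
# language and rev. The origin is NOT part of the comparison.
def value_taken?(candidate, stored)
  stored.any? do |other|
    !other.equal?(candidate) &&
      other.value == candidate.value &&
      other.language == candidate.language &&
      other.rev == candidate.rev
  end
end

# Assumed state: the first French label has already been published.
published  = [Label.new("term_10099998_fr", "bla ble blu", "fr", 1)]
candidate  = Label.new("term_10099999_fr", "bla ble blu", "fr", 1)
other_lang = Label.new("term_10099998_en", "dummy", "en", 1)

puts value_taken?(candidate, published)   # same value and language, different origin
puts value_taken?(other_lang, published)  # different value and language
```

Under these assumptions the second French label is rejected even though its origin differs, which matches the "value has already been taken" message.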

I don't quite understand why this validation is applied deliberately here. I'm no expert on the SKOS-XL format, so maybe I just don't know better. I'm aware that the W3C says "No two concepts in the same concept scheme may have the same value for skos:prefLabel in a given language". But here we have two entirely different (also origin-wise) non-core label definitions that just happen to contain the same value in the same language. So our two concepts do NOT reference the same label instance; each references its own, DIFFERENT label definition (afaik different labels can have the same literal form). Is this in any way a bad thing, or does it fail to conform to the W3C format? The whole thing is imported just fine. So why are the duplicates excluded from auto-publishing?