KonradHoeffner closed this issue 5 years ago
This happens for labels as well, see for example #240.
It does not happen with subjects:
select *
{
?x ?y ?z.
filter((!strStarts(str(?x),"http")) AND (!strStarts(str(?x),"nodeID"))).
}
However, it happens very often with objects in the he graph and twice in the meta graph (private graphs not shown; ITIL is not affected):
select ?g count(*)
{
graph ?g {?x ?y ?z.}
filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID"))).
}
g | callret-1 |
---|---|
http://www.snik.eu/ontology/meta | 2 |
http://www.snik.eu/ontology/he | 1833 |
Affects the following properties in he:
select ?y count(?z) as ?incorrect count(?a) as ?total from sniko:he
{
{
?x ?y ?z.
filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID"))).
}
UNION
{
?x ?y ?a.
}
} group by ?y having(count(?z)>0)
These are too many errors to have been caused by manual OntoWiki edits, so something must have gone wrong during the extraction. Strangely, only part of the data is affected, so it may have to do with the consolidated vs. the non-consolidated part.
At least the non-labels are easy to fix:
dump_one_graph ('http://www.snik.eu/ontology/he', './dumps/0801he', 1000000000);
select ?x ?y ?z from sniko:he
{
?x ?y ?z.
filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID")) AND ?y!=rdfs:label AND ?y!=rdfs:comment AND ?y!=skos:altLabel AND ?y!=skos:definition).
}
select ?x ?y replace(str(?z),"he:","http://www.snik.eu/ontology/he/") from sniko:he
{
?x ?y ?z.
filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID")) AND ?y!=rdfs:label AND ?y!=rdfs:comment AND ?y!=skos:altLabel AND ?y!=skos:definition).
}
with <http://www.snik.eu/ontology/he>
delete {?x ?y ?z} insert {?x ?y ?fixed.} where { ?x ?y ?z. filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID")) AND ?y!=rdfs:label AND ?y!=rdfs:comment AND ?y!=skos:altLabel AND ?y!=skos:definition). bind(IRI(replace(str(?z),"he:","http://www.snik.eu/ontology/he/")) as ?fixed) }
This deleted and inserted 860 triples and http://www.snik.eu/ontology/he/ChiefInformationOfficer looks good.
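The rewrite applied by the update above is a plain prefix expansion. As a minimal sketch of the same string logic outside the triple store, assuming (as the filter does) that a broken object is one starting with neither `http` nor `nodeID`; the function name `expand_he_prefix` is hypothetical and not part of any SNIK tooling:

```python
HE_NAMESPACE = "http://www.snik.eu/ontology/he/"


def expand_he_prefix(iri: str) -> str:
    """Expand a leading 'he:' to the full namespace; leave everything else untouched."""
    if iri.startswith(("http", "nodeID")):
        return iri  # already absolute, or a blank node identifier
    if iri.startswith("he:"):
        return HE_NAMESPACE + iri[len("he:"):]
    return iri  # some other broken value; not handled by this sketch


print(expand_he_prefix("he:ChiefInformationOfficer"))
```

Unlike the `replace` call in the update, this sketch only touches a `he:` at the very start of the string, which is the safer choice if `he:` could ever occur in the middle of a value.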
The non-literal-range-properties are also not shown anymore by the error detecting query:
y | incorrect | total
-- | -- | --
http://www.w3.org/2004/02/skos/core#altLabel | 4 | 183
http://www.w3.org/2000/01/rdf-schema#label | 102 | 2903
http://www.w3.org/2000/01/rdf-schema#subClassOf | 1 | 1488
http://www.w3.org/2000/01/rdf-schema#comment | 2 | 4
http://www.w3.org/2004/02/skos/core#definition | 865 | 1322
Now for the 865 broken definitions:
dump_one_graph ('http://www.snik.eu/ontology/he', './dumps/0802he', 1000000000);
select * from sniko:he
{
?x skos:definition ?z.
filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID"))).
}
They should all be German; verified by looking at all definitions that contain "the":
select * from sniko:he
{
?x skos:definition ?z.
filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID"))).
filter(contains(str(?z),"the"))
}
At this point one can also look at the extraction table for Heinrich: https://github.com/imise/snik-csv2rdf
In the main template https://github.com/IMISE/snik-csv2rdf/blob/a167c5d8294192bf259705f187d3564111f3b4ca/main.tarql.template there is:
BIND (STRLANG(?Definition,"de") AS ?d)
So they should really all be literals, and always German.
Fix the triples and test:
select ?x ?fixed from sniko:he
{
?x skos:definition ?z.
FILTER(isIRI(?z)).
BIND(STRLANG(STR(?z),"de") AS ?fixed)
}
Looks good.
Apply the fix and check:
with <http://www.snik.eu/ontology/he>
delete {?x skos:definition ?z.}
insert {?x skos:definition ?fixed.}
where
{
?x skos:definition ?z.
FILTER(isIRI(?z)).
BIND(STRLANG(STR(?z),"de") AS ?fixed)
}
We now have:
y | incorrect | total |
---|---|---|
http://www.w3.org/2004/02/skos/core#altLabel | 4 | 183 |
http://www.w3.org/2000/01/rdf-schema#label | 102 | 2903 |
http://www.w3.org/2000/01/rdf-schema#subClassOf | 1 | 1488 |
http://www.w3.org/2000/01/rdf-schema#comment | 2 | 4 |
Fix subClassOf via:
delete data from sniko:he
{<http://www.snik.eu/ontology/he/BenutzerforschungDurchfuehren> rdfs:subClassOf <meta:Function>}
insert data into sniko:he
{<http://www.snik.eu/ontology/he/BenutzerforschungDurchfuehren> rdfs:subClassOf meta:Function}
Fix comments via:
delete data from sniko:he
{
<http://www.snik.eu/ontology/he> rdfs:comment <The Ontology extracted from 'Informationsmanagement: Grundlagen Aufgaben Methoden.'>
<http://www.snik.eu/ontology/he> rdfs:comment <Informationsmanagement: Grundlagen Aufgaben Methoden. Lutz J. Heinrich Ren\u00e9 Riedl Dirk Stelzer Herrmann Sikora>
}
insert data into sniko:he
{
<http://www.snik.eu/ontology/he> rdfs:comment "The Ontology extracted from 'Informationsmanagement: Grundlagen Aufgaben Methoden'."@en.
<http://www.snik.eu/ontology/he> rdfs:comment "Informationsmanagement: Grundlagen Aufgaben Methoden. Lutz J. Heinrich Ren\u00e9 Riedl, Dirk Stelzer, Herrmann Sikora."@de.
}
This does not work because of the accent, and even with Unicode escaping there is a problem, so use OntoWiki https://www.snik.eu/ontowiki/view/?r=http://www.snik.eu/ontology/he&m=http://www.snik.eu/ontology/he for that.
Works like this:
with sniko:he
delete
{
<http://www.snik.eu/ontology/he> rdfs:comment ?z.
}
where
{
<http://www.snik.eu/ontology/he> rdfs:comment ?z.
}
insert data into sniko:he
{
<http://www.snik.eu/ontology/he> rdfs:comment "The Ontology extracted from 'Informationsmanagement: Grundlagen Aufgaben Methoden'."@en.
<http://www.snik.eu/ontology/he> rdfs:comment "Informationsmanagement: Grundlagen Aufgaben Methoden. Lutz J. Heinrich Ren\u00e9 Riedl, Dirk Stelzer, Herrmann Sikora."@de.
}
Fix skos:altLabel with:
dump_one_graph ('http://www.snik.eu/ontology/he', './dumps/0805he', 1000000000);
with sniko:he
delete {?x skos:altLabel ?y.}
where
{
?x skos:altLabel ?y.
filter(isIRI(?y))
}
insert data into sniko:he
{
<http://www.snik.eu/ontology/he/Datenschutzgesetz> skos:altLabel "DSG (\u00d6CH)"@de.
<http://www.snik.eu/ontology/he/DigitalBusiness> skos:altLabel "Electronic Business"@en, "eBusiness"@en.
<http://www.snik.eu/ontology/he/Handelsgesetzbuch> skos:altLabel "HGB (D\u00d6)"@de.
<http://www.snik.eu/ontology/he/Produkthaftungsgesetz> skos:altLabel "PHG (D\u00d6)"@de.
}
New backup: dump_one_graph ('http://www.snik.eu/ontology/he', './dumps/0806he', 1000000000);
Now we just have the labels:
y | incorrect | total |
---|---|---|
http://www.w3.org/2000/01/rdf-schema#label | 102 | 2903 |
select * from sniko:he
{
?x rdfs:label ?l.
filter(isIRI(?l)).
}
The problem is that multiple labels in different languages are joined in one URI, for example:
Subject | Label URI |
---|---|
http://www.snik.eu/ontology/he/COBITInformationCriterion | CobiT Qualitätsmerkmal für Information |
http://www.snik.eu/ontology/he/CollaborationTools | Collaborative Software Groupware |
http://www.snik.eu/ontology/he/DBMS | Database Management System Datenbankverwaltungssystem |
http://www.snik.eu/ontology/he/Dienstleistungsqualitaet | Service quaility Servicequalität |
http://www.snik.eu/ontology/he/Dienstleistungsqualitaet | Servicequalität Service Quality |
http://www.snik.eu/ontology/he/Durchlaufzeit | Cycle Time Lead Time |
http://www.snik.eu/ontology/he/EPK | ereignisgesteuerte Prozesskette Event-driven Process Chain |
http://www.snik.eu/ontology/he/EVA | Earned Value Analysis Earned-Value-Analyse |
http://www.snik.eu/ontology/he/Einsatzplan | Initiative Guide Plan of Action |
http://www.snik.eu/ontology/he/Entscheidungsregel | Aggregationsfunktion Decision Rule |
http://www.snik.eu/ontology/he/Entwicklungsrueckstau | Anwendungsrückstau Application Backlog |
This seems hard to solve automatically. There are two options: (1) restore the labels from the extraction tables or (2) fix the 102 labels manually.
First, look at the extraction table https://github.com/IMISE/snik-csv2rdf/releases/download/0.2.0/he-main.csv.
As an example, let's use he:COBITInformationCriterion:
SubjektUri | SubjektDe | SubjektEn | SubjektAltDe | SubjektAltEn | ... |
---|---|---|---|---|---|
COBITInformationCriterion | CobiT Qualitätsmerkmal für Information, | COBIT Information Criterion | | COBIT Business Requirements for Information | ... |
And this got translated to:
he:COBITInformationCriterion
rdfs:label "COBIT Information Criterion"@en ;
rdfs:label <CobiT%20Qualit%C3%A4tsmerkmal%20f%C3%BCr%20Information%20> ;
skos:altLabel "COBIT Business Requirements for Information"@en
...
So the English label is not a problem (the skos:altLabel may already be fixed), but the German one is. Could it be because of the comma? CSV stands for comma-separated values, so the field escaping may be incorrect, or incompatible between LibreOffice's default settings and tarql.
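To see what the broken label IRI actually contains, it can be percent-decoded. A small sketch using only the Python standard library, with the IRI copied from the Turtle excerpt above:

```python
from urllib.parse import unquote

# Relative IRI that was emitted instead of a German label literal.
broken_label_iri = "CobiT%20Qualit%C3%A4tsmerkmal%20f%C3%BCr%20Information%20"

decoded = unquote(broken_label_iri)  # decodes %XX sequences as UTF-8
print(repr(decoded))
```

The decoded text is the German label followed by a space and with no comma, whereas the table cell ends in a comma. That is consistent with the comma hypothesis, although it does not show which tool dropped the comma.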
Let's test this hypothesis with http://www.snik.eu/ontology/he/EPK (by the way there seem to be two resources for this concept, see issue #255):
SubjektUri | SubjektDe | SubjektEn | ... |
---|---|---|---|
EPK | EPK | Ereignisgesteuerte Prozesskette | ... |
EPK | EPK | ereignisgesteuerte Prozesskette, Event-driven Process Chain | ... |
Gets transformed into:
he:EPK a owl:Class ;
rdfs:label "EPK"@de , "Ereignisgesteuerte Prozesskette"@en ;
rdfs:label <ereignisgesteuerte%20Prozesskette%20Event-driven%20Process%20Chain> ;
...
So this again looks like the comma is the problem.
In the CSV file, the label is quoted using double quotes:
EPK,EPK,"ereignisgesteuerte Prozesskette, Event-driven Process Chain",,,EntityType,dc:source,he:Glossar,,,,,,,ereignisgesteuerte Prozesskette; Modellierungssprache für Geschäftsprozesse.,,,,0,,,,,,
It also seems to be in the correct place, as the remaining columns are not shifted. So in this case there are two problems: (1) TARQL incorrectly models labels with commas as resources and (2) the extractor put two labels with different languages in the SubjektEn field, which should contain only a single English label.
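The claim that the quoting is correct and no column is shifted can be checked with any standards-conforming CSV parser. A minimal sketch with Python's `csv` module on the first eight fields of the row above; this only illustrates how a conforming parser reads the line and says nothing about tarql's own behaviour:

```python
import csv

# First eight fields of the EPK row from he-main.csv, as quoted above.
line = ('EPK,EPK,"ereignisgesteuerte Prozesskette, Event-driven Process Chain"'
        ',,,EntityType,dc:source,he:Glossar')

row = next(csv.reader([line]))  # respects the double-quoted field
naive = line.split(",")         # ignores quoting, so the label is cut in two

print(len(row), row[2], row[5])
print(len(naive), naive[2], naive[6])
```

With quoting respected, the label stays in one field and `EntityType` is at index 5; the naive split yields one extra field and pushes `EntityType` to index 6.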
Run the CSV2RDF process on he again: set SUBS="he" in map, run map, and inspect he/out/all.ttl.
In all.ttl the label is now also correctly transformed into a literal. It still contains two labels in one, but that is an input error and should be fixed during consolidation.
he:EPK
he:chapter he:INBAN ;
he:page "435"^^xsd:positiveInteger ;
meta:consolidated false ;
meta:subTopClass meta:EntityType ;
a owl:Class ;
rdfs:label "EPK"@de, "Ereignisgesteuerte Prozesskette"@en, "ereignisgesteuerte Prozesskette, Event-driven Process Chain"@en ;
skos:definition "Modellierungsprache für Geschäftsprozesse"@de, "ereignisgesteuerte Prozesskette; Modellierungssprache für Geschäftsprozesse."@de .
However, those were entered incorrectly as well, so I am now quickly correcting the 102 labels manually. It is not possible to decide automatically that, for example, "Auslagerung von Aufgaben Kompetenzen und Ressourcen" should actually be "Auslagerung von Aufgaben, Kompetenzen und Ressourcen", while "Service quaility Servicequalität" has to be split into the two labels "Service Quality"@en and "Servicequalität"@de and corrected.
construct
{
?x rdfs:label ?l.
}
from sniko:he
where
{
?x rdfs:label ?z.
filter((isIRI(?z))).
bind(strlang(str(?z),"de") as ?l).
}
This produces finished labels but is too cumbersome to edit. Simpler:
select ?x ?l
{
?x rdfs:label ?z.
filter((isIRI(?z))).
bind(strlang(str(?z),"de") as ?l).
}
Open the result as a spreadsheet, with three columns each for German and English.
Corrected and remodelled: helabelfix.xlsx (GitHub cannot upload CSV)
Added to the csv2rdf repository. Result: helabelfix.zip
Uploaded, but the URIs are incorrect. Correct them via SPARUL and correct the tarql file as well.
with sniko:he
delete {?x ?y ?z.}
insert {?fixed ?y ?z.}
where
{
?x ?y ?z.
filter(strstarts(str(?x),"http://www.snik.eu/ontology/he/http://www.snik.eu/ontology/he/"))
bind(uri(replace(str(?x),"http://www.snik.eu/ontology/he/http://www.snik.eu/ontology/he/","http://www.snik.eu/ontology/he/")) as ?fixed)
}
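The intended effect on the affected subject URIs is to collapse the doubled namespace into a single one. A minimal Python sketch of that string transformation; the function name `collapse_double_namespace` is hypothetical:

```python
HE_NAMESPACE = "http://www.snik.eu/ontology/he/"


def collapse_double_namespace(uri: str) -> str:
    """Turn '<ns><ns>Local' into '<ns>Local'; leave all other URIs unchanged."""
    doubled = HE_NAMESPACE + HE_NAMESPACE
    if uri.startswith(doubled):
        return HE_NAMESPACE + uri[len(doubled):]
    return uri


print(collapse_double_namespace(HE_NAMESPACE + HE_NAMESPACE + "EPK"))
```

Applied to a correct URI such as http://www.snik.eu/ontology/he/EPK, the function is a no-op, so running the fix twice does no harm.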
Some URI labels still remain, removed via:
sparql
with sniko:he
delete
{
?x rdfs:label ?l.
}
where
{
?x rdfs:label ?l.
filter(isIRI(?l)).
}
Fix meta as well:
sparql
with sniko:meta
delete
{
?x ?y <owl:DeprecatedProperty>.
}
insert
{
?x ?y owl:DeprecatedProperty.
}
where
{
?x ?y <owl:DeprecatedProperty>.
}
There still seem to be some triples in a "non-www" graph, http://snik.eu/ontology/meta; deleting them is handled in issue #256.
For example
<he:Abteilung>
.