snikproject / ontology

Public SNIK Ontology. An ontology of information management in hospitals.
https://snikproject.github.io/ontology/

Correct URLs with non-expanded prefix #176

Closed KonradHoeffner closed 5 years ago

KonradHoeffner commented 5 years ago

For example <he:Abteilung>.

KonradHoeffner commented 5 years ago

This happens for labels as well, see for example #240.

KonradHoeffner commented 5 years ago

It does not happen with subjects:

select *
{
 ?x ?y ?z.
filter((!strStarts(str(?x),"http")) AND (!strStarts(str(?x),"nodeID"))).
}

However, it happens very often with objects in the he graph and twice in the meta graph (private graphs not shown, ITIL is not affected):

select ?g count(*)
{
 graph ?g {?x ?y ?z.}
filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID"))).
}
g | callret-1
-- | --
http://www.snik.eu/ontology/meta | 2
http://www.snik.eu/ontology/he | 1833

Affects the following properties in he:

select ?y count(?z) as ?incorrect count(?a) as ?total from sniko:he
{
{
?x ?y ?z.
filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID"))).
}
 UNION
{
?x ?y ?a.
}
} group by ?y having(count(?z)>0)
y | incorrect | total
-- | -- | --
http://www.w3.org/2002/07/owl#annotatedTarget | 190 | 192
http://www.w3.org/2004/02/skos/core#closeMatch | 21 | 25
http://www.w3.org/2004/02/skos/core#altLabel | 4 | 183
http://www.w3.org/2000/01/rdf-schema#label | 102 | 2903
http://www.w3.org/2000/01/rdf-schema#subClassOf | 645 | 1488
http://www.w3.org/2004/02/skos/core#related | 2 | 2
http://www.w3.org/2000/01/rdf-schema#comment | 2 | 4
http://www.w3.org/2004/02/skos/core#definition | 865 | 1322
http://www.w3.org/2004/02/skos/core#relatedMatch | 2 | 2
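
The per-property count can also be reproduced outside the triple store, for example on an exported dump. The following is only a minimal sketch: the tuple layout `(subject, property, object, object_is_iri)`, the function names and the sample data are assumptions for illustration and not part of the SNIK tooling.

```python
from collections import Counter


def is_unexpanded_iri(value: str) -> bool:
    """Mirror the SPARQL filter: starts with neither 'http' nor 'nodeID'."""
    return not value.startswith(("http", "nodeID"))


def count_by_property(triples):
    """Return {property: (incorrect, total)} for properties with at least one suspicious object."""
    total, incorrect = Counter(), Counter()
    for _subject, prop, obj, obj_is_iri in triples:
        total[prop] += 1
        if obj_is_iri and is_unexpanded_iri(obj):
            incorrect[prop] += 1
    return {p: (incorrect[p], total[p]) for p in total if incorrect[p] > 0}


# Hypothetical sample data: (subject, property, object, object_is_iri)
sample = [
    ("http://www.snik.eu/ontology/he/A", "rdfs:subClassOf", "he:B", True),
    ("http://www.snik.eu/ontology/he/A", "rdfs:subClassOf", "http://www.snik.eu/ontology/he/C", True),
    ("http://www.snik.eu/ontology/he/A", "rdfs:label", "A", False),
]
print(count_by_property(sample))
```

Like the `having(count(?z)>0)` clause, the final dictionary only keeps properties that have at least one suspicious object.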

These are too many errors to have been caused by manual OntoWiki edits, so something must have gone wrong during the extraction. Strangely though, only part of the data is affected, so it may have to do with the consolidated vs. the non-consolidated part.

KonradHoeffner commented 5 years ago

At least the non-labels are easy to fix:

  1. Make a backup via dump_one_graph ('http://www.snik.eu/ontology/he', './dumps/0801he', 1000000000);
  2. Get all the triples we want to fix:
    select ?x ?y ?z from sniko:he
    {
    ?x ?y ?z.
    filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID")) AND ?y!=rdfs:label AND ?y!=rdfs:comment AND ?y!=skos:altLabel AND ?y!=skos:definition).
    }
  3. Fix the triples and test it:
    select ?x ?y replace(str(?z),"he:","http://www.snik.eu/ontology/he/") from sniko:he
    {
    ?x ?y ?z.
    filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID")) AND ?y!=rdfs:label AND ?y!=rdfs:comment AND ?y!=skos:altLabel AND ?y!=skos:definition).
    }
  4. Apply the fix and cross fingers:
    
    with <http://www.snik.eu/ontology/he>
    delete {?x ?y ?z.}
    insert {?x ?y ?fixed.}
    where
    {
    ?x ?y ?z.
    filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID")) AND ?y!=rdfs:label AND ?y!=rdfs:comment AND ?y!=skos:altLabel AND ?y!=skos:definition).
    bind(IRI(replace(str(?z),"he:","http://www.snik.eu/ontology/he/")) as ?fixed)
    }


This deleted and inserted 860 triples and http://www.snik.eu/ontology/he/ChiefInformationOfficer looks good.
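
The `replace(str(?z),"he:",...)` call only handles the `he:` prefix and matches it anywhere in the string. As a hedged illustration of a stricter variant, the sketch below expands a prefix only at the start of the term and supports several prefixes. The prefix map is an assumption: the `owl` namespace is the standard one, while the `meta` namespace IRI is guessed from the graph name.

```python
# Assumed prefix map; the meta namespace IRI is a guess based on the graph name.
PREFIXES = {
    "he": "http://www.snik.eu/ontology/he/",
    "meta": "http://www.snik.eu/ontology/meta/",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand_prefixed(term: str) -> str:
    """Expand 'prefix:local' to a full IRI if the prefix is known, else return the term unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


print(expand_prefixed("he:Abteilung"))
print(expand_prefixed("http://www.snik.eu/ontology/he/EPK"))
```

Full IRIs pass through unchanged because `http` is not a key of the prefix map.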

The properties with non-literal ranges no longer show up in the error-detecting query, apart from one remaining rdfs:subClassOf triple:

y | incorrect | total
-- | -- | --
http://www.w3.org/2004/02/skos/core#altLabel | 4 | 183
http://www.w3.org/2000/01/rdf-schema#label | 102 | 2903
http://www.w3.org/2000/01/rdf-schema#subClassOf | 1 | 1488
http://www.w3.org/2000/01/rdf-schema#comment | 2 | 4
http://www.w3.org/2004/02/skos/core#definition | 865 | 1322
KonradHoeffner commented 5 years ago

Now for the 865 broken definitions:

  1. New backup via dump_one_graph ('http://www.snik.eu/ontology/he', './dumps/0802he', 1000000000);
  2. Inspect them
    select * from sniko:he
    {
    ?x skos:definition ?z.
    filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID"))).
    }

    Should all be German, verified by looking at all definitions that contain "the":

select * from sniko:he
{
?x skos:definition ?z.
filter((isIRI(?z)) AND (!strStarts(str(?z),"http")) AND (!strStarts(str(?z),"nodeID"))).
filter(contains(str(?z),"the"))
}

At this point one can also look at the Heinrich extraction table: https://github.com/imise/snik-csv2rdf

The main template https://github.com/IMISE/snik-csv2rdf/blob/a167c5d8294192bf259705f187d3564111f3b4ca/main.tarql.template contains BIND (STRLANG(?Definition,"de") AS ?d), so they should really all be literals, and always German ones.

  3. Fix the triples and test:

    select ?x ?fixed from sniko:he
    {
    ?x skos:definition ?z.
    FILTER(isIRI(?z)).
    BIND(STRLANG(?z,"de") AS ?fixed)
    }

    Looks good.

  4. Apply the fix and check:

with <http://www.snik.eu/ontology/he>

delete {?x skos:definition ?z.}
insert {?x skos:definition ?fixed.}
where
{
 ?x skos:definition ?z.
 FILTER(isIRI(?z)).
 BIND(STRLANG(?z,"de") AS ?fixed)
}
KonradHoeffner commented 5 years ago

We now have:

y | incorrect | total
-- | -- | --
http://www.w3.org/2004/02/skos/core#altLabel | 4 | 183
http://www.w3.org/2000/01/rdf-schema#label | 102 | 2903
http://www.w3.org/2000/01/rdf-schema#subClassOf | 1 | 1488
http://www.w3.org/2000/01/rdf-schema#comment | 2 | 4

Fix subClassOf via:

delete data from sniko:he
{<http://www.snik.eu/ontology/he/BenutzerforschungDurchfuehren> rdfs:subClassOf <meta:Function>}

insert data into sniko:he
{<http://www.snik.eu/ontology/he/BenutzerforschungDurchfuehren> rdfs:subClassOf meta:Function}

Fix comments via:

delete data from sniko:he
{
<http://www.snik.eu/ontology/he> rdfs:comment <The Ontology extracted from 'Informationsmanagement: Grundlagen Aufgaben Methoden.'>
<http://www.snik.eu/ontology/he> rdfs:comment   <Informationsmanagement: Grundlagen Aufgaben Methoden. Lutz J. Heinrich Ren\u00e9 Riedl Dirk Stelzer Herrmann Sikora>
}

insert data into sniko:he
{
<http://www.snik.eu/ontology/he> rdfs:comment "The Ontology extracted from 'Informationsmanagement: Grundlagen Aufgaben Methoden'."@en.
<http://www.snik.eu/ontology/he> rdfs:comment   "Informationsmanagement: Grundlagen Aufgaben Methoden. Lutz J. Heinrich Ren\u00e9 Riedl, Dirk Stelzer, Herrmann Sikora."@de.
}

This does not work because of the accent, and even with Unicode escaping there is a problem, so use OntoWiki https://www.snik.eu/ontowiki/view/?r=http://www.snik.eu/ontology/he&m=http://www.snik.eu/ontology/he for that.

Works like this:

with sniko:he
delete 
{
<http://www.snik.eu/ontology/he> rdfs:comment ?z.
}
where
{
<http://www.snik.eu/ontology/he> rdfs:comment ?z.
}

insert data into sniko:he
{
<http://www.snik.eu/ontology/he> rdfs:comment "The Ontology extracted from 'Informationsmanagement: Grundlagen Aufgaben Methoden'."@en.
<http://www.snik.eu/ontology/he> rdfs:comment   "Informationsmanagement: Grundlagen Aufgaben Methoden. Lutz J. Heinrich Ren\u00e9 Riedl, Dirk Stelzer, Herrmann Sikora."@de.
}

Fix skos:altLabel with:

dump_one_graph ('http://www.snik.eu/ontology/he', './dumps/0805he', 1000000000);

with sniko:he
delete
{
?x skos:altLabel ?y.
}
where
{
?x skos:altLabel ?y.
filter(isIRI(?y))
}

insert data into sniko:he
{
<http://www.snik.eu/ontology/he/Datenschutzgesetz> skos:altLabel    "DSG (\u00d6CH)"@de.
<http://www.snik.eu/ontology/he/DigitalBusiness> skos:altLabel  "Electronic Business"@en, "eBusiness"@en.
<http://www.snik.eu/ontology/he/Handelsgesetzbuch> skos:altLabel    "HGB (D\u00d6)"@de.
<http://www.snik.eu/ontology/he/Produkthaftungsgesetz> skos:altLabel    "PHG (D\u00d6)"@de.
}
KonradHoeffner commented 5 years ago

New backup: dump_one_graph ('http://www.snik.eu/ontology/he', './dumps/0806he', 1000000000);

Now we just have the labels:

http://www.w3.org/2000/01/rdf-schema#label | 102 | 2903

select * from sniko:he
{
 ?x rdfs:label ?l.
 filter(isIRI(?l)).
}

The problem is that multiple labels in different languages are joined in one URI, for example:

Subject | Label URI
-- | --
http://www.snik.eu/ontology/he/COBITInformationCriterion | CobiT Qualitätsmerkmal für Information
http://www.snik.eu/ontology/he/CollaborationTools | Collaborative Software Groupware
http://www.snik.eu/ontology/he/DBMS | Database Management System Datenbankverwaltungssystem
http://www.snik.eu/ontology/he/Dienstleistungsqualitaet | Service quaility Servicequalität
http://www.snik.eu/ontology/he/Dienstleistungsqualitaet | Servicequalität Service Quality
http://www.snik.eu/ontology/he/Durchlaufzeit | Cycle Time Lead Time
http://www.snik.eu/ontology/he/EPK | ereignisgesteuerte Prozesskette Event-driven Process Chain
http://www.snik.eu/ontology/he/EVA | Earned Value Analysis Earned-Value-Analyse
http://www.snik.eu/ontology/he/Einsatzplan | Initiative Guide Plan of Action
http://www.snik.eu/ontology/he/Entscheidungsregel | Aggregationsfunktion Decision Rule
http://www.snik.eu/ontology/he/Entwicklungsrueckstau | Anwendungsrückstau Application Backlog

This seems hard to solve automatically, there are two options: (1) restore it from the extraction tables and (2) do it manually for the 102 labels.

First, look at the extraction table https://github.com/IMISE/snik-csv2rdf/releases/download/0.2.0/he-main.csv.

As an example, let's use he:COBITInformationCriterion:

SubjektUri | SubjektDe | SubjektEn | SubjektAltDe | SubjektAltEn | ...
-- | -- | -- | -- | -- | --
COBITInformationCriterion | CobiT Qualitätsmerkmal für Information, | COBIT Information Criterion | | COBIT Business Requirements for Information | ...

And this got translated to:

he:COBITInformationCriterion
        rdfs:label         "COBIT Information Criterion"@en ;
        rdfs:label         <CobiT%20Qualit%C3%A4tsmerkmal%20f%C3%BCr%20Information%20> ;
        skos:altLabel      "COBIT Business Requirements for Information"@en 
        ...

So the English label is not a problem (the skos:altLabel may already be fixed) but the German one is. Could it be because of the comma character? CSV stands for comma-separated values, so the field escaping may be incorrect, or incompatible between the LibreOffice default settings and tarql.

Let's test this hypothesis with http://www.snik.eu/ontology/he/EPK (by the way there seem to be two resources for this concept, see issue #255):

SubjektUri | SubjektDe | SubjektEn | ...
-- | -- | -- | --
EPK | EPK | Ereignisgesteuerte Prozesskette | ...
EPK | EPK | ereignisgesteuerte Prozesskette, Event-driven Process Chain | ...

Gets transformed into:

he:EPK  a                  owl:Class ;
        rdfs:label         "EPK"@de , "Ereignisgesteuerte Prozesskette"@en ;
        rdfs:label         <ereignisgesteuerte%20Prozesskette%20Event-driven%20Process%20Chain> ;
        ...

So this again looks like the comma is the problem.
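
To see what text is hidden in such a pseudo-IRI, it can be percent-decoded. This is only an illustrative sketch (the function name is made up), using the two IRIs from the Turtle output quoted in this thread:

```python
from urllib.parse import unquote


def pseudo_iri_to_text(pseudo_iri: str) -> str:
    """Strip angle brackets, percent-decode as UTF-8, and trim surrounding whitespace."""
    return unquote(pseudo_iri.strip("<>")).strip()


print(pseudo_iri_to_text("<CobiT%20Qualit%C3%A4tsmerkmal%20f%C3%BCr%20Information%20>"))
print(pseudo_iri_to_text("<ereignisgesteuerte%20Prozesskette%20Event-driven%20Process%20Chain>"))
```

In both decoded strings the comma from the extraction table is gone, which is consistent with the comma hypothesis.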

KonradHoeffner commented 5 years ago

In the CSV file, the label is quoted with double quotes: EPK,EPK,"ereignisgesteuerte Prozesskette, Event-driven Process Chain",,,EntityType,dc:source,he:Glossar,,,,,,,ereignisgesteuerte Prozesskette; Modellierungssprache für Geschäftsprozesse.,,,,0,,,,,,

It also seems to be in the correct place, as the remaining columns aren't shifted. So in this case there are two problems: (1) TARQL incorrectly models labels with commas as resources and (2) the extractor put two labels with different languages in the SubjektEn field, which should contain only a single English label.
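
That the quoting itself is valid CSV can be checked with any standards-following parser. The sketch below uses Python's `csv` module on the first six fields of the row quoted above. It only shows that the quoted comma does not split the field; it says nothing about how tarql reads the file.

```python
import csv
import io

# First six fields of the EPK row from he-main.csv, as quoted in this thread.
line = 'EPK,EPK,"ereignisgesteuerte Prozesskette, Event-driven Process Chain",,,EntityType'

row = next(csv.reader(io.StringIO(line)))
print(len(row))  # number of parsed fields
print(row[2])    # the quoted label, comma included
```

The label stays in a single field and the following columns keep their positions.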

KonradHoeffner commented 5 years ago

Execute the CSV2RDF process on he again:

  1. install Tarql
  2. checkout https://github.com/IMISE/snik-csv2rdf
  3. download https://github.com/IMISE/snik-csv2rdf/releases/download/0.2.0/he-main.csv and put it in the he folder
  4. set SUBS="he" in map
  5. execute map
  6. investigate he/out/all.ttl

In all.ttl the label is correctly transformed into a label as well. It still contains two labels in one, but that is an input error and should be fixed during consolidation.

he:EPK
    he:chapter he:INBAN ;
    he:page "435"^^xsd:positiveInteger ;
    meta:consolidated false ;
    meta:subTopClass meta:EntityType ;
    a owl:Class ;
    rdfs:label "EPK"@de, "Ereignisgesteuerte Prozesskette"@en, "ereignisgesteuerte Prozesskette, Event-driven Process Chain"@en ;
    skos:definition "Modellierungsprache für Geschäftsprozesse"@de, "ereignisgesteuerte Prozesskette; Modellierungssprache für Geschäftsprozesse."@de .

However, those have been entered incorrectly again as well, so I am now quickly correcting the 102 labels manually. It cannot be decided automatically that, for example, "Auslagerung von Aufgaben Kompetenzen und Ressourcen" should actually be "Auslagerung von Aufgaben, Kompetenzen und Ressourcen", whereas "Service quaility Servicequalität" has to be split into the two labels "Service Quality"@en and "Servicequalität"@de and corrected.

KonradHoeffner commented 5 years ago
construct
{
 ?x rdfs:label ?l.
}
from sniko:he
where
{
?x rdfs:label ?z.
filter((isIRI(?z))).
bind(strlang(?z,"de") as ?l).
}

This generates finished labels but is too cumbersome to edit. Simpler:

select ?x rdfs:label ?z
{
?x rdfs:label ?z.
filter((isIRI(?z))).
bind(strlang(?z,"de") as ?l).
}

Open the result as a table, with three columns each for German and English.

KonradHoeffner commented 5 years ago

helabelfix.xlsx

KonradHoeffner commented 5 years ago

Corrected and remodeled: helabelfix.xlsx (GitHub cannot upload CSV)

KonradHoeffner commented 5 years ago

Added to the csv2rdf repository. Result: helabelfix.zip

KonradHoeffner commented 5 years ago

Uploaded, but the URIs are incorrect; correct them via SPARUL and correct the tarql file as well.

KonradHoeffner commented 5 years ago
with sniko:he
delete {?x ?y ?z.}
# ?fixed replaces the broken subject ?x, so it belongs in the subject position
insert {?fixed ?y ?z.}
where
{
 ?x ?y ?z.
 filter(strstarts(str(?x),"http://www.snik.eu/ontology/he/http://www.snik.eu/ontology/he/"))
 bind(uri(replace(str(?x),"http://www.snik.eu/ontology/he/http://www.snik.eu/ontology/he/","http://www.snik.eu/ontology/he/")) as ?fixed)
}
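
The string manipulation behind this repair, collapsing a doubled namespace at the start of a subject URI, can be sketched on its own. The function name and the sample URIs are illustrative only:

```python
HE_NS = "http://www.snik.eu/ontology/he/"


def collapse_doubled_namespace(iri: str) -> str:
    """Turn '<ns><ns>Local' into '<ns>Local'; leave every other IRI untouched."""
    doubled = HE_NS + HE_NS
    if iri.startswith(doubled):
        return HE_NS + iri[len(doubled):]
    return iri


print(collapse_doubled_namespace(HE_NS + HE_NS + "EPK"))
print(collapse_doubled_namespace(HE_NS + "EPK"))
```

Checking `startswith` before rewriting makes the function a no-op on URIs that are already correct, so it is safe to apply more than once.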
KonradHoeffner commented 5 years ago

Some URI labels still remain, removed via:

sparql

with sniko:he
delete 
{
 ?x rdfs:label ?l.
}
where
{
?x rdfs:label ?l.
 filter(isIRI(?l)).
}
KonradHoeffner commented 5 years ago

Fix meta as well:

sparql

with sniko:meta
delete
{
 ?x ?y <owl:DeprecatedProperty>.
}
insert
{
 ?x ?y owl:DeprecatedProperty.
}
where
{
 ?x ?y <owl:DeprecatedProperty>.
}

There still seem to be some triples in a "non-www" graph http://snik.eu/ontology/meta; deleting them is handled in issue #256.