lapps / vocabulary-pages

DSL files and templates used to generate the LAPPS WS-EV pages.
Apache License 2.0

TTL version of the vocabulary needs improvement #84

Open reckart opened 5 years ago

reckart commented 5 years ago

It seems to me as if the TTL version (or maybe all LD versions) of the LAPPS vocabulary could use some refactoring.

My understanding is that these should represent a schema (based on OWL and/or RDFS). As such, the LAPPS types would be classes (rdfs:Class or owl:Class) and their attributes would be properties (rdf:Property, owl:DatatypeProperty or owl:ObjectProperty).

Let's take http://vocab.lappsgrid.org/Token as an example. The current TTL file says:

<http://vocab.lappsgrid.org/Token>
        a                owl:Class , rdfs:Class , rdfs:Resource ;
        rdfs:comment     "A string of one or more characters that serves as an indivisible unit for the purposes of morpho-syntactic labeling (part of speech tagging)." ;
        rdfs:subClassOf  <http://vocab.lappsgrid.org/Region> , <http://vocab.lappsgrid.org/Token> , <http://vocab.lappsgrid.org/Annotation> , <http://vocab.lappsgrid.org/Thing> ;
       <http://vocab.lappsgrid.org/Token#pos>
                "String or URI" .

<http://vocab.lappsgrid.org/Token#pos>
        a             owl:DatatypeProperty ;
        rdfs:comment  "Part-of-speech tag associated with the token." .

The inheritance information is highly redundant. The triple <http://vocab.lappsgrid.org/Token> <http://vocab.lappsgrid.org/Token#pos> "String or URI" does not express in RDFS or OWL that Token has an attribute called pos which can take a String or URI.

I believe a better representation would be e.g.

<http://vocab.lappsgrid.org/Token>
        a                owl:Class ;
        rdfs:comment     "A string of one or more characters that serves as an indivisible unit for the purposes of morpho-syntactic labeling (part of speech tagging)." ;
        rdfs:subClassOf  <http://vocab.lappsgrid.org/Region> .

<http://vocab.lappsgrid.org/Token#pos>
        a             owl:DatatypeProperty ;
        rdfs:comment  "Part-of-speech tag associated with the token." ;
        rdfs:domain <http://vocab.lappsgrid.org/Token> ;
        rdfs:range xsd:string .

I removed the (inferred) redundant information from the a and rdfs:subClassOf statements and rendered the value type information as rdfs:range.

However, there is still a small problem here: this does not express that the range can be a "String or URI". Specifying multiple types as the range indicates an intersection of the types (which would be empty in this case), not a disjunction. That is why I only put the "more generic" type xsd:string here.
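If the disjunction really needs to be stated, one possible way is an OWL 2 data range built with owl:unionOf. The following is only a sketch, not what the vocabulary currently publishes: it assumes that the URI alternative would be carried as an xsd:anyURI literal (not as a resource), and that consumers understand OWL 2, since plain RDFS tooling will not interpret the anonymous range.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Assumption: "String or URI" is modelled as the union of two datatypes.
# The blank node is an anonymous OWL 2 data range (DataUnionOf).
<http://vocab.lappsgrid.org/Token#pos>
        a             owl:DatatypeProperty ;
        rdfs:comment  "Part-of-speech tag associated with the token." ;
        rdfs:domain   <http://vocab.lappsgrid.org/Token> ;
        rdfs:range    [ a            rdfs:Datatype ;
                        owl:unionOf  ( xsd:string xsd:anyURI ) ] .
```

A single rdfs:range pointing at the union keeps the "either one" meaning, whereas two separate rdfs:range statements would again be read as an intersection.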

ksuderman commented 5 years ago

The RDF, OWL, JSONLD, and TTL files are generated by Apache Jena from the same data model. I notice that the OWL, JSONLD, and TTL files all contain redundant inheritance declarations, while the RDF file does not. The only difference in how the files were generated is the value of the RDFFormat parameter passed to the RDFDataMgr.write() method. OntClass.setSuperClass(Resource) is only being called once. We are using an old version of Jena, so hopefully simply updating the dependency will correct this.

The code that generates the property definitions is just plain buggy.

Both issues will be fixed in https://github.com/lappsgrid-incubator/vocabulary-dsl/issues/10

ksuderman commented 5 years ago

@reckart I have deployed a test version to http://vocab.lappsgrid.org/1.3.0-SNAPSHOT for comment and review. In particular the RDF files are at http://vocab.lappsgrid.org/1.3.0-SNAPSHOT/lapps-vocabulary.ttl (et al).

All of the generated RDF files had the same redundant information because the default Jena model uses a Reasoner that generates all the triples it can infer. The redundant triples are removed by specifying a model that does not do inferencing.

The domain and range of properties should now be specified correctly.

Note: There are two definitions for Morphology included (http://vocab.lappsgrid.org/1.3.0-SNAPSHOT/Morphology and http://vocab.lappsgrid.org/1.3.0-SNAPSHOT/Token#morph). These are included only to test the schema generation and file deployment and do not represent how the WSEV may eventually represent morphological annotations.

ksuderman commented 5 years ago

NOTE: Updated URLs now contain -SNAPSHOT