Statement of Problem

Using PheKnowLator to process OWL files in Turtle serialization can introduce issues with namespaces (SABs) in the resulting OWLNETS files. This is an issue for any OWL file that is available in Turtle, including:

GlyCoCoO (part of GlyGen)
NPO and NPOSKCAN (part of SPARC)

Details

PheKnowLator assumes RDF/XML as input. To work with an OWL file that is in another serialization, it is necessary first to convert to RDF/XML. The generation framework does this using the rdflib package.

For files in Turtle format, the generation framework parses the file as Turtle and then serializes it to RDF/XML.

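import os
from rdflib import Graph

# Parse the Turtle file, write it out as RDF/XML, then reload the RDF/XML copy.
graph = Graph().parse(owl_file, format='ttl')
convertedpath = os.path.join(owl_dir, 'converted.owl')
v = graph.serialize(format='xml', destination=convertedpath)
graph2 = Graph().parse(convertedpath, format='xml')
graph = graph2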

Turtle files contain a prefix section that associates portions of URIs with namespaces. Following are examples of prefixes from the NPO Turtle file:

@prefix AllenTransgenicLine: <http://api.brain-map.org/api/v2/data/TransgenicLine/> .
@prefix BFO: <http://purl.obolibrary.org/obo/BFO_> .
@prefix ILX: <http://uri.interlex.org/base/ilx_> .
@prefix ilxr: <http://uri.interlex.org/base/readable/> .
@prefix ilxtr: <http://uri.interlex.org/tgbugs/uris/readable/> .

When the Turtle file is serialized to XML, prefixed names are expanded to full IRIs, so the Turtle prefixes are lost to PheKnowLator.
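To make the loss concrete, the following minimal sketch imitates the expansion step with a plain dictionary. It does not call rdflib, and the names PREFIXES and expand are illustrative only; they are not part of PheKnowLator or the generation framework.

```python
# Two prefixes copied from the NPO examples above.
PREFIXES = {
    "ILX": "http://uri.interlex.org/base/ilx_",
    "ilxtr": "http://uri.interlex.org/tgbugs/uris/readable/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'ILX:0101528' to a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


# After expansion only the full IRI remains; the label "ILX" is gone.
print(expand("ILX:0101528"))  # http://uri.interlex.org/base/ilx_0101528
```

Anything downstream that sees only the expanded IRI has to reconstruct the prefix on its own.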

Example

Turtle

@prefix ILX: <http://uri.interlex.org/base/ilx_> .
.
.
.
ILX:0101528 a owl:Class ;
    rdfs:label "CA2 alveus" ;
    rdfs:subClassOf UBERON:0002305,
        [ a owl:Restriction ;
            owl:onProperty ilx.partOf: ;
            owl:someValuesFrom UBERON:0007639 ] .

Translated RDF/XML

<rdf:Description rdf:about="http://uri.interlex.org/base/ilx_0101528">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
    <rdfs:label>CA2 alveus</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/UBERON_0002305"/>
    <rdfs:subClassOf rdf:nodeID="na9ff5cff0e4544b79582c69e889226d4b13"/>
</rdf:Description>

OWLNETS_edgelist.txt:

http://uri.interlex.org/base/ilx_0101528    http://www.w3.org/2000/01/rdf-schema#subClassOf http://purl.obolibrary.org/obo/UBERON_0002305

(PheKnowLator only translates OWL information relevant to knowledge graphs.)

If the Turtle prefix is an IRI that is similar to an OBO IRI (such as the Interlex IRI above), then it may be possible to define a namespace. However, prefixes such as @prefix ilxtr: <http://uri.interlex.org/tgbugs/uris/readable/> do not translate to an OBO equivalent.
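One possible heuristic for the OBO-like case is sketched below. It assumes that "similar to an OBO IRI" means the IRI ends in PREFIX_LOCALID. The function name obo_style_namespace and the local name someProperty are hypothetical; this is not how the generation framework currently derives namespaces.

```python
import re
from typing import Optional

# OBO-style IRIs end in PREFIX_LOCALID, for example .../obo/UBERON_0002305.
OBO_STYLE = re.compile(r"/([A-Za-z][A-Za-z0-9]*)_([A-Za-z0-9]+)$")


def obo_style_namespace(iri: str) -> Optional[str]:
    """Return an upper-cased namespace if the IRI looks OBO-like, else None."""
    match = OBO_STYLE.search(iri)
    return match.group(1).upper() if match else None


print(obo_style_namespace("http://purl.obolibrary.org/obo/UBERON_0002305"))
print(obo_style_namespace("http://uri.interlex.org/base/ilx_0101528"))
# Hypothetical local name under the ilxtr namespace; nothing to split on.
print(obo_style_namespace("http://uri.interlex.org/tgbugs/uris/readable/someProperty"))
```

The first two calls yield a usable namespace (UBERON and ILX); the third returns None, which is the case that needs other handling.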

Solution Options

We need to recover the original namespace prefixes from the Turtle file: in effect, to translate from the full IRIs in the OWLNETS files back to the Turtle prefixes.
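The reverse translation could look roughly like the sketch below, which assumes a prefix-to-namespace table is already available and picks the longest matching namespace. The function name compact is hypothetical and is not part of the existing framework.

```python
from typing import Dict


def compact(iri: str, prefixes: Dict[str, str]) -> str:
    """Rewrite a full IRI as prefix:local using the longest matching namespace."""
    best_prefix, best_ns = None, ""
    for prefix, ns in prefixes.items():
        if iri.startswith(ns) and len(ns) > len(best_ns):
            best_prefix, best_ns = prefix, ns
    if best_prefix is None:
        return iri  # no known namespace: leave the IRI untouched
    return f"{best_prefix}:{iri[len(best_ns):]}"


# Prefixes taken from the NPO Turtle file.
prefixes = {
    "BFO": "http://purl.obolibrary.org/obo/BFO_",
    "ILX": "http://uri.interlex.org/base/ilx_",
    "ilxr": "http://uri.interlex.org/base/readable/",
}
print(compact("http://uri.interlex.org/base/ilx_0101528", prefixes))  # ILX:0101528
```

Longest match is used so that two namespaces sharing a common stem (such as the two under http://uri.interlex.org/base/) do not shadow each other.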

The most straightforward way would be simply to add more "special cases" to the existing codeReplacements function. This would be justified in that:

We could select only those prefixes that relate to the nodes of interest.
The set of possible cases is likely to be small. We're only dealing with a handful of Turtle files (fewer than 5) for the initial round.
The Turtle files are already published.
Authors of Turtle files can argue that the Turtle files are in a legitimate format, and it's up to us to translate them correctly to OWLNETS. The issues actually arise from the need to serialize Turtle to RDF/XML before running PheKnowLator.

We could automate this to a degree by having the framework read the original Turtle files and extract namespaces from the prefixes. However, we would need to provide a list of Turtle files to read, and some of the prefixes are actually for relationships. This does not seem to be a much better solution than adding special cases manually.
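For reference, the extraction step itself is small. The sketch below uses a regular expression rather than a full Turtle parser, so it only handles @prefix directives written on a single line; the names PREFIX_LINE and read_prefixes are hypothetical. It also makes no attempt to tell node prefixes from relationship prefixes, which is the part that would still need manual review.

```python
import re
from typing import Dict

# Matches lines such as: @prefix ILX: <http://uri.interlex.org/base/ilx_> .
PREFIX_LINE = re.compile(r"^\s*@prefix\s+([^\s:]*):\s*<([^>]*)>\s*\.", re.MULTILINE)


def read_prefixes(turtle_text: str) -> Dict[str, str]:
    """Return {prefix: namespace IRI} for every @prefix directive found."""
    return {m.group(1): m.group(2) for m in PREFIX_LINE.finditer(turtle_text)}


sample = """\
@prefix BFO: <http://purl.obolibrary.org/obo/BFO_> .
@prefix ilxtr: <http://uri.interlex.org/tgbugs/uris/readable/> .
"""
for prefix, namespace in read_prefixes(sample).items():
    print(prefix, namespace)
```

Since the framework already parses the Turtle file with rdflib, the bindings could probably also be read from the parsed graph through Graph.namespaces() before it is serialized to RDF/XML, which would avoid a second pass over the file.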

x-atlas-consortia / ubkg-etl

Turtle to OWLNETS issue: @prefix and namespaces #36