Closed AlanSimmons closed 1 year ago
I developed a solution for handling Turtle files that also addresses larger code-maintenance issues in the codeReplacements function.
There are three basic types of conversions required in codeReplacements:
The codeReplacements function continues to contain the logic for handling the UMLS nodes and the truly special cases (1 and 3 above). However, for simple prefix-SAB mappings, the function now reads a CSV file named prefixes.csv in the application directory.
The use of the prefixes.csv resource file should make it easier to respond to new sets of assertions without significant modification of the logic in the codeReplacements function.
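As a rough illustration of the idea (not the actual codeReplacements code), a CSV-driven prefix-SAB lookup could look like the sketch below. The column names prefix and SAB, the sample rows, the helper names, and the "SAB code" output format are all assumptions made for the example; the real prefixes.csv may be laid out differently.

```python
import csv
import io

# Hypothetical contents standing in for prefixes.csv.
# Column names and rows are assumptions, not the real file.
SAMPLE_PREFIXES_CSV = """prefix,SAB
http://uri.interlex.org/base/ilx_,ILX
http://uri.interlex.org/tgbugs/uris/readable/,ILXTR
"""


def load_prefix_map(csv_file):
    """Read (prefix, SAB) pairs, longest prefix first so that the
    most specific prefix wins when several prefixes match."""
    rows = [(row["prefix"], row["SAB"]) for row in csv.DictReader(csv_file)]
    return sorted(rows, key=lambda pair: len(pair[0]), reverse=True)


def apply_prefix_map(iri, prefix_map):
    """Return 'SAB code' for the first matching prefix.
    IRIs that match no prefix are returned unchanged, so they can
    still be handled by the special-case logic."""
    for prefix, sab in prefix_map:
        if iri.startswith(prefix):
            return f"{sab} {iri[len(prefix):]}"
    return iri


# In the framework this would be an open file handle on prefixes.csv.
prefix_map = load_prefix_map(io.StringIO(SAMPLE_PREFIXES_CSV))
print(apply_prefix_map("http://uri.interlex.org/base/ilx_0101431", prefix_map))
```

With this shape, supporting a new set of assertions means adding a row to the CSV rather than adding a branch to the function.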
Statement of Problem
Using PheKnowLator to process OWL files in Turtle serialization can introduce issues with namespaces (SABs) in the resulting OWLNETS files. This is an issue for any OWL file that is available in Turtle, including:
Details
PheKnowLator assumes RDF/XML as input. To work with an OWL file that is in another serialization, it is necessary first to convert to RDF/XML. The generation framework does this using the rdflib package.
For files in Turtle format, the generation framework parses the file in TTL and then serializes to XML.
Turtle files contain a prefix section that associates portions of URIs with namespaces. Following are examples of prefixes from the NPO Turtle file:
When the Turtle file is serialized to XML, the namespace prefixes are expanded into full IRIs, so the original prefixes are lost to PheKnowLator.
Example
Turtle
Translated RDF/XML
OWLNETS_edgelist.txt:
(PheKnowLator translates only the OWL information that is relevant to knowledge graphs.)
If the Turtle prefix is an IRI that is similar to an OBO IRI (such as the Interlex IRI above), then it may be possible to define a namespace. However, prefixes such as
@prefix ilxtr: <http://uri.interlex.org/tgbugs/uris/readable/>
do not translate to an OBO equivalent.
Solution Options
We need to recover the original namespace prefixes from the Turtle file; in effect, we need to translate the full IRIs in the OWLNETS files back to the Turtle prefixes.
The most straightforward way would be simply to add more "special cases" to the existing codeReplacements function. This would be justified in that:
We could automate this to a degree by having the framework read the original Turtle files and extract namespaces from the prefixes. However, we would need to provide a list of Turtle files to read, and some of the prefixes are actually for relationships. This does not seem to be a much better solution than adding special cases manually.
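For reference, pulling the prefix declarations out of a Turtle file is the easy part of the automated option. Below is a minimal sketch that uses a regular expression; the function name extract_prefixes is hypothetical, and the pattern handles only @prefix lines (not the SPARQL-style PREFIX form), so a real Turtle parser would be more robust.

```python
import re

# Matches lines such as:  @prefix ilxtr: <http://...> .
# Group 1 is the prefix name (may be empty), group 2 is the IRI.
PREFIX_PATTERN = re.compile(
    r"^\s*@prefix\s+([^\s:]*):\s*<([^>]*)>\s*\.", re.MULTILINE
)

SAMPLE_TURTLE = """
@prefix ilxtr: <http://uri.interlex.org/tgbugs/uris/readable/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
"""


def extract_prefixes(turtle_text):
    """Return a dict mapping each declared IRI to its prefix name."""
    return {iri: name for name, iri in PREFIX_PATTERN.findall(turtle_text)}


prefixes = extract_prefixes(SAMPLE_TURTLE)
for iri, name in prefixes.items():
    print(name, iri)
```

The sketch cannot tell whether a prefix is used for nodes or for relationships, so its output would still need manual review.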