owlcs / owlapi

OWL API main repository
821 stars 315 forks source link

OBO serializer incorrectly quoting IRI property values #1083

Closed gouttegd closed 1 year ago

gouttegd commented 1 year ago

The OBO serializer in owlapi 4.5.22 generates slightly incorrect property_value tags, in that the value of the property is always quoted, as in the following example:

Per the specification of the OBO Flat File Format (in its 1.4 version), property values should be either unquoted IDs (fully expanded IRIs or CURIEs) or quoted literal values followed by a type specification:

property_value-Tag Relation-ID ( QuotedString XSD-Type | ID )

So, when the value of a property is an ID, it should not be quoted.

For reference, related ticket on the ODK repository: INCATools/ontology-development-kit#770

gouttegd commented 1 year ago

AFAICT, the cause for the systematic quoting behaviour is found in two places:

The result is that by the point we could decide whether the value needs quoting or not (in OBOFormatWriter.writePropertyValue), the information that the value is an ID is lost – all we have is a java.lang.String.

I can think of three ways of fixing that:

FWIW, I feel the first option would be an hideous hack and the third one would be somehow a violation of the separation of duties between OWLAPIOwl2Obo and OBOFormatWriter.

gouttegd commented 1 year ago

@hkir-dev If you want to have a look.

gouttegd commented 1 year ago

FWIW, rough proof-of-concept of a fix using option 2:

--- a/oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIOwl2Obo.java
+++ b/oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIOwl2Obo.java
@@ -1085,7 +1085,7 @@ public class OWLAPIOwl2Obo {
             clause.addValue(propId);
             if (annVal instanceof OWLLiteral) {
                 OWLLiteral owlLiteral = (OWLLiteral) annVal;
-                clause.addValue(owlLiteral.getLiteral());
+                clause.addValue(new PropertyValue(owlLiteral.getLiteral()));
                 OWLDatatype datatype = owlLiteral.getDatatype();
                 IRI dataTypeIri = datatype.getIRI();
                 if (!OWL2Datatype.isBuiltIn(dataTypeIri)) {
@@ -1100,7 +1100,7 @@ public class OWLAPIOwl2Obo {
                     clause.addValue(dataTypeIri.toString());
                 }
             } else if (annVal instanceof IRI) {
-                clause.addValue(getIdentifier((IRI) annVal));
+                clause.addValue(new PropertyValue(getIdentifier((IRI) annVal), true));
             }
             frame.addClause(clause);
         }
diff --git a/oboformat/src/main/java/org/obolibrary/obo2owl/PropertyValue.java b/oboformat/src/main/java/org/obolibrary/obo2owl/PropertyValue.java
new file mode 100644
index 000000000..131ac36c9
--- /dev/null
+++ b/oboformat/src/main/java/org/obolibrary/obo2owl/PropertyValue.java
@@ -0,0 +1,25 @@
+package org.obolibrary.obo2owl;
+
+public class PropertyValue {
+    
+    private String value;
+    private boolean valueIsId;
+    
+    public PropertyValue(String val) {
+        value = val;
+        valueIsId = false;
+    }
+    
+    public PropertyValue(String val, boolean isId) {
+        value = val;
+        valueIsId = isId;
+    }
+    
+    public String toString() {
+        return value;
+    }
+    
+    public boolean isId() {
+        return valueIsId;
+    }
+}
diff --git a/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java b/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java
index 47120badb..bfa043d0c 100644
--- a/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java
+++ b/oboformat/src/main/java/org/obolibrary/oboformat/writer/OBOFormatWriter.java
@@ -25,6 +25,7 @@ import java.util.Set;
 import javax.annotation.Nonnull;
 import javax.annotation.Nullable;

+import org.obolibrary.obo2owl.PropertyValue;
 import org.obolibrary.obo2owl.OWLAPIObo2Owl;
 import org.obolibrary.oboformat.model.Clause;
 import org.obolibrary.oboformat.model.Frame;
@@ -558,10 +559,15 @@ public class OBOFormatWriter {
         if (it.hasNext()) {
             // value
             sb.append(' ');
-            String val = it.next().toString(); // TODO replace toString() method
-            sb.append('"');
-            sb.append(escapeOboString(val, EscapeMode.quotes));
-            sb.append('"');
+            PropertyValue val = (PropertyValue) it.next();
+            if ( val.isId() ) {
+                sb.append(escapeOboString(val.toString(), EscapeMode.simple));
+            }
+            else {
+                sb.append('"');
+                sb.append(escapeOboString(val.toString(), EscapeMode.quotes));
+                sb.append('"');
+            }
         }
         while (it.hasNext()) {
             // optional type; there should be only one value left in the iterator
ignazio1977 commented 1 year ago

Thanks I'll take a look later today

hkir-dev commented 1 year ago

PR created: https://github.com/owlcs/owlapi/pull/1085

In relation with the OBO Specification of the property_value:

property_value This tag binds a property to a value in this instance. The value of this tag is a relationship type id, a space, and a value specifier. The value specifier may have one of two forms; in the first form, it is just the id of some other instance, relationship type or term. In the second form, the value is given by a quoted string, a space, and datatype identifier.

IRI values are distinguished from the quoted string values using the heuristic approach applied in the OWLAPIObo2Owl

If Clause has 2 values then it is `relationship type id` + `IRI`. 
If Clause has 3 values then it is `relationship type id` + `value in quoted strings` + `datatype identifier`
ignazio1977 commented 1 year ago

fixed in 4 and 5, to be fixed in 6