owlcs / owlapi

OWL API main repository
813 stars 314 forks source link

Support for prefix declarations in OBO format #1102

Open balhoff opened 1 year ago

balhoff commented 1 year ago

This adds support for reading and writing prefix declarations in OBO format (idspace tags), and using them for expanding and compacting identifiers. It builds on initial work done by @gouttegd in https://github.com/owlcs/owlapi/pull/1072, moved to the version4 branch. These changes also treat values of replaced_by and consider tags as IRIs rather than strings.

OBODocumentFormat now extends PrefixDocumentFormatImpl.

This should stay as a draft PR until some tests are added.

balhoff commented 11 months ago

@ignazio1977 @cmungall @gouttegd I think this is ready.

balhoff commented 11 months ago

I built a ROBOT jar which can be used to test the prefixes support: https://github.com/balhoff/owlapi/releases/download/prefixes-test/robot.jar

cmungall commented 11 months ago

java -jar ~/tmp/robot.jar convert -I $OBO/hsapdv.owl -o /tmp/foo.obo

results in a very large owl-axioms header for untranslateable axioms

owl-axioms: Prefix(owl:=<http://www.w3.org/2002/07/owl#>)\nPrefix(rdf:=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>)\nPrefix(xml:=<http://www.w3.org/XML/1998/namespace>)\nPrefix(xsd:=<http://www.w3.org/2001/XMLSchema#>)\nPrefix(rdfs:=<http://www.w3.org/2000/01/rdf-schema#>)\n\n\nOntology(\nDeclaration(AnnotationProperty(<http://www.geneontology.org/formats/oboInOwl#id>))\n\n\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/BFO_0000050> \"part_of\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/BFO_0000062> \"preceded_by\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000000> \"HsapDv:0000000\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000001> \"HsapDv:0000001\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000002> \"HsapDv:0000002\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000003> \"HsapDv:0000003\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000004> \"HsapDv:0000004\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000005> \"HsapDv:0000005\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000006> \"HsapDv:0000006\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000007> \"HsapDv:0000007\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000008> \"HsapDv:0000008\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000009> \"HsapDv:0000009\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000010> \"HsapDv:0000010\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000011> \"HsapDv:0000011\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000012> \"HsapDv:0000012\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000013> \"HsapDv:0000013\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000014> \"HsapDv:0000014\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000015> \"HsapDv:0000015\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000016> \"HsapDv:0000016\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000017> \"HsapDv:0000017\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000018> \"HsapDv:0000018\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000019> \"HsapDv:0000019\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000020> \"HsapDv:0000020\")\nAnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000021>

and also has ID CURIEs translated to obo:...

[Term]
id: obo:HsapDv_0000001
name: human life cycle
namespace: human_developmental_stage
def: "Temporal interval that defines human life from the prenatal stage until late adulthood." [Bgee:curator]
xref: UBERON:0000104
is_a: obo:HsapDv_0000000 ! human life cycle stage
property_value: hsapdv:start_dpf "0.0" xsd:float

I don't think there is anything unusual about this ontology - cl.owl exhibits the same behavior.

cmungall commented 11 months ago

If the ontology is converted from .obo, there is no problem

I though that somehow the oboInOwl:id annotations were the issue, removing them gets rid of the giant owl-prefix, but the ID issue remains

gouttegd commented 11 months ago

If the ontology is converted from .obo, there is no problem

In fact this seems to happen only if the source ontology is in RDF/XML. I don’t see anything wrong if the source ontology is in Manchester or functional syntax.

gouttegd commented 11 months ago

The ID issue seems more precisely caused by the xmlns:obo=http://purl.obolibrary.org/obo/ namespace declaration in the RDF/XML file, coupled to a behaviour of the OWL API prefix manager that I quite don’t understand:

DefaultPrefixManager pm = new DefaultPrefixManager();
pm.setPrefix("obo:", "http://purl.obolibrary.org/obo/");
pm.setPrefix("CL:", "http://purl.obolibrary.org/obo/CL_");

IRI iri = IRI.create("http://purl.obolibrary.org/obo/CL_0001");

System.err.printf("IRI to CURIE: %s -> %s\n", iri, pm.getPrefixIRI(i));

This gives:

IRI to curie: http://purl.obolibrary.org/obo/CL_0001 -> obo:CL_0001

Not sure why the prefix manager does not recognise the longer, more specific CL URL prefix.

gouttegd commented 11 months ago

If I understand correctly, this is because the OWL API prefix manager does not simply search for the longest URL prefix in the prefix map. Before doing that, it first searches for a URL prefix that corresponds to the namespace of the IRI. If it finds one, it always uses that prefix without searching any further.

The IRI namespace is determined completely independently of any prefix map, solely on the basis of some XML rules about names (it’s the longest prefix such that whatever remains after the prefix is a XML NCName). With http://purl.obolibrary.org/obo/CL_0001 for example, the namespace is http://purl.obolibrary.org/obo/.

Now because the RDF/XML file contains a xmlns:obo=http://purl.obolibrary.org/obo/ namespace declaration, when the prefix manager looks up first the namespace of http://purl.obolibrary.org/obo/CL_0001, it finds the http://purl.obolibrary.org/obo/ URL prefix, and it does not search for any other prefix in the prefix map that could be a better match.

balhoff commented 11 months ago

@gouttegd thanks for looking into that. I actually had originally implemented my own search for the longest URL prefix, and then swapped in the prefix manager because it seemed proper to reuse code. But I much prefer always using the longest match!

I wonder if I should put in special case handling of namespaces that are http://purl.obolibrary.org/obo/ or substrings. If such a prefix is defined, currently the code will not have a chance to fall back to the built-in OBO compaction (this is separate from if you had also defined a CL prefix; I don't want to force OBO files to define all OBO prefixes).

cthoyt commented 11 months ago

The ‘curies’ package for Java is available on maven and has an implementation of uri compression that correctly handles getting the longest match

gouttegd commented 11 months ago

To be clear, the OWL API prefix manager does correctly handle getting the longest match. It’s just that it does so only if it does not find a prefix that matches the namespace of the IRI (where the namespace is defined as explained above) – if a match for the namespace is found, this takes precedence over everything else.

Overall, it seems like the OWL API DefaultPrefixManager has been designed with the particular constraints of XML tags in mind. Nothing wrong with that, but I think that makes it unsuitable for our particular use case here. I’d suggest either reverting to the original custom longest prefix search (which was working just fine last time I tested this PR, if I recall correctly) or adopting curies.

balhoff commented 11 months ago

I think the main problem @cmungall is seeing is the result of automatic insertion of oboInOwl:id annotations when parsing. When a term like this is parsed:

[Term]
id: HsapDv:0000001
name: human life cycle

An annotation is injected into the OWL:

AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000001> "HsapDv:0000001")

If there is an idspace defined like obo http://purl.obolibrary.org/obo/, this won't affect the parsing of the term, but it will be matched when the term is written. So the stanza gets created as:

[Term]
id: obo:HsapDv_0000001
name: human life cycle

And then the OBO writer doesn't know what to do with the id annotation, since there is now no HsapDv:0000001 frame, so it puts it in the owl-axioms header. If you read this OBO file back into OWL, there will then be two id annotations:

AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000001> "HsapDv:0000001")
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#id> <http://purl.obolibrary.org/obo/HsapDv_0000001> "obo:HsapDv_0000001")

I'm not sure, but I don't think the id annotations serve any purpose (the business of mapping relations to unprefixed names is handled by an inserted oboInOwl:shorthand annotation). If I suppress their creation, it fixes the owl-axioms header problem. But maybe that's not an acceptable change to make in this PR (would be good to do in OBO 1.6). But it will probably be super surprising to end up with obo: prefixed terms everywhere when converting from OWL files that happen to have that prefix defined. Should I make OBO format handle this namespace in a special way? I think that would also avoid these particular oboInOwl:id problems. But I don't like making special cases. And I suspect I can cause similar id annotation problems by putting other overlapping id spaces in the header (update: I can).

balhoff commented 8 months ago

@gouttegd @cmungall do you understand the point of this check for underscore? Trying to wrap up this PR and this is a sticking point: https://github.com/owlcs/owlapi/blob/13a2d43c04035e97ce94e380e9b0399ca2c70f06/oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIObo2Owl.java#L1722-L1723

I don't want to append the # to the uriPrefix.

cmungall commented 8 months ago

I think this is for subset "IDs".

gouttegd commented 8 months ago

I believe this is a (poor) way of checking whether the ID is a “non-canonical prefixed ID” (canonical prefixed ID don’t contain an underscore in the local ID part). Such IDs MUST be expanded with a '#' character between the URL prefix and the local ID, according to the OBO Flat File Format specification (§5.9.2 Translation of identifiers).

I say a “poor way” because a “non-canonical prefixed ID” is defined (§2.5 Identifiers) basically as any prefixed ID that contains characters that are not allowed in a “canonical prefix ID” – so, any prefixed ID where the prefix part contains something else than alphanumeric characters and underscores and/or where the local part contains something else than digits.

In other words, that check fails to identify many ”non-canonical prefixed IDs” (all those that do not contain underscores in their local part, but may contain any other character not allowed in a canonical ID), which are then translated as if they were canonical.

Whether it would be a good idea to fix the parser (implement a proper check for “non-canonical prefixed IDs”) to make it really compliant with that section of the OBO Flat File Format specification, I am not sure.

cmungall commented 8 months ago

agreed

On Sat, Oct 21, 2023 at 2:40 AM Damien Goutte-Gattat < @.***> wrote:

I believe this is a (poor) way of checking whether the ID is a “non-canonical prefixed ID” (canonical prefixed ID don’t contain an underscore in the local ID part). Such IDs MUST be expanded with a '#' character between the URL prefix and the local ID, according to the OBO Flat File Format specification (§5.9.2 Translation of identifiers http://owlcollab.github.io/oboformat/doc/obo-syntax.html#5.9.2).

I say a “poor way” because a “non-canonical prefixed ID” is defined (§2.5 Identifiers http://owlcollab.github.io/oboformat/doc/obo-syntax.html#2.5) basically as any prefixed ID that contains characters that are not allowed in a “canonical prefix ID” – so, any prefixed ID where the prefix part contains something else than alphanumeric characters and underscores and/or where the local part contains something else than digits.

In other words, that check fails to identify many ”non-canonical prefixed IDs” (all those that do not contain underscores in their local part, but may contain any other character not allowed in a canonical ID), which are then translated as if they were canonical.

Whether it would be a good idea to fix the parser (implement a proper check for “non-canonical prefixed IDs”) to make it really compliant with that section of the OBO Flat File Format specification, I am not sure.

— Reply to this email directly, view it on GitHub https://github.com/owlcs/owlapi/pull/1102#issuecomment-1773734341, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOJZ7HCPDTKPZP4KIRTYAOKA3AVCNFSM6AAAAAAYABVVQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZTG4ZTIMZUGE . You are receiving this because you were mentioned.Message ID: @.***>

balhoff commented 5 months ago

@cmungall @gouttegd I put up another ROBOT test jar containing recent changes I made on this PR: https://github.com/balhoff/owlapi/releases/download/prefixes-test-2/robot.jar

Notably I have disallowed using idspace prefixes that overlap http://purl.obolibrary.org/obo/. There is too much legacy behavior, especially for unprefixed IDs like subsets. This will let us use the idspaces support for the real needed use cases but hopefully not disrupt existing behavior. If you encounter any issues that I can add test cases for, please let me know. It would be great to release this, but I want to make sure it's solid.

gouttegd commented 5 months ago

The OBO serialiser writes idspace tags for a few common prefixes:

idspace: owl http://www.w3.org/2002/07/owl# 
idspace: rdf http://www.w3.org/1999/02/22-rdf-syntax-ns# 
idspace: rdfs http://www.w3.org/2000/01/rdf-schema# 
idspace: xml http://www.w3.org/XML/1998/namespace 
idspace: xsd http://www.w3.org/2001/XMLSchema#

Is that intended?

I am not saying that the serialiser should not do that (though I tend to think it should not), but upon merging this branch with the tip of the version4 branch this causes some of the tests in the contract module to fail:

[ERROR]   BasicsTestCase.testPropertyValueQuotes:1203 expected: <format-version: 1.2
ontology: test

[Term]
id: X:1
property_value: rdfs:seeAlso "xx" xsd:string

[Term]
id: X:2
property_value: rdfs:seeAlso "1" xsd:int

> but was: <format-version: 1.2
idspace: owl http://www.w3.org/2002/07/owl# 
idspace: rdf http://www.w3.org/1999/02/22-rdf-syntax-ns# 
idspace: rdfs http://www.w3.org/2000/01/rdf-schema# 
idspace: xml http://www.w3.org/XML/1998/namespace 
idspace: xsd http://www.w3.org/2001/XMLSchema# 
ontology: test

[Term]
id: X:1
property_value: rdfs:seeAlso "xx" xsd:string

[Term]
id: X:2
property_value: rdfs:seeAlso "1" xsd:int

>
gouttegd commented 5 months ago

A (previsible) side-effect of this PR is that using an undeclared prefix in a consider_by tag will yield an incorrect IRI. For example in Uberon:

consider: NIF_Subcellular:sao-1337158144

will yield the IRI http://purl.obolibrary.org/obo/NIF_Subcellular_sao-1337158144, which is most likely incorrect – and in addition, which the OBO serialiser fails to condense back, so when trying to serialise back to OBO this will become:

consider: http://purl.obolibrary.org/obo/NIF_Subcellular_sao-1337158144

If the NIF_Subcellular prefix is declared in a idspace tag, everything works fine.

(So, nothing wrong with the PR here: I am just noting that there will be some places to check in our ontologies when this PR lands in a released version of ROBOT.)

gouttegd commented 5 months ago

Another funny side-effect:

consider: Wikipedia:Limb_(anatomy)

becomes

consider: Wikipedia:Limb_%28anatomy%29

and, after another round trip, becomes

consider: Limb_%2528anatomy%2529

(For what it’s worth, I think that using a Wikipedia page as the value for a consider tag is an error anyway.)

balhoff commented 5 months ago

The OBO serialiser writes idspace tags for a few common prefixes:

@gouttegd those get auto-inserted. I spent a short while trying to prevent it, but maybe I should spend a bit more time trying to figure that out.

gouttegd commented 5 months ago

I think one consequence of those prefix declarations is that seeAlso qualifiers becomes rdfs:seeAlso when an OBO file is written with the new serialiser:

-is_a: UBERON:0005631 {seeAlso="https://github.com/obophenotype/uberon/issues/635"} ! extraembryonic membrane
+is_a: UBERON:0005631 {rdfs:seeAlso="https://github.com/obophenotype/uberon/issues/635"} ! extraembryonic membrane

I don’t think this is necessarily a big deal, but it means that it is another situation where people will need to avoid using different versions of Protégé to edit an ontology that is maintained in the OBO format.

cmungall commented 5 months ago

I may regret saying this but I think we can live with this short term sync headache if it buys us long term predictability

On Thu, Feb 8, 2024 at 3:02 AM Damien Goutte-Gattat < @.***> wrote:

I think one consequence of those prefix declarations is that seeAlso qualifiers become rdfs:seeAlso when round tripping an OBO file:

-is_a: UBERON:0005631 {seeAlso="https://github.com/obophenotype/uberon/issues/635"} ! extraembryonic membrane+is_a: UBERON:0005631 {rdfs:seeAlso="https://github.com/obophenotype/uberon/issues/635"} ! extraembryonic membrane

I don’t think this is necessarily a big deal, but it means that it is another situation where people will need to avoid using different versions of Protégé to edit an ontology that is maintained in the OBO format.

— Reply to this email directly, view it on GitHub https://github.com/owlcs/owlapi/pull/1102#issuecomment-1933837061, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOLIGU6ZBFOLTZSS7QLYSSWGFAVCNFSM6AAAAAAYABVVQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZTHAZTOMBWGE . You are receiving this because you were mentioned.Message ID: @.***>

balhoff commented 5 months ago

@gouttegd I just want to note that this string roundtripOBO does not end up with the semweb prefixes injected: https://github.com/owlcs/owlapi/pull/1102/files#diff-9634d65d9f4068b0a8aacb25cc31eb7f7e4a03e43bccd39692a215d55fa43a59R65

So I suppose it is something to do with how ROBOT is handling the prefix manager. I'll try to look at that.

balhoff commented 5 months ago

@gouttegd I pushed a change to prevent the default prefix injection.

balhoff commented 5 months ago

Updated ROBOT build containing these changes: https://github.com/balhoff/owlapi/releases/download/prefixes-test-3/robot.jar

To test:

gouttegd commented 5 months ago

The default prefixes are still injected with the updated build.

Testing a roundtrip with the simple property_value_test.obo file (./robot convert -i property_value_test.obo -o property_value_test2.obo), this yields:

format-version: 1.2
idspace: owl http://www.w3.org/2002/07/owl# 
idspace: rdf http://www.w3.org/1999/02/22-rdf-syntax-ns# 
idspace: rdfs http://www.w3.org/2000/01/rdf-schema# 
idspace: xml http://www.w3.org/XML/1998/namespace 
idspace: xsd http://www.w3.org/2001/XMLSchema# 
ontology: test
property_value: FOO:0000001 http://purl.obolibrary.org/obo/go.owl

[Term]
id: UBERON:0004657
name: mandible condylar process
def: "The posterior process on the ramus of the mandible composed of two parts: a superior part, the articular portion, and an inferior part, the condylar neck." [BTO:0001750, Wikipedia:Mandibular_condyle]
synonym: "condylar process of mandible" EXACT [FMA:52836]
synonym: "mandibular condyle" EXACT [BTO:0001750]
synonym: "mandibular condyloid process" RELATED [MA:0002816]
synonym: "Processus condylaris" EXACT [FMA:52836]
xref: BTO:0001750
xref: FMA:52836
xref: GAID:219
xref: http://upload.wikimedia.org/wikipedia/commons/thumb/b/b1/Processuscondylarismandibulae.PNG/200px-Processuscondylarismandibulae.PNG
xref: MA:0002816
xref: MESH:A.02.835.232.781.324.502.632.600
property_value: IAO:0000412 http://purl.obolibrary.org/obo/uberon.owl
balhoff commented 5 months ago

You're right @gouttegd 😞 I wrote a failing test, then fixed it, but clearly it didn't cover everything. Got a little over eager...

balhoff commented 5 months ago

@gouttegd it was a maven problem rather than a code problem (wasn't using the OWL API I thought I was). I've replaced the robot.jar above.

gouttegd commented 5 months ago

Seems to be working just fine at least on Uberon. :) I’ll do some more tests with other ontologies later.

gouttegd commented 5 months ago

FWIW I didn’t notice any issue with any of the ontologies I work with.

cmungall commented 2 months ago

I have tested this jar and it looks like it injects labels for oio properties. While this is not so harmful and even mildly useful in some circumstances, it's generally undesirable and not supported by the spec.

E.g.

format-version: 1.4
ontology: comment

[Term]
id: X:1
comment: "This is a comment about term X:1."

generates

<?xml version="1.0"?>
<rdf:RDF xmlns="http://purl.obolibrary.org/obo/comment.owl#"
     xml:base="http://purl.obolibrary.org/obo/comment.owl"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/comment.owl">
        <oboInOwl:hasOBOFormatVersion>1.4</oboInOwl:hasOBOFormatVersion>
    </owl:Ontology>

    <!-- 
    ///////////////////////////////////////////////////////////////////////////////////////
    //
    // Annotation properties
    //
    ///////////////////////////////////////////////////////////////////////////////////////
     -->

    <!-- http://www.geneontology.org/formats/oboInOwl#hasOBOFormatVersion -->

    <owl:AnnotationProperty rdf:about="http://www.geneontology.org/formats/oboInOwl#hasOBOFormatVersion">
        <rdfs:label>has_obo_format_version</rdfs:label>
    </owl:AnnotationProperty>

    <!-- http://www.geneontology.org/formats/oboInOwl#id -->

    <owl:AnnotationProperty rdf:about="http://www.geneontology.org/formats/oboInOwl#id">
        <rdfs:label>id</rdfs:label>
    </owl:AnnotationProperty>

    <!-- http://www.w3.org/2000/01/rdf-schema#comment -->

    <owl:AnnotationProperty rdf:about="http://www.w3.org/2000/01/rdf-schema#comment"/>

    <!-- 
    ///////////////////////////////////////////////////////////////////////////////////////
    //
    // Classes
    //
    ///////////////////////////////////////////////////////////////////////////////////////
     -->

    <!-- http://purl.obolibrary.org/obo/X_1 -->

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/X_1">
        <oboInOwl:id>X:1</oboInOwl:id>
        <rdfs:comment>&quot;This is a comment about term X:1.&quot;</rdfs:comment>
    </owl:Class>
</rdf:RDF>

<!-- Generated by the OWL API (version 4.5.26) https://github.com/owlcs/owlapi -->
gouttegd commented 2 months ago

@cmungall This does not seem to be caused by this PR. I observe exactly the same behaviour with the current released version of ROBOT (1.9.5).

gouttegd commented 2 months ago

Trying to merge the PR with the tip of the version4 branch results in the RoundTripTestCase#shouldRoundTripVersionInfo() test failing.

The test loads a test ontology in functional syntax, which contains the default prefix declarations (OWL, RDFS, etc.), and converts it to OBO by calling the OWLAPIOwl2Obo.convert() method, which directly translates the prefixes from the source ontology into IDSPACE tags – resulting in the aforementioned default prefix declarations being injected into the OBO header frame, which the test doesn’t expect.

What I absolutely do not understand is why this test only fails upon merging this PR with the version4 branch. From what I understand it should also fail right here, before we even try to merge…

gouttegd commented 2 months ago

Not sure what the correct behaviour of the OWLAPIOwl2Obo.convert() method should be.

For now, it is set to “preserve prefix mappings loaded from a previous serialization”, which automatically results in the typical default prefixes (OWL, RDF, etc.) being converted into IDSPACE tags. Should we actively prevent those prefixes from being converted?

gouttegd commented 2 months ago

What I absolutely do not understand is why this test only fails upon merging this PR with the version4 branch. From what I understand it should also fail right here, before we even try to merge…

So it seems this has to do with the version of the SureFire plugin used to run the test suite.

Jim’s replaced-by-value-as-iri-v4 branch still uses version 2.20 (same version as the one used in OWLAPI 4.5.26). The current version4 branch (upcoming 4.5.27) uses SureFire 3.2.5.

Changing just the version of SureFire in Jim’s branch to use the same 3.2.5 as in the current version4 branch leads to the aforementioned test failure.

ignazio1977 commented 2 months ago

Good to know, cheers.

Now, it's a good question whether the bug is real or in the test. I'll run the same code without the junit machinery and check the results.

I.

On Mon, 6 May 2024, 18:07 Damien Goutte-Gattat, @.***> wrote:

What I absolutely do not understand is why this test only fails upon merging this PR with the version4 branch. From what I understand it should also fail right here, before we even try to merge…

So it seems this has to do with the version of the SureFire plugin used to run the test suite.

Jim’s replaced-by-value-as-iri-v4 branch still uses version 2.20 (same version as the one used in OWLAPI 4.5.26). The current version4 branch (upcoming 4.5.27) uses SureFire 3.2.5.

Changing just the version of SureFire in Jim’s branch to use the same 3.2.5 as in the current version4 branch leads to the aforementioned test failure.

— Reply to this email directly, view it on GitHub https://github.com/owlcs/owlapi/pull/1102#issuecomment-2096404201, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAT2AJKC4D64Z53XH2QYUGTZA6TDNAVCNFSM6AAAAAAYABVVQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJWGQYDIMRQGE . You are receiving this because you were mentioned.Message ID: @.***>

ignazio1977 commented 2 months ago

My view:

OWLAPIOwl2Obo.convert()

needs to exclude the standard prefixes, as you do not want them as idspace declarations (please confirm if I've misunderstood this). Implementing a simple list of namespaces to exclude from idispace declarations makes the test pass.

Another failing test is to do with an idspace declaration not being includes (sw in this case). I believe this is because the idspace map built during the ontology parse is not used to put the prefixes in the fresh obo document format object created for the parsing.

I'm experimenting with these two fixes to see if that will allow the tests to pass. Please let me know if you think I'm going the wrong way about this.

gouttegd commented 2 months ago

OWLAPIOwl2Obo.convert() needs to exclude the standard prefixes, as you do not want them as idspace declarations (please confirm if I've misunderstood this).

I’ve come to the same conclusion.

Please let me know if you think I'm going the wrong way about this.

No, I think you’re right. @balhoff what do you think?

ignazio1977 commented 2 months ago

Regarding the failure/non failure due to surefire, I'm thinking the problem might be this file is named Test rather than TestCase and it's Junit 5 (as all other test cases now are), perhaps the old surefire was missing those altogether.

ignazio1977 commented 2 months ago

I got a clean build and have pushed up the commit, any feedback welcome.

I'll check other existing PRs and bug reports, if there are any easy wins I'll add them, then I can release later today or tomorrow

gouttegd commented 2 months ago

I built the latest ROBOT with the current tip of the version4 branch and did some quick tests on my ontologies, it seems to behave as intended.

(FWIW I ran into some Log4J-related issues when building ROBOT against the new OWLAPI, but this has nothing to do with this PR and is almost certainly a problem on the ROBOT side.)

balhoff commented 2 months ago

Thanks @ignazio1977! Where can I see your changes? My intent with the standard prefixes was that I didn't want them automatically injected into a fresh namespace manager, but if they were actually purposely declared then I would include them. Maybe it's an impossible distinction to work out.

ignazio1977 commented 2 months ago

@balhoff I think it would be quite hard to tell apart automatic from standard namespaces, especially on existing ontologies. On freshly created ones the prefix map would be empty, so the presence of namespaces would mean manual insertion, but at renderer level there would be no way of knowing that.

Latest patch set is a squashed commit: https://github.com/owlcs/owlapi/commit/1f4764dc2774f380eb085a33d782680233c42cf0

ignazio1977 commented 2 months ago

I'll check the current libraries for known vulnerabilities and proceed with the release.

ignazio1977 commented 2 months ago

No known vulnerabilities, but quite abit of pain updating GPG keys for the new release. Now I'm stuck with credentials for sonatype, to release to central. My credentials don't seem to work.

I'll try again tomorrow, if no joy I'll raise a ticket with sonatype.

ignazio1977 commented 2 months ago

Release claims to have completed, after two or three unsuccessful attempts.

Maven Central will take a bit to synchronise, I'll check tomorrow.

matentzn commented 2 months ago

Whats the significance of all these "conflicting files" in this PR? Is this PR merged / taken into account for the release?

ignazio1977 commented 2 months ago

Not significant. Part of the PR was already merged, I've rebased and squashed locally before merging the fixed branch, that left GitHub with two conflicting branches.

The PR was merged. The release doesn't seem to have made it to maven central though. I'll go check sonatype.

ignazio1977 commented 2 months ago

Looks like yesterday my attempts managed to publish only one artifact, and today that artifact is stopping the release form continuing.

I've incremented the version to 4.5.28 and released again. This seems to have been successful, I can see the release in sonatype. Not visible yet on maven central, that takes usually a couple of hours.

jamesaoverton commented 2 months ago

I can see 4.5.28 on Maven Central now: https://central.sonatype.com/artifact/net.sourceforge.owlapi/owlapi-api/overview. Thanks!