owlcollab / owltools

OWLTools
BSD 3-Clause "New" or "Revised" License
108 stars 33 forks source link

Correct label and biological process issues for model annotation loader, possible other improvements #253

Open kltm opened 6 years ago

kltm commented 6 years ago

Looking at the output of a loader run: http://tomodachi.berkeleybop.org/amigo/search/model_annotation

The values in the return documents that drive this interface are like:

      {
        "document_category":"model_annotation",
        "id":"gomodel:581e072c00000062|http://model.geneontology.org/581e072c00000062/581e072c00000098|FB:FBgn0005672|GO:0048019|",
        "annotation_unit":"gomodel:581e072c00000062|http://model.geneontology.org/581e072c00000062/581e072c00000098|FB:FBgn0005672|GO:0048019|",
        "annotation_unit_label":"gomodel:581e072c00000062 spi Dmel GO:0048019",
        "annotation_unit_label_searchable":"gomodel:581e072c00000062 spi Dmel GO:0048019",
        "model":"gomodel:581e072c00000062",
        "model_label":"EGFR_SPI_DMEL",
        "model_label_searchable":"EGFR_SPI_DMEL",
        "model_label_searchable":"EGFR_SPI_DMEL",
        "model_url":"http://noctua.berkeleybop.org/editor/graph/581e072c00000062.ttl",
        "model_state":"development",
        "model_date":"2016-11-07",
        "model_date_searchable":"2016-11-07",
        "model_date_searchable":"2016-11-07",
        "enabled_by":"FB:FBgn0005672",
        "enabled_by_searchable":"FB:FBgn0005672",
        "enabled_by_label":"spi Dmel",
        "enabled_by_label_searchable":"spi Dmel",
        "enabled_by_label_searchable":"spi Dmel",
        "function_class":"GO:0048019",
        "function_class_label":"GO:0048019",
        "function_class_label_searchable":"GO:0048019",
        "function_class_label_searchable":"GO:0048019",
        "function_class_closure_map":"{GO:0048019=null}",
        "location_list_closure_map":"{GO:0005615=null}",
        "owl_blob_json":"JSON_BLOB_REMOVED_FOR_BREVITY",
        "evidence_type":"ECO:0000315",
        "evidence_type_label":"ECO:0000315",
        "evidence_type_label_searchable":"ECO:0000315",
        "evidence_type_label_searchable":"ECO:0000315",
        "evidence_type_closure_map":"{ECO:0000315=null}",
        "reference":["PMID:7833286","PMID:7833286\t"],
        "contributor":["http://orcid.org/0000-0003-3212-6364"],
        "evidence_type_closure":["ECO:0000315"],
        "reference_searchable":[
          "PMID:7833286",
          "PMID:7833286\t",
          "PMID:7833286",
          "PMID:7833286\t"],
        "contributor_searchable":[
          "http://orcid.org/0000-0003-3212-6364",
          "http://orcid.org/0000-0003-3212-6364"],
        "location_list_closure":["GO:0005615"],
        "function_class_closure":["GO:0048019"],
        "location_list":["GO:0005615"],
        "location_list_label":["GO:0005615"],
        "score":1.0
}

There are several oddities in this load:

There are also a couple of issues that should be coordinated with other development, such as either switching to names for contributor (or making an id/label map), adding groups, etc.

While these may all be separate tickets, the tooling to get at the issues is similar for starters.

kltm commented 6 years ago

The used command line is:

java -Xms1024M -DentityExpansionLimit=4086000 -Djava.awt.headless=true -Xmx192G -jar ./java/lib/owltools-runner-all.jar http://purl.obolibrary.org/obo/go/extensions/go-gaf.owl http://purl.obolibrary.org/obo/eco.owl http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl http://purl.obolibrary.org/obo/cl/cl-basic.owl http://purl.obolibrary.org/obo/go/extensions/gorel.owl http://purl.obolibrary.org/obo/pato.owl http://purl.obolibrary.org/obo/po.owl http://purl.obolibrary.org/obo/chebi.owl http://purl.obolibrary.org/obo/uberon/basic.owl http://purl.obolibrary.org/obo/wbbt.owl http://purl.obolibrary.org/obo/go/extensions/go-modules-annotations.owl http://purl.obolibrary.org/obo/go/extensions/go-taxon-subsets.owl --log-info --merge-support-ontologies --merge-imports-closure --remove-subset-entities upperlevel --remove-disjoints --silence-elk --reasoner elk --solr-taxon-subset-name amigo_grouping_subset --solr-eco-subset-name go_groupings --remove-equivalent-to-nothing-axioms --solr-url http://localhost:8080/solr/ --solr-log /tmp/golr_timestamp.log --read-model-folder /home/bbop/local/src/git/noctua-models/models/ --read-model-url-prefix http://noctua.berkeleybop.org/editor/graph/ --solr-load-models

This requires a checkout of geneontology/noctua-models; I suspect that --solr-url should be set to mock to prevent actually attempting a load against a server.

kltm commented 6 years ago

Tagging @yy20716

yy20716 commented 6 years ago

I began checking one of the issues, i.e. the field process_class is not generated. I speculate that this issue is also related to the namespace issue. Here is the trace:

@cmungall I think that this happens because the input graph (e.g. go-gaf.owl?) does not include namespace tags anymore. This would work again if we could disable or revise the logic that checks namespace, but please let me know if you have any other suggestions. Thank you.

cmungall commented 6 years ago

This is familiar, I think you found the exact issue earlier when you were debugging the prediction code.

See https://github.com/owlcollab/owltools/pull/239

cmungall commented 6 years ago

bpSet = reasoner.getSubClassesOf(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"))

yy20716 commented 6 years ago

@cmungall I tried your approach but that one does not work because getSubClass does not accept IRI object.

reasoner.getSubClassesOf(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"))

So I tried this way:

OWLClass c = graph.getManager().getOWLDataFactory().getOWLClass(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"));

The problem is that the ontologies do not have the term GO_0008150 inside, e.g.,

System.out.println("c: " + c + " num: " + model.getDeclarationAxioms(c).size());

The above code prints

c: <http://purl.obolibrary.org/obo/GO_0008150> num: 0

i.e., no such instances in ontology. Because of that, the reasoner just returns owl:Nothing, thus the bpSet is still null. I guess that's why the previous codes refers to the namespace tag (instead of depending on the reasoner). Can you please advise what to do in this situation? I first thought that we could make and use the separate reasoner instance that loaded the full gene ontology, but maybe this is a way overkill.

cmungall commented 6 years ago

On 9 May 2018, at 18:15, HyeongSik Kim wrote:

Chris, I tried your approach but that one does not work because getSubClass does not accept IRI object.

reasoner.getSubClassesOf(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"))

So I tried this way:

OWLClass c = 
graph.getManager().getOWLDataFactory().getOWLClass(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"));

looks good

The problem is that the ontologies do not have the term GO_0008150 inside, e.g.,

System.out.println("c: " + c + " num: " + 
model.getDeclarationAxioms(c).size());

The above code prints

c: <http://purl.obolibrary.org/obo/GO_0008150> num: 0

we don't expect the declarations to be in the model - they should be in the import chain, and the reasoner uses the full import chain

i.e., no such instances in ontology. Because of that, the reasoner just returns owl:Nothing, thus the bpSet is still null. I guess that's why the previous codes refers to the namespace tag (instead of depending on the reasoner). Can you please advise what to do in this situation? I first thought that we could make and use the separate reasoner instance that loaded the full gene ontology, but maybe this is a way overkill.

is this using a test ontology?

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/owlcollab/owltools/issues/253#issuecomment-387921119

yy20716 commented 6 years ago
yy20716 commented 6 years ago

@balhoff it's a bit late to ask this question to you, but we wonder whether you have experienced any similar issues before. If you have, could you please share your experience and approach? We thought that similar issues could arise in the minerva but maybe it was not that obvious like this case. Thank you.

p.s. as requested, I will see if I can reproduce this issue using simple, dummy ontologies.

cmungall commented 6 years ago

@yy20716 thank you for the detailed analysis and write-up. @ignazio1977 is this something to bring up on the owlapi tracker (ignore the bulk of this ticket, just look at the comment above this one)

ignazio1977 commented 6 years ago

@cmungall Yes it sounds like an issue for OWLAPI. Not sure how this happens, as managers do not refer axioms directly - I'd guess there must be an ontology that doesn't get cleared when emptying the manager. An example, or a heap dump of a manager with the noctua-models loaded, would help.

yy20716 commented 6 years ago

Thank you so much for your patience and understanding.

yy20716 commented 6 years ago

Here is the repo that contains the testcodes I built: https://github.com/yy20716/owlapi-repeat-model-load

cmungall commented 6 years ago

Thanks @yy20716! We can continue it from here.

@ignazio1977 we'll let you know if we can reproduce this purely in the context of the OWLAPI

ignazio1977 commented 6 years ago

Reading the whole issue, one thing comes to mind: some reasoners register themselves as ontology change listeners with the ontology manager for their root ontology. Letting the reasoner variable go out of scope does not remove all references to the reasoner (the reasoner is still referenced by the ontology manager in its listener collection). In turn, a reasoner would hold references to ontologies and possibly axiom lists.

Usually a reasoner should unregister itself when the dispose() method is called. You might want to verify if this is the issue here by checking the list of ontology change listeners associated with the manager that appears to be leaking memory.

What I'd do is to ensure dispose() is called when I'm done with a reasoner and ensure the list of listeners does not increase in size once I've unloaded the abox that I no longer need to use. If the list is increasing in size, there must be extra listeners not being removed - that would provide a clue as to what is holding onto objects that are no longer needed.

Dropping the manager altogether and working with a new manager - i.e., the workaround you already have in place - would fit with this analysis, as this would effectively also drop all the listeners attached to the old manager. Cleaning the listeners would be much faster than copying the ontologies, though.

fanavarro commented 4 years ago

Hi everyone. I found an strange behavior when copying ontologies with imports and I think it could be related with the issues you are discussing here. I've open an issue in owlapi if you are interested in.

Greetings!