kltm opened this issue 6 years ago
The command line used is:
java -Xms1024M -DentityExpansionLimit=4086000 -Djava.awt.headless=true -Xmx192G -jar ./java/lib/owltools-runner-all.jar http://purl.obolibrary.org/obo/go/extensions/go-gaf.owl http://purl.obolibrary.org/obo/eco.owl http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl http://purl.obolibrary.org/obo/cl/cl-basic.owl http://purl.obolibrary.org/obo/go/extensions/gorel.owl http://purl.obolibrary.org/obo/pato.owl http://purl.obolibrary.org/obo/po.owl http://purl.obolibrary.org/obo/chebi.owl http://purl.obolibrary.org/obo/uberon/basic.owl http://purl.obolibrary.org/obo/wbbt.owl http://purl.obolibrary.org/obo/go/extensions/go-modules-annotations.owl http://purl.obolibrary.org/obo/go/extensions/go-taxon-subsets.owl --log-info --merge-support-ontologies --merge-imports-closure --remove-subset-entities upperlevel --remove-disjoints --silence-elk --reasoner elk --solr-taxon-subset-name amigo_grouping_subset --solr-eco-subset-name go_groupings --remove-equivalent-to-nothing-axioms --solr-url http://localhost:8080/solr/ --solr-log /tmp/golr_timestamp.log --read-model-folder /home/bbop/local/src/git/noctua-models/models/ --read-model-url-prefix http://noctua.berkeleybop.org/editor/graph/ --solr-load-models
This requires a checkout of geneontology/noctua-models; I suspect that --solr-url should be set to mock to prevent actually attempting a load against a server.
Tagging @yy20716
I began checking one of the issues, i.e. that the field process_class is not generated. I suspect that this issue is also related to the namespace issue. Here is the trace: the process map is generated by checking whether bpSet.contains(bpType) in this part, but bpSet is always empty. bpSet is (supposed to be) filled by the method getAspect, which uses graph.getNamespace, but that always returns null right now. @cmungall I think this happens because the input graph (e.g. go-gaf.owl?) no longer includes namespace tags. This would work again if we could disable or revise the logic that checks the namespace, but please let me know if you have any other suggestions. Thank you.
This is familiar, I think you found the exact issue earlier when you were debugging the prediction code.
bpSet = reasoner.getSubClassesOf(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"))
@cmungall I tried your approach, but it does not work because getSubClassesOf does not accept an IRI object:
reasoner.getSubClassesOf(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"))
So I tried this way:
OWLClass c = graph.getManager().getOWLDataFactory().getOWLClass(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"));
The problem is that the ontologies do not contain the term GO_0008150, e.g.,
System.out.println("c: " + c + " num: " + model.getDeclarationAxioms(c).size());
The above code prints
c: <http://purl.obolibrary.org/obo/GO_0008150> num: 0
i.e., there is no such class in the ontology. Because of that, the reasoner just returns owl:Nothing, and thus bpSet is still null. I guess that's why the previous code refers to the namespace tag (instead of depending on the reasoner). Can you please advise what to do in this situation? I first thought that we could make a separate reasoner instance that loads the full Gene Ontology, but maybe that is overkill.
On 9 May 2018, at 18:15, HyeongSik Kim wrote:
Chris, I tried your approach, but it does not work because getSubClassesOf does not accept an IRI object:
reasoner.getSubClassesOf(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"))
So I tried this way:
OWLClass c = graph.getManager().getOWLDataFactory().getOWLClass(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"));
looks good
The problem is that the ontologies do not contain the term GO_0008150, e.g.,
System.out.println("c: " + c + " num: " + model.getDeclarationAxioms(c).size());
The above code prints
c: <http://purl.obolibrary.org/obo/GO_0008150> num: 0
we don't expect the declarations to be in the model; they should be in the import chain, and the reasoner uses the full import chain
i.e., there is no such class in the ontology. Because of that, the reasoner just returns owl:Nothing, and thus bpSet is still null. I guess that's why the previous code refers to the namespace tag (instead of depending on the reasoner). Can you please advise what to do in this situation? I first thought that we could make a separate reasoner instance that loads the full Gene Ontology, but maybe that is overkill.
is this using a test ontology?
Uh, so it turns out that I didn't deep-copy all ontologies, only the ones in noctua-models, which is why the reasoning was not happening properly. I forgot that a single OWLOntologyManager can contain multiple ontologies.
I revised the code so that all loaded ontologies are deep-copied. I now see that labels are properly generated, and we still do not experience memory leaks. I will soon make a pull request containing the revised code. Thank you so much for your patience.
FYI @cmungall, unfortunately I was not able to find the post I was talking about (it's possible that my memory is incorrect, or it was just my imagination). However, I learned the usage of the copy method from this post, i.e. "OWLOntologyManager.copyOntology does not copy imports", which complains that deep-copying an ontology does not process import statements (that is now solved). It's possible that that user used the copyOntology method for a different purpose, but I thought this method could be used to address our issue here.
Let me summarize the problem and the solution for you (and possibly others):
Using reasoners via the OWLAPI (particularly ELK) sometimes causes inferred or referred axioms to remain in the ontology manager even after we call the flush and dispose methods, i.e. for some reason the GC cannot collect the memory used for such axioms even though they are no longer needed. This is not noticeable, or not a significant issue, if we run the reasoner only once or a few times: once the ontology manager is destroyed, all linked axioms and ontologies are automatically reclaimed by GC (and therefore no memory leak happens).
However, the situation here is a bit unique: we do not destroy the ontology manager but keep using it, to avoid repeatedly loading the tboxes. In other words, we import the tbox only once (e.g., go-gaf.owl) and then reason over the ttl files in noctua-models to generate the Solr indexes.
There may be other ways to clean up such axioms, but my observation was that they are only released once the connected ontology manager object is destroyed. The challenge is that we don't want to destroy this manager, because we would then have to import all the tbox statements again, which takes a lot of time. It's possible that this bug comes from ELK, but I was not fully sure about this.
One workaround I found is to deep-copy the ontology instances into a temporary manager and destroy the copy once we finish using it. The overall flow can be summarized as follows:
As usual, we first import all tbox axioms into the (long-lived) ontology manager
For each reasoning session, we do not create a reasoner directly over that tbox; instead, we deep-copy the ontologies into a temporary manager, build the reasoner over the copy, and dispose of both the reasoner and the temporary manager once the session is done
Please note that this deep-copying method is available in the OWLAPI, i.e., the ontology copy will create a new OWLOntology instance with the same ontology annotations, same ID, and same axioms; the format and document IRI are also copied over. This process does not take much time or memory because it is an in-memory operation. It cannot be achieved with external libraries such as Kryo or Google's cloning library, since reasoners are not serializable.
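The flow above can be sketched with the OWLAPI directly. This is a minimal sketch, not the actual owltools loader code: it assumes OWLAPI 4+ (for OntologyCopy.DEEP) and the ELK bindings on the classpath, and the method name runReasoningSession is made up for illustration.

```java
import org.semanticweb.elk.owlapi.ElkReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.model.parameters.OntologyCopy;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

// Sketch of the workaround: reason over a deep copy held by a throwaway
// manager, so anything the reasoner pins stays attached to the temporary
// manager and becomes collectible together with it.
public class CopyAndReasonSketch {
    static void runReasoningSession(OWLOntology tbox) throws OWLOntologyCreationException {
        OWLOntologyManager tmp = OWLManager.createOWLOntologyManager();
        // Deep copy: a new OWLOntology with the same ID, annotations, and axioms.
        OWLOntology copy = tmp.copyOntology(tbox, OntologyCopy.DEEP);
        OWLReasoner reasoner = new ElkReasonerFactory().createReasoner(copy);
        try {
            reasoner.precomputeInferences(); // ...query the reasoner here...
        } finally {
            reasoner.dispose();       // unregister listeners, free ELK state
            tmp.removeOntology(copy); // drop the copy; tmp itself is now garbage
        }
        // The long-lived manager that owns `tbox` is never touched.
    }
}
```

The long-lived manager keeps the expensive tbox load, while each session's reasoner state is scoped to the throwaway manager.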
@balhoff it's a bit late to ask you this question, but we wonder whether you have experienced any similar issues before. If you have, could you please share your experience and approach? We thought that similar issues could arise in minerva, but maybe it was not as obvious as in this case. Thank you.
p.s. as requested, I will see if I can reproduce this issue using simple, dummy ontologies.
@yy20716 thank you for the detailed analysis and write-up. @ignazio1977 is this something to bring up on the owlapi tracker (ignore the bulk of this ticket, just look at the comment above this one)
@cmungall Yes, it sounds like an issue for OWLAPI. Not sure how this happens, as managers do not refer to axioms directly; I'd guess there must be an ontology that doesn't get cleared when emptying the manager. An example, or a heap dump of a manager with the noctua-models loaded, would help.
I began to think that this might not be an issue in the OWLAPI, but that something deep in owltools causes it. I was working on an example case with the same loader code, but I was not able to reproduce the issue, i.e. no memory leaks, while I still see the memory leak with the old owltools code. I believe the culprits are the OWLGraphWrapper* classes, but I am not sure of the root cause yet. Anyway, I will keep trying today and upload the test code soon.
In my opinion, deep-copying ontologies with a temporary manager is still a good workaround to avoid the memory leaks. If any other memory leaks are observed in owltools (or in other tools such as minerva), the same workaround could be applied for a while. However, my suggestion is that we eventually simplify the code and build a minimal, dedicated loader, so that we can easily track down such issues and prevent similar ones in the future.
Thank you so much for your patience and understanding.
Here is the repo that contains the test code I built: https://github.com/yy20716/owlapi-repeat-model-load
Thanks @yy20716! We can continue it from here.
@ignazio1977 we'll let you know if we can reproduce this purely in the context of the OWLAPI
Reading the whole issue, one thing comes to mind: some reasoners register themselves as ontology change listeners with the ontology manager for their root ontology. Letting the reasoner variable go out of scope does not remove all references to the reasoner (the reasoner is still referenced by the ontology manager in its listener collection). In turn, a reasoner would hold references to ontologies and possibly axiom lists.
Usually a reasoner should unregister itself when the dispose() method is called. You might want to verify whether this is the issue here by checking the list of ontology change listeners associated with the manager that appears to be leaking memory.
What I'd do is ensure dispose() is called when I'm done with a reasoner, and check that the list of listeners does not grow once I've unloaded the abox I no longer need. If the list is increasing in size, there must be extra listeners not being removed; that would provide a clue as to what is holding onto objects that are no longer needed.
Dropping the manager altogether and working with a new manager - i.e., the workaround you already have in place - would fit with this analysis, as this would effectively also drop all the listeners attached to the old manager. Cleaning the listeners would be much faster than copying the ontologies, though.
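To make the retention pattern concrete, here is a tiny self-contained sketch in plain Java. The Manager and Reasoner classes are toy stand-ins invented for illustration, not OWLAPI types; they only model the listener-registration behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the leak described above: the "manager" keeps a listener
// list, and the "reasoner" adds itself to that list on construction.
public class ListenerRetentionDemo {
    interface ChangeListener { void ontologyChanged(); }

    static class Manager {
        final List<ChangeListener> listeners = new ArrayList<>();
        void addListener(ChangeListener l) { listeners.add(l); }
        void removeListener(ChangeListener l) { listeners.remove(l); }
    }

    static class Reasoner implements ChangeListener {
        private final Manager manager;
        Reasoner(Manager m) { manager = m; m.addListener(this); }
        @Override public void ontologyChanged() { /* would recompute inferences */ }
        void dispose() { manager.removeListener(this); } // breaks the back-reference
    }

    public static void main(String[] args) {
        Manager m1 = new Manager();
        new Reasoner(m1); // local reference dropped immediately...
        // ...but the manager still holds it, so GC cannot collect it:
        System.out.println("after-drop=" + m1.listeners.size());

        Manager m2 = new Manager();
        Reasoner r = new Reasoner(m2);
        r.dispose(); // unregistering makes the reasoner collectible
        System.out.println("after-dispose=" + m2.listeners.size());
    }
}
```

Replacing the manager wholesale (the deep-copy workaround) drops the whole listener list at once, which is why it also fixes the leak under this analysis.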
Hi everyone. I found a strange behavior when copying ontologies with imports, and I think it could be related to the issues you are discussing here. I've opened an issue in owlapi if you are interested.
Greetings!
Looking at the output of a loader run: http://tomodachi.berkeleybop.org/amigo/search/model_annotation
The values in the returned documents that drive this interface look like:
There are several oddities in this load:
- id and annotation_unit are generated as a pipe-separated combination of model id, M, P, and C; it seems that one of those is missing, as there is a trailing pipe
- _searchable fields seem to be repeats; this may not be a "real" issue, practically speaking
- reference and reference_searchable seem to contain tabs; they should not
- _closure_map fields contain a "null", which would seem to indicate some issue with JSON deserialization or similar; the form should be id:label; note that this could merely be that something is wrong with labels (see below)
- _label*: in almost all cases, there are no labels, just ids; it could be that the ontology information needed to get labels is not loading properly as part of the command, or something deeper
- there is no process_class, just as there is function_class; no process fields exist

There are also a couple of issues that should be coordinated with other development, such as switching to names for contributor (or making an id/label map), adding groups, etc.
While these may all be separate tickets, the tooling to get at the issues is similar for starters.