ROBOT takes 3x longer to reason than Protege?

msinclair2 commented 6 years ago

Using the following code in program to reason an OWLOntology takes ~3 min:

        OWLOntology MSO = ioHelper.loadOntology("MSO_unreasoned.owl");

        OWLReasonerFactory reasonerFactory = new JFactFactory();

        ReasonOperation.reason(MSO, reasonerFactory);

The same file and reasoner takes 1 min in Protege. File is attached (remove .txt extension). I have other files that take longer to reason in Protege and the time multiplication factor in code gets larger the longer in Protege it takes. MSO_unreasoned.owl.txt

jamesaoverton commented 6 years ago

Thanks for the report. I've never done a rigorous comparison of reasoning times between ROBOT and Protege.

Can you tell us the specific versions you're using for ROBOT, Protege, and JFact in Protege? (I think ROBOT has always used JFact 4.0.4.)

I suspect the difference is that ROBOT is asking for a different set of inferences from the reasoner, which takes more time. The best discussion of which inferences to ask for is on issue #273.

@cmungall @matentzn ?

dosumis commented 6 years ago

@msinclair2 - did you take into account loading time? In Protege you're typically reasoning with a pre-loaded ontology. With Robot you are loading and then reasoning.

balhoff commented 6 years ago

My guess is that it's because ROBOT materializes inferred axioms:

https://github.com/ontodev/robot/blob/fd906a0281d34317b7f506542074b9b5c7b762a9/robot-core/src/main/java/org/obolibrary/robot/ReasonOperation.java#L158-L177

This doesn't happen in the basic classification using OWLReasoner.

msinclair2 commented 6 years ago

@dosumis Yes, I took that into account as well as saving out the inferred axioms as asserted.

msinclair2 commented 6 years ago

@jamesaoverton ROBOT: 1.1.0, JFact: 4.0.4, Protege: 5.2.0

msinclair2 commented 6 years ago

@balhoff I'm unlearned about materialization and how that differs from the superclasses the basic OWLReasoner (what Protege uses I guess) infers and then asserts. Could you point me to a somewhat beginner-level resource that explains the difference? That would be immensely helpful to me!

balhoff commented 6 years ago

@msinclair2 I don't know of any resources unfortunately (I will keep this in mind though in case I come across something). But basically you can think of it as that when you first create the reasoner, it performs an initial computation which gets it "ready" to efficiently answer queries. So if you then ask for all the subclasses of a class or class expression it will be pretty quick. The materialization step performs lots of queries so that you can write out inferred conclusions (e.g. what is a subclass of what else). If you look in the OWL API options you can see a whole set of different Inferred Axiom Generators. Some of them can be quite a bit slower than others. For example if you add the one that writes inferred Disjoint Classes axioms, in my experience it may slow things down even worse.

msinclair2 commented 6 years ago

@balhoff Thank you very much, that helps a lot.

msinclair2 commented 6 years ago

I have a related follow-up to this question. I appear to be having problems with redundant subclass axioms. Reasoning in Protege doesn't produce them, but my output from ROBOT does. What is the best way to call the ReasonOperation's removeRedundantSubClassAxioms method from the above code (modified to use HermiT):

        OWLOntology MSO = ioHelper.loadOntology("MSO_unreasoned.owl");

        OWLReasonerFactory reasonerFactory = new ReasonerFactory();

        ReasonOperation.reason(MSO, reasonerFactory);

I tried the following to no avail:

OWLOntology MSO = ioHelper.loadOntology("MSO_unreasoned.owl");

OWLReasonerFactory reasonerFactory = new ReasonerFactory();

OWLReasoner reasoner = reasonerFactory.createReasoner(MSO);

ReasonOperation.reason(MSO, reasonerFactory);

ReasonOperation.removeRedundantSubClassAxioms(reasoner);

msinclair2 commented 6 years ago

I also tried:

OWLOntology MSO = ioHelper.loadOntology("MSO_unreasoned.owl");

OWLReasonerFactory reasonerFactory = new ReasonerFactory();

ReasonOperation.reason(MSO, reasonerFactory);

OWLReasoner reasoner = reasonerFactory.createReasoner(MSO);

ReasonOperation.removeRedundantSubClassAxioms(reasoner);

So that the reasoner would be intialized with the modified ontology, but also no avail.

To show you the problem I'm having, see the image below:

'DNA extent' should just be a child of 'nucleotide extent' not also 'sequence molecular entity extent'. Protege doesn't do this. How can I get ROBOT to also not do this?

cmungall commented 6 years ago

Sorry, I'm not following. Can you state exactly which axiom it is in MSO_unreasoned.owl you expect to be removed?

I'm not following the analogy with Protege as there isn't an equivalent operation in Protege (however, the inferred view in Protege will only show direct inferred subclasses; running reason with the --remove-redundant-subclass-axioms option should assert an equivalent hierarchy)

Note to self: the docs on removing redundant subclass axioms (http://robot.obolibrary.org/reason) could be improved:

--remove-redundant-subclass-axioms: if set to true, any redundant axioms (those that have been asserted and were also inferred) will be removed from the output.

It should say that asserted inferred indirect axioms should be removed.

Aside: nothing whatsoever to do with your question, but something sits oddly with me about the owl definition of DNA extent, will get back to you on this...

mikebada commented 6 years ago

@cmungall Upon opening the unreasoned ontology in Protege, in the uninferred class hierarchy tab, 'DNA extent' is shown to be a child of 'sequence molecular entity extent', which it gets from the first part of its equivalent class axiom. After reasoning (we're currently using HermiT), 'DNA extent' is correctly classified to be a child of 'nucleotide extent' but is not shown to be a child of 'sequence molecular entity extent' in the inferred class hierarchy tab, as that's redundant given that 'sequence molecular entity extent' is a parent of 'nucleotide extent'. Michael is using ROBOT to output a reasoned form of the ontology. Upon opening the ROBOT-outputted reasoned ontology in Protege, 'DNA extent' is correctly shown to be a child of the previously inferred 'nucleotide extent', but it's also shown as a child of 'sequence molecular entity extent', which as stated before is redundant (not to mention confusing, I think). Is there a way to output from ROBOT a version of the ontology as it appears in the inferred class hierarchy tab of Protege? What exactly is the form of the ontology that should be distributed (according to OBO guidelines)?

msinclair2 commented 6 years ago

@cmungall You mention the command to be used on the console. I am using ROBOT core directly in my Java (OWLAPI) project. I stated what I was doing in Java code. Could you give me an answer in that format and not what is done on the command line?

@mikebada 's explanation is very cogent, however just one elaboration: it is not MSO_unreasoned.owl that I am talking about, but MSO.owl. MSO.owl is the output of reasoning by ROBOT in my Java code, not on the command line. The image I presented is what MSO.owl looks like as a result of this reasoning. When the reasoner is run again in Protege as @mikebada described, DNA extent is correctly classified and displayed as being a child only of 'nuceotide extent'. If you load MSO_unreasoned.owl into Protege and run the reasoner, you also get the correct display.

[EDIT: @mikebada my apologies, we said the same thing and there was no need for me to add an explanation.]

@cmungall

I'm not following the analogy with Protege as there isn't an equivalent operation in Protege (however, the inferred view in Protege will only show direct inferred subclasses; running reason with the --remove-redundant-subclass-axioms option should assert an equivalent hierarchy)

It should have been clear from my post that that is exactly what I was trying to do but it isn't working.

msinclair2 commented 6 years ago

@cmungall UPDATE: I tried the command line version:

>> robot reason --reasoner hermit --remove-redundant-subclass-axioms true --input MSO_unreasoned.owl --output MSO_test.owl

and I still have the same problem (i.e. DNA extent is a child of both sequence molecular entity extent and nucleotide extent).

cmungall commented 6 years ago

Can you state precisely what axiom you think should not be there

I do not see a DNA_extent SubClassOf SMEE axiom

msinclair2 commented 6 years ago

I can't, and that's part of the problem. I was hoping you could help us.

@mikebada explicitly noted that too, and that's part of the mystery. He thinks that something is inferring that DNA_extent is a subclass of SMEE from the fact that the first conjunct in the equivalent class definition is SMEE. I thought that maybe it is a Protege display glitch, but this only happens with the output from ROBOT's reason operation.

cmungall commented 6 years ago

Oh, I see. You are talking about the Protege display

this part:

this doesn't have anything to do with robot. In Protege if you have C = G and .. it will treat G as a parent in the asserted display. Once you know it does this it's not an issue. End-users don't use Protege, just the pre-reasoned release version. Many ontology developers have ELK constantly synchronized and use the inferred view (harder for MSO which requires DL reasoning).

We should probably have this in one of our tutorials.. worth bringing up on the Protege tracker (with a simpler example)

mikebada commented 6 years ago

@msinclair2 @cmungall I don't think that this issue happens only with the ROBOT-outputted version. When opening the original master file, from which the MSO and SO are generated, DE appears as a child of both NE and SMEE in the uninferred class hierarchy tab, but only of NE in the inferred class hierarchy tab. We were wondering if there was a way to save the ontology so that the latter version would show up in Protege (without reasoning again), but it sounds like you're saying that this issue isn't really an issue, which is good to hear.

msinclair2 commented 6 years ago

@cmungall Thank you, that is good to know. However, do you know why when reasoned in Protege (using whatever settings it uses for reasoners) this does not happen (treating a conjunct as a parent in its own right)? Some users still rely on Protege to do their browsing no?

@mikebada In my copy of the master file (the latest), DNA extent is only a child of SMEE. When reasoned (in Protege) it appears only as a child of NE, so I'm not able to reproduce your observation.

cmungall commented 6 years ago

On 27 Sep 2018, at 15:23, Michael Sinclair wrote:

@cmungall Thank you, that is good to know. However, do you know why when reasoned in Protege (using whatever settings it uses for reasoners) this does not happen (treating a conjunct as a parent in its own right)?

The protege inferred view hides indirect inferred axioms. Robot reason with redundancy removal achieves the same thing through removal of subClassOf axioms.

Some users still rely on Protege to do their browsing no?

I'm sure some do but there is an implicit contract of being a certain level of expertise. I send most people to either OLS or OntoBee or a specialized database eg amigo/go

mikebada commented 6 years ago

@msinclair2 Yes, I mistyped; it's the outputted MSO in which DE appears as a child of both NE and SMEE in the uninferred class hierarchy tab; for the master, DE is only a child of SMEE.

msinclair2 commented 6 years ago

@cmungall Sorry, you noted that if you have C = G and .. it will treat G as a parent in the asserted view, which is consistent with everything we've observed so far.

This returns me to my original question: what is the correct way to implement removing redundant axioms in the reason operation in code? Is the following code correct?

OWLOntology MSO = ioHelper.loadOntology("MSO_unreasoned.owl");

OWLReasonerFactory reasonerFactory = new ReasonerFactory();

ReasonOperation.reason(MSO, reasonerFactory);

OWLReasoner reasoner = reasonerFactory.createReasoner(MSO);

ReasonOperation.removeRedundantSubClassAxioms(reasoner);

@mikebada I guess Protege will just display the first of the conjuncts as the parent in the asserted view. So I think we don't need to worry further about this and treat it as a Protege design decision we have no control over.

mikebada commented 6 years ago

@msinclair2 Agreed

cmungall commented 6 years ago

This returns me to my original question: what is the correct way to implement removing redundant axioms in the reason operation in code? Is the following code correct? I believe so

msinclair2 commented 6 years ago

Just to add to this, even if you do remove redundant subclassOf axioms using ROBOT, Protege exhibits the same behavior. It will only display correctly (i.e. not display a class as a subclass of the first argument in an equivalent class expression, such as C = G and... therefore display C as a subclass of G) if the reasoner is run from within Protege itself. Smells like a bug.

cmungall commented 6 years ago

I'm sorry, I'm not following.

In Protege, you will see the genus as a direct parent in the asserted view. In the inferred view you may see inferred intermediates. This is true regardless of whether the SubClassOf graph in the asserted ontology has redundancies or not.

msinclair2 commented 6 years ago

I was referring only to the redundancies in Protege's display. Please refer to your own comments above. Protege does this despite there being no axiom in the ontology declaring G a subclass of C. Perhaps my comment itself was redundant and I should delete it, since it confused you unnecessarily and will likely confuse others where no confusion is warranted.

cmungall commented 6 years ago

There is no subclass axiom but there is an OWL definition still, correct?

As I said: " In Protege if you have C = G and .. it will treat G as a parent in the asserted display"

cmungall commented 5 years ago

@msinclair2 are you still having issues? If so, can you provide a URL for a test file and the exact code you are using?

msinclair2 commented 5 years ago

There are two issues. One (1) is that ROBOT is taking longer than Protege to reason. The second (2) is that even in Protege, it's taking a long time.

To reproduce (1):

Step 1a:

Download SO_unreasoned.owl

Download MSO_unreasoned.owl

Open SO_unreasoned.owl in Protege. Add MSO_unreasoned.owl as a direct import. Choose HermiT as the reasoner, and Start Reasoner. Time how long it takes to run to completion.

Step 1b:

Use the following code, and time how long it takes to run to completion:

import org.obolibrary.robot.*;
import org.semanticweb.HermiT.ReasonerFactory;
// And all other necessary imports

        // Instantiate ioHelper from ROBOT and then:
        OWLOntology SO = ioHelper.loadOntology("SO_unreasoned.owl");

        OWLOntologyManager m = SO.getOWLOntologyManager();

        OWLDataFactory df = m.getOWLDataFactory();

        OWLImportsDeclaration importDeclaration = df.getOWLImportsDeclaration(IRI.create
                (new File("MSO_unreasoned.owl")));

        m.applyChange(new AddImport(SO, importDeclaration));

        m.loadOntologyFromOntologyDocument(new File("MSO_unreasoned.owl"));

        OWLReasonerFactory reasonerFactory = new ReasonerFactory();

        // Use ROBOT's ReasonOperation.reason method.
        ReasonOperation.reason(SO, reasonerFactory);

As for issue (2), I ask those with expertise to take a look at the axioms in SO_unreasoned and MSO_unreasoned to see if we are trying to do too much, have too many unnecessary axioms, or whatever else may be causing the long reasoning time, and what, if anything, we can do about it.

cmungall commented 5 years ago

Time to reason in Protege: 5m22s (HermiT 1.3.8.413)

Using robot (which uses the same hermit version):

2018-12-13 15:33:10,371 INFO  org.obolibrary.robot.ReasonerHelper - Checking for unsatisfiable classes...
2018-12-13 15:38:42,180 INFO  org.obolibrary.robot.ReasonerHelper - Checking for unsatisfiable object properties...
...
2018-12-13 15:43:35,721 INFO  org.obolibrary.robot.ReasonOperation - Precomputing class hierarchy...
2018-12-13 15:48:51,131 INFO  org.obolibrary.robot.ReasonOperation - Finding equivalencies...
2018-12-13 15:48:51,178 INFO  org.obolibrary.robot.ReasonOperation - Using these axiom generators:
2018-12-13 15:48:51,178 INFO  org.obolibrary.robot.ReasonOperation -     Subclasses
2018-12-13 15:48:51,178 INFO  org.obolibrary.robot.ReasonOperation - Reasoning took 942 seconds.
2018-12-13 15:48:51,272 INFO  org.obolibrary.robot.ReasonOperation - Reasoning created 5906 new axioms.
2018-12-13 15:48:51,298 INFO  org.obolibrary.robot.ReasonOperation - Ontology has 27749 axioms after all reasoning steps.
2018-12-13 15:48:51,298 INFO  org.obolibrary.robot.ReasonOperation - Filling took 0 seconds.
2018-12-13 15:48:51,300 WARN  org.obolibrary.robot.CommandManager - Subcommand Timing: reason took 947.325 seconds

The actual reasoning and asserting SubClassOf axioms step takes roughly the same time 5m16s - so no discrepancy.

It is unusual why robot takes so long to find unsats compared (5m) with Protege, it should just be checking for what is equivalent to nothing. The OP unsats step is a custom robot thing and this could be made optional. But you aren't doing these steps in your java.

For your issue 2 I suggest taking back to MSO tracker (but you know my opinions here: staying in EL is good for machine reasoning and good for human reasoning).

msinclair2 commented 5 years ago

Did you use the exact code I presented and it took only ~5 minutes? As I said before, it's taking me upwards of 50 minutes. There is something seriously wrong going on here for me.

How about looking at my actual project source code here? I commented out the operations line by line to isolate where the time consuming step is and it is in the ReasonerHelper class, line 70. That is where I excerpted the above code from, and line 70 is the call to ROBOT's ReasonOperation.reason.

Please try downloading the .jar archive and the master file. You can simply run at the command line, with master.owl in the same directory as the .jar:

java -jar MSO-SO_Generator.jar

and please let me know how long this takes.

beckyjackson commented 5 years ago

Using your tool takes about 26 minutes to generate the final product. Using ROBOT (command line) on SO_unreasoned.owl with MSO_unreasoned.owl as a direct import takes ~15 minutes and most of that time was spent checking for unsatisfiable entities as Chris said.

It seems like this task would be more efficient with a Makefile, rather than a custom command line tool. This can just run a series of commands from the shell. You can do all the extracting from master.owl to generate SO and MSO with ROBOT on the command line, as well as reason over it. It will be easier to maintain, as well, as you don't have to build a JAR to run the build process.

Another thing to think about is possibly adding a --validate option to the reason command to skip over the checking for unsats. If I remove that step from the ReasonOperation method, this process takes ~7 minutes. @cmungall what do you think?

msinclair2 commented 5 years ago

Hi @rctauber, thanks for helping me investigate this!

I don't know why the time it takes to reason is so different between your system and mine. I also don't have consistent results when I run it; sometimes it takes longer, sometimes shorter. I have been running Linux as a VM in Windows through VirtualBox. Maybe it's just much less efficient than native Linux?

Using your tool takes about 26 minutes to generate the final product. Using ROBOT (command line) on SO_unreasoned.owl with MSO_unreasoned.owl as a direct import takes ~15 minutes and most of that time was spent checking for unsatisfiable entities as Chris said.

So then the difference of ~11 minutes would be all the other tasks my program does, including reasoning MSO on its own for a final product. This time could also be reduced if the checking for unsats is avoided, I suppose? At least we now know where exactly the time bottleneck is, thank you.

It seems like this task would be more efficient with a Makefile, rather than a custom command line tool. This can just run a series of commands from the shell. You can do all the extracting from master.owl to generate SO and MSO with ROBOT on the command line, as well as reason over it. It will be easier to maintain, as well, as you don't have to build a JAR to run the build process.

I 100% agree. ROBOT couldn't do all the things I needed when I started writing this, now it is almost there. I need it to be able to edit IRIs themselves based on patterns I give (or even regular expressions?). Then I can use it. Do you know if that is on the agenda?

Another thing to think about is possibly adding a --validate option to the reason command to skip over the checking for unsats. If I remove that step from the ReasonOperation method, this process takes ~7 minutes. @cmungall what do you think?

If you do that please let me know, just for the time being, how I can also skip that step when I call ReasonOperation.reason() from within my code, or add that info to the Javadoc, as in the interim I will not yet be using the command line.

cmungall commented 5 years ago

Skipping the unsat heck is very dangerous if you're going to assert inferred subClassOf..

jamesaoverton commented 5 years ago

I wouldn’t be surprised if there’s a significant performance hit from VirtualBox. ROBOT should run fine under Windows, but you won’t have everything you need to run a Makefile. You could compare ROBOT with and without VirtualBox. The amount of RAM allocated to Java can also make a big difference.

jamesaoverton commented 5 years ago

Are you also running Protege inside VirtualBox?

dosumis commented 5 years ago

It may be slightly off-topic, but the judging from your screenshot, the patterns themselves don't look very scalable and I wonder if they are capturing what you really need.

I would expect to see universal restriction combined with existential in a closure pattern, otherwise the genus (sequence molecular entity extent) doesn't have to have any parts (of the specified kind) in order to fulfil the restriction .
Universal restriction doesn't really make sense with a transitive object property. In this case, all parts of parts would need to be of the specified type.
Only, OR and NOT are all outside of EL, so scaling will be worst-case intractable.

Given that this is off-topic for this tracker - I'd be happy to chat further about this on the MSO tracker.

msinclair2 commented 5 years ago

@jamesaoverton

Are you also running Protege inside VirtualBox?

I have done both but I have not kept track of the timing.

Does Protege also check for unsatisfiables? Then this would indicate an actual time difference between the two programs to do the same thing, which gets worse the more axioms there are?

msinclair2 commented 5 years ago

It may be slightly off-topic, but the judging from your screenshot, the patterns themselves don't look very scalable and I wonder if they are capturing what you really need.

I would expect to see universal restriction combined with existential in a closure pattern, otherwise the genus (sequence molecular entity extent) doesn't have to have any parts (of the specified kind) in order to fulfil the restriction . Universal restriction doesn't really make sense with a transitive object property. In this case, all parts of parts would need to be of the specified type. Only, OR and NOT are all outside of EL, so scaling will be worst-case intractable. Given that this is off-topic for this tracker - I'd be happy to chat further about this on the MSO tracker.

We need to bring @mikebada back in on this and I'm going to post this at The-Sequence-Ontology/MSO#9.

mikebada commented 5 years ago

@dosumis The screenshot showing the info for 'DNA extent' shows an old version of the definition. (Note that screenshot's from a few months ago.) As you suggested, definitions for classes such as this no longer use universal restrictions in partonomic assertions; I've been able to re-represent the inferential functionality I was going for using disjointness axioms. There's still a fair amount of non-EL axiomatization throughout the ontology, but I think it's inferentially really useful.

cmungall commented 5 years ago

I'm closing this as we have determined there is a robot issue here. The modeling issue us continued on the MSO tracker

ontodev / robot

ROBOT takes 3x longer to reason than Protege? #368