monarch-initiative / phenol

phenol: Phenotype ontology library
https://phenol.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
23 stars 4 forks source link

JSON as main parser vs update OWLAPI #228

Closed pnrobinson closed 4 years ago

pnrobinson commented 5 years ago

Phenol is going to be difficult to use in Java 11 applications since OWLAPI seems to have what is known as split packages that do not play nicely with the Java 11 module system. This in turn makes it difficult to update apps such as PhenoteFX to Java 11. But the writing is on the wall and we need a strategy to move to Java 11. I have tried to update phenol to the latest owlapi versions in maven central (going from v4 to v5), but this basically leads to compile errors because org.geneontology.obographs.owlapi.FromOwl.generateGraph still used v4 and the API has been changed. I am wondering if the best approach would be to just switch to JSON as the primary input file for phenol -- we do not need OWLAPI for anything in phenol and so using it to parse OBO is not the best software engineering.

@julesjacobsen @cmungall What do we think about ditching OWLAPI and moving to JSON? This is working nicely in C++ and will almost surely be faster than OBO. Will NCIT be available in JSON as well?

julesjacobsen commented 5 years ago

@pnrobinson can you document exactly what steps you took to do this and what the exact error was? It might be worth opening a ticket with the owlapi people?

We're using OWLAPI so as not to write our own OBO/OWL parser which I think is a reasonable design decision. The OWL/OBO -> JSON transformation requires ontobio, which requires OWLAPI so we're going to have to deal with that problem one way or another.

One downside is that it is quite limiting to require people to use a non-standard representation of an ontology, or is it the case that JSON will become the defacto standard?

pnrobinson commented 5 years ago

@cmungall -- Is the community really planning on moving to JSON? At the moment, AFAIK phenol is being used only by our groups and so we do not need to worry about backwards compatibility and we could endeavour to write the reference implementation of a Java and C++ OBO-Json parser. @julesjacobsen -- With Java modules, you need to require the packages your program needs in a separate file, and I got ~100 errors messages about owlapi when I tried to require phenol. The error messages were I think of the split package type. I will plan on making a separate Java 11 program to test this out in an isolated fashion so that we can understand the problem better. Possibly after vacation though. @cmungall I think that at a minimum we will need to update the obograph library (which uses owlapi 4.5 or so), but the current version is 5.0 -- I do not know if they intend this library to be usable for java 9+

pnrobinson commented 5 years ago

This little program reproduces the same error. https://github.com/pnrobinson/oj Basically, my module file is this:

module org.monarchinitiative.oj {
    requires phenol.core;
    requires phenol.io;
    exports org.monarchinitiative.oj;
}

And the only class is this:

package org.monarchinitiative.oj;

import org.monarchinitiative.phenol.io.OntologyLoader;
import org.monarchinitiative.phenol.ontology.data.Ontology;

import java.io.File;

public class OJ {

    public static void main(String [] args){
        System.out.println("Read the writing on the wall....");
        String fakepath = "fake";
        Ontology ontology = OntologyLoader.loadOntology(new File(fakepath));
    }

}

and one gets this under Java 11: image

cmungall commented 5 years ago

I'm not fully following this but the owlapi is crucial to multiple crucial libraries and applications, can we not focus on finding a fix here? @balhoff do you know anything about this?

balhoff commented 5 years ago

I have used Java 11 without problems, but I don't do anything with the module system (and haven't learned anything about it yet). @pnrobinson what do you need to use modules for? Is this for JavaFX?

pnrobinson commented 5 years ago

@balhoff --afaik the module system is absolutely required to use JavaFX under Java 11. There are also many other advantages of modules. @cmungall afaik we are only using OWLAPI for OBO parsing -- to pass the file onto obographs. It might be a lot of work for the OWLAPI team to upgrade. I will see if I can find any info on their site...this was one recent comment: You need to use Java 8 with Protege 5.5.0. I do not think this will work for us any time soon!

julesjacobsen commented 5 years ago

@pnrobinson I think you're mistaken about the module system being an absolute requirement for building JavaFX in JDK11+. AFAICS you just need to include the javafx-maven-plugin.

No modules here:

https://github.com/openjfx/samples/tree/master/HelloFX/Maven

https://openjfx.io/openjfx-docs/

pnrobinson commented 5 years ago

Possibly. Spent some time trying to get a more complicated example to run and nothing worked until I added modules, but probably there are other workarounds. Still, we will need to upgrade to Java 11 and beyond or risk being relegated with Perl-Fivers.

julesjacobsen commented 5 years ago

True enough, but the module system was introduced in Java 9 and isn't required to use 9+ language features. Unless you want to use modules...

pnrobinson commented 5 years ago

I think this is a very useful language feature that we should actually strive to support. But after vacation haha.

julesjacobsen commented 5 years ago

If you want to use jlink in order to create a standalone application which requires no JDK it seems that even now, the ecosystem isn't ready:

https://stackoverflow.com/questions/56395749/error-automatic-module-cannot-be-used-with-jlink-maven-with-javafx?noredirect=1

I've been trying things out having modularised a project, and using jlink which finally failed on:

Error: automatic module cannot be used with jlink: snakeyaml from ...

See also https://github.com/dsyer/spring-boot-java-10#jlink So it looks like a nice idea and all that, but there are a couple of fatal flaws. :(

pnrobinson commented 4 years ago

We have a lit of topics in this issue. However, I think that we should start to implement a JSON parser for HPO with Jackson.

julesjacobsen commented 4 years ago

We don't need to explicitly do this here - we just need this issue fixing and it will all magically work.

The OWLAPI stuff is not related to the JSON parsing in this case.

pnrobinson commented 4 years ago

We will need to make classes the mirror the structure of the JSON file and then just need to add these annotations right? The classes would then have some factory for making our current Ontology?

julesjacobsen commented 4 years ago

We could do this - however it could result in awkward namespace clashes and we'd be duplicating work which really ought to happen in the underling library. This isn't a third-party library, it's one of ours so we really ought to fix the root cause. I've done a bit of work towards doing this and can commit my code but haven't thoroughly tested it.

pnrobinson commented 4 years ago

Not sure I understand -- none of the current code is similar to the json files, and so we would need to have that in order to use something like Jackson, correct? But if you have some code, then yes please commit and we can iterate as needed!

julesjacobsen commented 4 years ago

phenol depends on obographs to read the ontology.json files as these are actually produced by obographs via robot. The ontology.json file is the internal obographs model of the ontology produced via Jackson.

The problem is that while Jackson knows how to serialise the data to JSON, it is lacking the annotations in obographs to de-serialise so it can happily write it out, but not read it back in again.

If we wrote our own set of classes to do this we'd mostly just be reverse-engineering the obographs model. Given we're the maintainers of obographs, we should simply fix obographs!

There will still be a dependency on the OWLAPI, but we can work to separate that out so we can have a smaller, faster phenol for non-owl use. Nonetheless I think we'll still find some use-cases where we still want owl such as in issue #254.

julesjacobsen commented 4 years ago

@pnrobinson I've now fixed the obographs issue (https://github.com/julesjacobsen/obographs/tree/json-deserialise) and have uncovered a loading issue in phenol which I'll look at fixing, but until the above has been merged and releases we won't be able to ingest json.

The good news is that using this build locally I can read in hp.json and it is much faster, as expected. Roughly the performance is this:

hp.owl  - 17s
hp.obo  - 9s
hp.json - 0.6s

So, looking good.

julesjacobsen commented 4 years ago

Loading from JSON is now fixed. Not sure if this issue is still relevant! Should this be a Java modules issue?.

pnrobinson commented 4 years ago

Very nice! I would say this issue is closable. Let's think about modularization separate ly. This would have lots of implications for some of the PhenoteFX software, but I think we need to take the plunge soon!