monarch-initiative / phenol

phenol: Phenotype ontology library
https://phenol.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
23 stars 4 forks source link

Use a generic graph library #6

Closed cmungall closed 6 years ago

cmungall commented 6 years ago

ontolib already has good general purpose graph operations. However, there are some issues, e.g. lack of generic edge properties and edge labels. It's a good idea to use a 3rd party lib.

Minimally the 3rd party lib should be on maven central, be well supported and robust and implement the standard features. A good analog is networkx for python, which is very standard.

https://stackoverflow.com/questions/51574/good-java-graph-algorithm-library

Based on this, it seems http://jgrapht.org/ is a good choice (with ability to use Visualization using JGraphX https://github.com/jgraph/jgraphx in javafx programs)

Also consider:

julesjacobsen commented 6 years ago

I agree using a generic graph library will get us maximum utility in the least amount of time and with no required development/testing/bugfixing.

Not a clue which would be best to use - either jgrapht or tinkerpop seem the most reasonable choices. If we consider their functionality equal, What are our requirements? Speed? Memory footprint? API usability? It's worth investigating this on a couple of use-cases to see how they compare, before trying to fit them into our lib.

pnrobinson commented 6 years ago

I guess that the ability to visualize graphs is great -- this would speak for jgrapht, but on the other hand, apache projects are probably more sustainable (the fact that jgrapht is still partially on sourceforge might seem to indicate that it is not very actively maintained?). Jgraph seems to be really nice -- is this from the same group? What about writing a wrapper to use tinkerpop with Jgraph if and when we want to?

yy20716 commented 6 years ago

I briefly checked JGraphT, Neo4j, and Tinkerpop. I guess this would be more preference issue but Apache Tinkerpop seems a reasonable choice for a long term because of its sustainability and interoperability. For example, it has a query language called Gremlin. It is both a graph traversal library and language, which means that we can easily implement graph-based operations. Some common operations are already implemented and released as 'recipes' using Gremlin, e.g., shortest path. This Gremlin query can be also evaluated over TinkerPop-enabled graph systems such as Blazegraph and even neo4j.

One thing I am not sure about this is a graph visualization. It does not have its own visualization core but I see some projects that interlink Gephy (open-source visualization software package written in Java) and Tinkerpop (e.g., https://groups.google.com/forum/#!topic/gremlin-users/LIZ73bxZJcU). Maybe we could build a simple wrapper for JGraphX if we cannot use Gephy.

Any other suggestion would be appreciated.

julesjacobsen commented 6 years ago

JGraphT is actually very much alive and kicking on GitHub, so it is still well supported. I'd go with whichever seems nicest to use, assuming they both have the required algos and all other things being equal.

Depending on how you develop OLP, you should be able to leverage a lot of their API through a layer of your own making so that both libraries might be tested side-by-side for performance on the kinds of things we want to be able to do.

julesjacobsen commented 6 years ago

While I'm here, can I ask we change the name? OLPG sticks out like a sore thumb where repos are usually all lowercase, plus it doesn't roll off the tongue easily. Can I suggest 'phenol' as a replacement? Phenotype Ontology Library? See?

yy20716 commented 6 years ago
pnrobinson commented 6 years ago

How about "Pal" Phenolizing assistance library

cmungall commented 6 years ago

I kind of like OiLPiG. Kind of a tip of the hat to OIL, predecessor of OWL. And pigs are smart.

pnrobinson commented 6 years ago

I am definitely in agreemet we need a better name. phenol and OilPig are both seriously cool...who's got a fair coin?

pnrobinson commented 6 years ago

@yy20716 I would say we will have to develop many wrappers that will use various sorts of graph algorithm like Dykstra or Ford-Fulkerson etc., but that we probably will want to adapt these algorithms to respect things like type of edge or type of term from our target ontologies, so having a library that will let us flexibly work with the basic graph operations would be an advantage. It sounds like the main advantage of JGraphT is visualization, but I suspect that we will not use graph visualization much outside of a web setting, and there me might want to use some javascript framework anyway. Thoughts?

julesjacobsen commented 6 years ago

OiLPiG is awesome!

pnrobinson commented 6 years ago

did not catch, though, what the i's stand for? Should we go for oil pig then or will we come to regret it when we are writing the methods section for staid journals....

cmungall commented 6 years ago

Never take naming advice from me...

pnrobinson commented 6 years ago

I knew you weren't that serious, but I did really like oilpig....:-0 Well, should we stick to the safe side and go for phenol? That works for me...

yy20716 commented 6 years ago
pnrobinson commented 6 years ago

My suggestion is that let's go ahead with an approach based on OWL API and TinkerPop. Also, let's go ahead with "phenol" :-0

  1. I think the argument for OWLAPI is relatively clear -- it would be relatively difficult to extend the ANTLR approach (elegant as it is) to some new ontologies, but OWLAPI has a huge user base and we can user phenol for obo and owl.
  2. I think that our focus (at least for the HPO project) will be on graph algorithms and much-much less on graph visualization with Java (if we do want visualization, we would very probably want it to be on the web and so the connection from Gremlin to D3 to GraphExp might be really useful. Therefore, I think there seems to be an overall advantage for TinkerPop.
  3. While oilpig has lots of charms, cooler heads have prevailed, and phenol is a pretty cool name as well. @cmungall @yy20716 @drseb @julesjacobsen We do not seem to have decision structures for phenol, but since nobody has protested, I am guessing that the above three points are consensus? @yy20716 Should we try to craft a MONDO parser with OWLAPI then? I would like to add MONDO to https://github.com/TheJacksonLaboratory/HPOworkbench and https://github.com/monarch-initiative/hpoannotqc so we can start to work on the merging code, so this would be timely. Once that is working then I would also start on the HPO Parser and add features like getting the pubmed references for the definitions, which we need for the website. thank sPeter
yy20716 commented 6 years ago

I am sorry, but I am a bit confused; do you want to add a mondo.owl loader to HPOworkbench and hpoannotqc, not phenol (OLPG)?

p.s. @cmungall could you please advise about this issue? I am not sure whether leveraging the mappings of Obographs for owl also makes sense to you for HPOworkbench and hpoannotqc. p.s. it would be nice if you could assign separate tickets for these issues and discuss there. Thank you.

cmungall commented 6 years ago
  1. +1
  2. I feel it's a big decision, but I'll defer to those doing the work!
  3. phenol is a cool name without being silly, let's do it. I may have to take oilpig for another project :-)
julesjacobsen commented 6 years ago
  1. +1 Using OWLAPI to load ontologies from owl or obo worked really well in OwlSIm3. Just wrap that and convert the output. Although this might suffer when loading really large graphs as you'd need to hold two in memory for a brief period, unless this can be done in place or a more elegant solution can be found.

  2. I agree that vis isn't necessarily the primary aim here.

  3. phenol works for me - just remember to properly change not only the pom but package declarations too. This will screw everyone's previous code, but its a clean fork with no confusion.

pnrobinson commented 6 years ago

@yy20716 -- sorry for being unclear -- I would like to use phenol as the parser but use the parsed ontology in the other two applications! Thus, we can regard this as a test case.

yy20716 commented 6 years ago
  1. @pnrobinson Thank you for your clarification. However, I am still not sure about mappings for additional fields/annotations in Mondo. @cmungall If you don't mind can you provide any suggestions? Do I need to map all fields in addition to a basic graph structure for Peter's codes?

  2. I am sorry for changing my opinion suddenly but I began to think that JGraphT might be a better option.

    • For a past few days, I was trying to prototype and test TinkerPop but realized that it may not be straightforward to adapt existing codes in Phenol to TinkerPop's graph model.

      • The main reason is that TinkerPop's model is a property graph model that consists of 'untyped' vertices and edges labeled with key-values, while the graph implementation and all algorithm/loader codes in Phenol are more like object-oriented models, i.e., a graph with TermId (vertex) and TermRelation (edge) objects (and other objects such as xrefs exist as well).
      • In other words, all these fields in objects actually need to be decomposed and re-arranged as key-value pairs (i.e., this is similar to object-relational impedance mismatch in database area) which may require a significant modification of phenol.
    • So I believe that we would have two options; we employ

      1. JGraphT: this would allow maximizing re-using codes because JGraphT's interface is very similar to the existing one. If needed, we could build a wrapper that converts phenol's graph instances to the TinkerPop ones for algorithms/datasets that cannot be handled using JGraphT, or

      2. TinkerPop: we would completely transform and refactor all core objects in Phenol to key-value pairs for TinkerPop's graph model. Existing codes such as obo file readers and some HPO-related codes might not be compatible, so we may need to rebuild most of them. This will be ideal but may take some more time.

    • I was not sure how much time was planned for this refactoring process, so I just wanted to be careful before making any major changes. Any suggestion would be appreciated. Thank you.

pnrobinson commented 6 years ago

@yy20716 I would say that we should choose the graph library that we will be most happy with moving forward. At this point in time, there is no code that is using phenol other than about 5 internal projects from my lab. I will happily refactor them. The current use of graph code from ontolib is not what I want to have in application code anyway (because ontolib did not offer what is needed). So I would say we can just build a completely new hierarchy for parsing based on OWLAPI and then join this to a completely new graph library. We can leave the old code in place, but I will quickly refactor our application code and then we can delete the old classes from phenol (After this we should put the code up onto maven central etc). I think the most important thing moving forward will be that we can use basic graph operations to build larger algorithms. We can talk with @cmungall about all of this...any good time?

yy20716 commented 6 years ago

I am really sorry; I accidently made a commit to the develop branch directly while trying to make a commit for codes that use JgraphT. I will remove that commit and re-commit to the separate branch asap. Thank you.

yy20716 commented 6 years ago

Ok, I finished replacing internal graph codes with JGraphT and made an intial commit to the separate branch. I confirmed all tests are currently passed, but observed that some existing testcodes were not sufficiently strict. I will first work on revising test codes today and then make a pull request. I also resolved warning messages such as missting types in some codes. Thank you.

pnrobinson commented 6 years ago

Thanks very much -- this is greatly APPRECIATED!

pnrobinson commented 6 years ago

JGraphT has now been integrated into phenol thanks to @yy20716 Tests are passing and application code (HpoAnnotQc) is working with the new phenol version. I am therefore closing this issue.