Closed cmungall closed 6 years ago
I agree using a generic graph library will get us maximum utility in the least amount of time and with no required development/testing/bugfixing.
Not a clue which would be best to use - either jgrapht or tinkerpop seem the most reasonable choices. If we consider their functionality equal, What are our requirements? Speed? Memory footprint? API usability? It's worth investigating this on a couple of use-cases to see how they compare, before trying to fit them into our lib.
I guess that the ability to visualize graphs is great -- this would speak for jgrapht, but on the other hand, apache projects are probably more sustainable (the fact that jgrapht is still partially on sourceforge might seem to indicate that it is not very actively maintained?). Jgraph seems to be really nice -- is this from the same group? What about writing a wrapper to use tinkerpop with Jgraph if and when we want to?
I briefly checked JGraphT, Neo4j, and Tinkerpop. I guess this would be more preference issue but Apache Tinkerpop seems a reasonable choice for a long term because of its sustainability and interoperability. For example, it has a query language called Gremlin. It is both a graph traversal library and language, which means that we can easily implement graph-based operations. Some common operations are already implemented and released as 'recipes' using Gremlin, e.g., shortest path. This Gremlin query can be also evaluated over TinkerPop-enabled graph systems such as Blazegraph and even neo4j.
One thing I am not sure about this is a graph visualization. It does not have its own visualization core but I see some projects that interlink Gephy (open-source visualization software package written in Java) and Tinkerpop (e.g., https://groups.google.com/forum/#!topic/gremlin-users/LIZ73bxZJcU). Maybe we could build a simple wrapper for JGraphX if we cannot use Gephy.
Any other suggestion would be appreciated.
JGraphT is actually very much alive and kicking on GitHub, so it is still well supported. I'd go with whichever seems nicest to use, assuming they both have the required algos and all other things being equal.
Depending on how you develop OLP, you should be able to leverage a lot of their API through a layer of your own making so that both libraries might be tested side-by-side for performance on the kinds of things we want to be able to do.
While I'm here, can I ask we change the name? OLPG sticks out like a sore thumb where repos are usually all lowercase, plus it doesn't roll off the tongue easily. Can I suggest 'phenol' as a replacement? Phenotype Ontology Library? See?
I agree with Jules about the name of this repository. We could have a better name than OLPG. Phenol sounds good to me.
I spent some more time to check and compare JGraphT and Tinkerpop. In the end, I felt that it's like a comparison between apple and orange but here is my preliminary opinion.
JGraphT:
TinkerPop:
Note: both of them are already available at maven repository. They also do not have any mappers to OWLAPI/Obographs, so we would need to implement import/export layers for them. The JGraphT and TinkerPop graph can be indrectly mapped via GraphML format if needed.
How about "Pal" Phenolizing assistance library
I kind of like OiLPiG. Kind of a tip of the hat to OIL, predecessor of OWL. And pigs are smart.
I am definitely in agreemet we need a better name. phenol and OilPig are both seriously cool...who's got a fair coin?
@yy20716 I would say we will have to develop many wrappers that will use various sorts of graph algorithm like Dykstra or Ford-Fulkerson etc., but that we probably will want to adapt these algorithms to respect things like type of edge or type of term from our target ontologies, so having a library that will let us flexibly work with the basic graph operations would be an advantage. It sounds like the main advantage of JGraphT is visualization, but I suspect that we will not use graph visualization much outside of a web setting, and there me might want to use some javascript framework anyway. Thoughts?
OiLPiG is awesome!
did not catch, though, what the i's stand for? Should we go for oil pig then or will we come to regret it when we are writing the methods section for staid journals....
Never take naming advice from me...
I knew you weren't that serious, but I did really like oilpig....:-0 Well, should we stick to the safe side and go for phenol? That works for me...
I was trying to find whether there are any performance comparisons between JGraphT and Tinkerpop but most reports were very preliminary or out-dated while those two projects are rapidly updated. Tinkerpop's performance also could vary based on the implementation we would use. I am trying to be cautious but we could begin with OWLAPI and Tinkerpop if everyone is okay with these. It would be also nice if we could build/test some testcases and continue discussing this issue again.
I guess visualization is not a primary goal here but there's a project called GraphExp, which displays a graph stored in a Gremlin graph database using D3.js. Other libraries also exist (See also visualization section: https://github.com/JanusGraph/janusgraph). We maybe later fork and extend them if needed. Any other suggestions would be appreciated.
My suggestion is that let's go ahead with an approach based on OWL API and TinkerPop. Also, let's go ahead with "phenol" :-0
oilpig
has lots of charms, cooler heads have prevailed, and phenol
is a pretty cool name as well.
@cmungall @yy20716 @drseb @julesjacobsen We do not seem to have decision structures for phenol, but since nobody has protested, I am guessing that the above three points are consensus?
@yy20716 Should we try to craft a MONDO parser with OWLAPI then? I would like to add MONDO to
https://github.com/TheJacksonLaboratory/HPOworkbench
and
https://github.com/monarch-initiative/hpoannotqc
so we can start to work on the merging code, so this would be timely. Once that is working then I would also start on the HPO Parser and add features like getting the pubmed references for the definitions, which we need for the website.
thank sPeterI am sorry, but I am a bit confused; do you want to add a mondo.owl loader to HPOworkbench
and hpoannotqc
, not phenol
(OLPG
)?
Obographs
(instead of using OWLAPI
directly). One issue is that these codes do not load any additional annotations, property chains, or domain/range axioms, so I guess we need to first consider whether any additional special fields/properties/values need to be mapped for mondo. phenol
. Is this related or the idea was abandoned? It would be helpful if you could clarify these questions. p.s. @cmungall could you please advise about this issue? I am not sure whether leveraging the mappings of Obographs for owl also makes sense to you for HPOworkbench
and hpoannotqc
.
p.s. it would be nice if you could assign separate tickets for these issues and discuss there. Thank you.
+1 Using OWLAPI to load ontologies from owl or obo worked really well in OwlSIm3. Just wrap that and convert the output. Although this might suffer when loading really large graphs as you'd need to hold two in memory for a brief period, unless this can be done in place or a more elegant solution can be found.
I agree that vis isn't necessarily the primary aim here.
phenol works for me - just remember to properly change not only the pom but package declarations too. This will screw everyone's previous code, but its a clean fork with no confusion.
@yy20716 -- sorry for being unclear -- I would like to use phenol as the parser but use the parsed ontology in the other two applications! Thus, we can regard this as a test case.
@pnrobinson Thank you for your clarification. However, I am still not sure about mappings for additional fields/annotations in Mondo. @cmungall If you don't mind can you provide any suggestions? Do I need to map all fields in addition to a basic graph structure for Peter's codes?
I am sorry for changing my opinion suddenly but I began to think that JGraphT might be a better option.
For a past few days, I was trying to prototype and test TinkerPop but realized that it may not be straightforward to adapt existing codes in Phenol to TinkerPop's graph model.
So I believe that we would have two options; we employ
JGraphT: this would allow maximizing re-using codes because JGraphT's interface is very similar to the existing one. If needed, we could build a wrapper that converts phenol's graph instances to the TinkerPop ones for algorithms/datasets that cannot be handled using JGraphT, or
TinkerPop: we would completely transform and refactor all core objects in Phenol to key-value pairs for TinkerPop's graph model. Existing codes such as obo file readers and some HPO-related codes might not be compatible, so we may need to rebuild most of them. This will be ideal but may take some more time.
I was not sure how much time was planned for this refactoring process, so I just wanted to be careful before making any major changes. Any suggestion would be appreciated. Thank you.
@yy20716 I would say that we should choose the graph library that we will be most happy with moving forward. At this point in time, there is no code that is using phenol other than about 5 internal projects from my lab. I will happily refactor them. The current use of graph code from ontolib is not what I want to have in application code anyway (because ontolib did not offer what is needed). So I would say we can just build a completely new hierarchy for parsing based on OWLAPI and then join this to a completely new graph library. We can leave the old code in place, but I will quickly refactor our application code and then we can delete the old classes from phenol (After this we should put the code up onto maven central etc). I think the most important thing moving forward will be that we can use basic graph operations to build larger algorithms. We can talk with @cmungall about all of this...any good time?
I am really sorry; I accidently made a commit to the develop branch directly while trying to make a commit for codes that use JgraphT. I will remove that commit and re-commit to the separate branch asap. Thank you.
Ok, I finished replacing internal graph codes with JGraphT and made an intial commit to the separate branch. I confirmed all tests are currently passed, but observed that some existing testcodes were not sufficiently strict. I will first work on revising test codes today and then make a pull request. I also resolved warning messages such as missting types in some codes. Thank you.
Thanks very much -- this is greatly APPRECIATED!
JGraphT has now been integrated into phenol thanks to @yy20716 Tests are passing and application code (HpoAnnotQc) is working with the new phenol version. I am therefore closing this issue.
ontolib already has good general purpose graph operations. However, there are some issues, e.g. lack of generic edge properties and edge labels. It's a good idea to use a 3rd party lib.
Minimally the 3rd party lib should be on maven central, be well supported and robust and implement the standard features. A good analog is networkx for python, which is very standard.
https://stackoverflow.com/questions/51574/good-java-graph-algorithm-library
Based on this, it seems http://jgrapht.org/ is a good choice (with ability to use Visualization using JGraphX https://github.com/jgraph/jgraphx in javafx programs)
Also consider: