monarch-initiative / phenol

phenol: Phenotype ontology library
https://phenol.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
23 stars 4 forks source link

Shortest path algorithm #28

Closed pnrobinson closed 5 years ago

pnrobinson commented 6 years ago

For a new application code algorithm it would be good to have a version of the BFS that would find the set of all nodes that have a shortest path to a single query node. There is a visitor pattern that could be extended to do this, but I think it will require refactoring. @cmungall @yy20716 @vidaravanmehr should we make a decision about using a graph library?

yy20716 commented 6 years ago

It's just a preliminary opinion, but if we could employ TinkerPop with its query language Gremlin, we may be able to leverage some path algorithms / traversal recipes provided by Gremlin here.

pnrobinson commented 6 years ago

TinkerPop seems to have the largest user base and thus might be the most sustainable graph algorithm. Would this interact well with an OWLAPI-based parser? Would we essentially build the graph with an iteration of edges and nodes detected during OWLAPI parsing, posthoc? Then if we had a slim wrapper for graph algorithms, we would be insulated and could replace the graph library later on if we felt we needed to.

yy20716 commented 6 years ago
pnrobinson commented 6 years ago

Yes this is really nice. I would suggest we basically create a set of static algorithms that use the JGrapht library under the hood. I seem some you already implemented in the class OntologyAlgorithm. Should we continue to add methods to this class? Or should we break it down into a set of classes for various types of algorithms?

yy20716 commented 6 years ago

It's just a preliminary opinion, but it would be nice if we can consider the second option, i.e. break down the current OntologyAlgorithm class per algorithm, similar to the organization used in JGraphT. This would allow us to avoid building a giant file that contains all algorithms in a single class. Any other suggestions would be appreciated.

julesjacobsen commented 6 years ago

Absolutely break it down. The smaller classes can be more easily tested and you'll not end up with a hairball.

To make things easier on the client, you could of course use a public OntologyAlgorithms utility class with a lot of static methods on it which call through to package private implementation classes. This would be optimal as the algorithm implementation could change and the client would never be exposed to this.

e.g. Client sees:

    OntologyTerm a = ...
    OntologyTerm b = ...
    List<OntologyTerm> shortestPath = OntologyAlgorithms.shortestPath(a, b);
    OntologyTerm lcs = OntologyAlgorithms.lowestCommonSubsumer(a, b);

This would be stable as internally phenol would look like this:

public class OntologyAlgorithms {
    private OntologyAlgorithms () {
        //not instantiable
    }
    public static List<OntologyTerm>shortestPath(OntologyTerm a, OntologyTerm b) {
        // JgraphtAlgorithmWrapper implementation of shortest path, or if you no longer like it, use another one. 
        return jgraphtAlgorithmWrapper.shortestPath(a, b); 

    public static OntologyTerm lowestCommonSubsumer(OntologyTerm a, OntologyTerm b) {
        //use another algorithm implementation from the package
    }
}

//If you intend to have multiple implementations, define an interface from the start, although this isn't critical as none of these are public classes.
interface ShortestPathFinder {
    List<OntologyTerm>shortestPath(OntologyTerm a, OntologyTerm b);
}

//n.b. this is not public
class JgraphtShortestPathFinder implements ShortestPathFinder {
    List<OntologyTerm>shortestPath(OntologyTerm a, OntologyTerm b) {
        //implementation...
    }
}

//or use the tinkerpop one if that suits you
//n.b. this is not public
class TinkerPopShortestPathFinder implements ShortestPathFinder {
    List<OntologyTerm>shortestPath(OntologyTerm a, OntologyTerm b) {
        //implementation...
    }
}

//n.b. this is not public
class JgraphtLcsFinder {
    OntologyTerm lowestCommonSubsumer(OntologyTerm a, OntologyTerm b) {
        //implementation...
    }
}

and you could swap the internal implementation as you needed without forcing any changes on the client.

pnrobinson commented 6 years ago

I think both suggestions sound pretty good.

pnrobinson commented 6 years ago

There are also algorithms in the graph package (right next door). These include things like breadth first search. Probably we do not want to expose these algorithms directly. I am also not sure whether it makes much sense to have separate packages for ontology and graph -- since we are concentrating on OBO-like operations in phenol, the ontology algorithms are all graph algorithms and are not, say, description logic algorithms or what not. So I wonder if we should merge these two packages, and make everything that involves access to the JGraphT objects simply be package private (or not exported whenever we move to Java 9).

julesjacobsen commented 6 years ago

I agree, it makes sense to move the graph.algo into ontology.algo and keep any graph algorithm implementations as package private. Phenol isn't trying to be a generic graph algorithm library, it's an ontology library.

pnrobinson commented 5 years ago

This issue has become stale and we have gone with the JGraphT option. Closing!