cmungall opened 9 years ago
OntoFox should not be out of the question. I generally find I need the full version of the external ontology (or some version of it), pre-processed by custom commands, in place first, and I like using code that relies on documented parts of the OWLAPI. I feel that this is the best target for longer-term development (ideally, the whole module-extraction business should be much more dynamic and driven by different requirements; e.g. a reasoner would extract a module for one purpose, a Protege user for another, a downstream user for another).
But other opinions are available.
OntoFox started as a MIREOT tool, then added some tricks that make sense in SPARQL. The OntoFox import format covers a range of options, including remapping superclasses and annotation properties, and extracting partial hierarchies. We should support these features but pull them apart into separate, composable commands. We should be able to specify all the commands needed to build an ontology (and its subsets) in a build file.
OntoFox has limitations that we need to overcome:
I think we should fetch versioned ontologies to a local cache, then build an OWLAPI IRIMapper for the cache. The cache could be project-specific (like Ivy) or user-specific (like Maven); I think I prefer the former.
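To illustrate (not prescribe), here is a minimal OWLAPI sketch of the cache idea, assuming OWLAPI 3.x and a hypothetical project-local directory named ontology-cache. AutoIRIMapper scans the directory and maps ontology IRIs to the local files it finds, so imports resolve against the cache instead of the network:

```java
import java.io.File;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.util.AutoIRIMapper;

public class CachedLoad {
    public static void main(String[] args) throws Exception {
        // Hypothetical project-local cache of previously fetched ontologies.
        File cache = new File("ontology-cache");
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        // Map ontology IRIs to files found (recursively) under the cache directory.
        manager.addIRIMapper(new AutoIRIMapper(cache, true));
        // Resolves from the cache when possible, falling back to the web otherwise.
        OWLOntology ontology = manager.loadOntology(
                IRI.create("http://purl.obolibrary.org/obo/go.owl"));
    }
}
```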
Specifying versions is really important, but people are lazy. We should include a command to update all dependencies to the latest version (and make it easy to roll back). Then we should enforce the use of versioned ontologies.
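As a sketch of what "enforce" could mean in practice (assuming OWLAPI 3.x, where getVersionIRI() returns null when no owl:versionIRI is declared), a build step could simply refuse unversioned dependencies:

```java
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyID;

public class RequireVersion {
    // Reject any dependency that does not declare an owl:versionIRI.
    public static IRI requireVersionIRI(OWLOntology ontology) {
        OWLOntologyID id = ontology.getOntologyID();
        IRI version = id.getVersionIRI();
        if (version == null) {
            throw new IllegalStateException(
                    "Unversioned ontology: " + id.getOntologyIRI());
        }
        return version;
    }
}
```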
Once the source ontologies are handled, we can start by supporting two main import methods: MIREOT (isolated terms) and modules (using the Syntactic Locality Module Extractor). We can get more control over modules by stripping axioms first.
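The module side is already covered by the OWLAPI's SyntacticLocalityModuleExtractor. A hedged sketch follows, using a made-up seed term and output IRI; STAR is the usual default, with TOP and BOT as larger and smaller variants:

```java
import java.io.File;
import java.util.Collections;
import java.util.Set;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLEntity;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;

import uk.ac.manchester.cs.owlapi.modularity.ModuleType;
import uk.ac.manchester.cs.owlapi.modularity.SyntacticLocalityModuleExtractor;

public class ExtractModule {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        OWLOntology source = manager.loadOntology(
                IRI.create("http://purl.obolibrary.org/obo/chebi.owl"));
        // Seed signature: the external terms the editors actually reference
        // (here just CHEBI:15377, water, for illustration).
        Set<OWLEntity> seed = Collections.<OWLEntity>singleton(
                manager.getOWLDataFactory().getOWLClass(
                        IRI.create("http://purl.obolibrary.org/obo/CHEBI_15377")));
        SyntacticLocalityModuleExtractor extractor =
                new SyntacticLocalityModuleExtractor(manager, source, ModuleType.STAR);
        // The module keeps every axiom relevant to the seed terms.
        OWLOntology module = extractor.extractAsOntology(
                seed, IRI.create("http://example.org/chebi_import.owl"));
        manager.saveOntology(module, IRI.create(new File("chebi_import.owl")));
    }
}
```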
Let's discuss the Ivy vs. Maven thing on the next call.
I've developed an approach for similar needs that I use on one of our projects. I also built some tooling for it, but I'm redeveloping it in a different way, so the existing tooling will not evolve beyond bug fixes for now. My approach was the following:
For the Ivy vs. Maven issue, I think we need to pick a repository structure (not the specific deployment tools); then we can reuse existing tools to work with it. For example, Ivy can work with Maven repositories, but Maven doesn't understand the Ivy repository structure unless there is a plugin to help with that.
My approach would be to define how we will use the Maven repository structure and versioning to share ontology artifacts: what an "ontology artifact" is, and how such an artifact can be used from Java SE, OSGi, the OWLAPI, direct URL imports, etc. This will require the artifact to contain multiple representations of the ontology (as a jar file, as compressed/uncompressed RDF files, in other syntaxes, etc.) to accommodate these different use cases for the same ontology. I have some ideas for this, but I first need to test them in a prototype.
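Purely as a strawman (all names invented), this is the kind of IRI-to-repository-path convention that would need to be pinned down:

```java
public class OntologyCoordinates {
    // One possible layout: group/artifact/version/artifact-version.extension,
    // i.e. the standard Maven repository path convention.
    public static String repositoryPath(String group, String artifact,
                                        String version, String extension) {
        return group.replace('.', '/') + "/" + artifact + "/" + version
                + "/" + artifact + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        // Invented coordinates, for illustration only.
        System.out.println(repositoryPath("org.obolibrary", "go", "2015-05-01", "owl"));
        // -> org/obolibrary/go/2015-05-01/go-2015-05-01.owl
    }
}
```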
I would then pick an API to work with the above Maven repository. My recommendation is to use Ivy. It is more sophisticated, better documented, and is a pure dependency-resolution framework, as opposed to the Maven APIs, which also include a build API. Currently, Maven uses Eclipse's Aether framework under the hood for dependency resolution, but the last time I looked at that API there was not enough documentation, and I don't think it sees much use beyond the company that develops Maven-related tooling (Sonatype). Gradle uses Ivy for dependency resolution, so the Ivy API is part of the Gradle core.
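For a feel of the Ivy API, here is a rough sketch (error handling omitted; the ivy.xml that would declare the ontology dependencies is assumed to exist):

```java
import java.io.File;

import org.apache.ivy.Ivy;
import org.apache.ivy.core.report.ArtifactDownloadReport;
import org.apache.ivy.core.report.ResolveReport;

public class ResolveDependencies {
    public static void main(String[] args) throws Exception {
        Ivy ivy = Ivy.newInstance();
        ivy.configureDefault(); // or ivy.configure(new File("ivysettings.xml"))
        // Resolve the dependencies declared in ivy.xml into the local cache.
        ResolveReport report = ivy.resolve(new File("ivy.xml"));
        for (ArtifactDownloadReport artifact : report.getAllArtifactsReports()) {
            System.out.println(artifact.getLocalFile()); // cached artifact location
        }
    }
}
```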
Note: this ticket is not yet well-specified; it is intended as a place to get the ball rolling and collect requirements. Some familiarity with existing makefile-based owltools pipelines would help.
Minimal annotations
When creating import modules, I generally keep the annotations minimal: labels only (plus the necessary logical axioms for reasoning, of course). Sometimes this is not ideal; editors of the main ontology like to be able to search by synonyms and at least see definitions. E.g. https://code.google.com/p/envo/issues/detail?id=131
Currently the strategy is to tweak the makefile according to requests from the main ontology editors; e.g. sometimes we change the command to include synonyms/definitions. The main reason not to include these is simply to avoid VCS churn (e.g. for GO the import modules are regenerated daily, since people frequently need the set of referenced external terms handy).
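In OWLAPI terms, the "minimal annotations" step is a small filter. A sketch that keeps rdfs:label and drops all other annotation assertions (logical axioms are untouched):

```java
import java.util.HashSet;
import java.util.Set;

import org.semanticweb.owlapi.model.AxiomType;
import org.semanticweb.owlapi.model.OWLAnnotationAssertionAxiom;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;

public class MinimalAnnotations {
    // Drop every annotation assertion except rdfs:label.
    public static void keepLabelsOnly(OWLOntologyManager manager, OWLOntology ontology) {
        Set<OWLAxiom> toRemove = new HashSet<OWLAxiom>();
        for (OWLAnnotationAssertionAxiom axiom :
                ontology.getAxioms(AxiomType.ANNOTATION_ASSERTION)) {
            if (!axiom.getProperty().isLabel()) {
                toRemove.add(axiom);
            }
        }
        manager.removeAxioms(ontology, toRemove);
    }
}
```

Making synonyms and definitions opt-in would then just be a matter of whitelisting more annotation properties in that filter.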
Extending the import module
If editors need an external term that is not currently in an import module, we often put them through a hellish procedure. Some README-editors files describe a baroque procedure involving switching to a URI view, pasting in the desired URI, then regenerating the imports. With GO, editors add the term to an imports-requests obo file, and Jenkins includes it in the import module by the time they wake up the next morning.
Short of having this fully integrated into Protege, it would be nice if there were an easy-to-run command-line way of extending the import module.
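One way such a command could work (a sketch, under the assumption that the module was built with the SLME): take the old module's signature, add the requested term, and re-extract:

```java
import java.util.HashSet;
import java.util.Set;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLEntity;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;

import uk.ac.manchester.cs.owlapi.modularity.ModuleType;
import uk.ac.manchester.cs.owlapi.modularity.SyntacticLocalityModuleExtractor;

public class ExtendImportModule {
    // args: <full-ontology-iri> <old-module-iri> <new-term-iri>
    public static void main(String[] args) throws Exception {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        OWLOntology full = manager.loadOntology(IRI.create(args[0]));
        OWLOntology oldModule = manager.loadOntology(IRI.create(args[1]));
        // Old signature plus the newly requested term.
        Set<OWLEntity> seed = new HashSet<OWLEntity>(oldModule.getSignature());
        seed.add(manager.getOWLDataFactory().getOWLClass(IRI.create(args[2])));
        IRI moduleIRI = oldModule.getOntologyID().getOntologyIRI();
        // Free the IRI so the regenerated module can reuse it.
        manager.removeOntology(oldModule);
        OWLOntology newModule = new SyntacticLocalityModuleExtractor(
                manager, full, ModuleType.STAR).extractAsOntology(seed, moduleIRI);
    }
}
```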
Another thing that would be useful is a way of seamlessly swapping the import module out for the full ontology (with some catalog trickery, plus handling of syncing a local copy of the full ontology). This would allow the editors unhindered search of the full external ontology. That can sometimes overwhelm Protege (e.g. with CHEBI or Uberon as externals), so an intermediate strategy may be useful (e.g. a module that includes all classes and logical axioms but excludes axiom annotations).
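Protege does this kind of redirection with its catalog files; the OWLAPI equivalent is an IRI mapper. A sketch, with invented IRIs and file names, that points the import module's IRI at a locally synced copy of the full ontology:

```java
import java.io.File;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.util.SimpleIRIMapper;

public class SwapInFullOntology {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        // Wherever the editors' ontology imports the module IRI, load the
        // full (locally mirrored) ontology instead. Both IRIs are invented.
        manager.addIRIMapper(new SimpleIRIMapper(
                IRI.create("http://purl.obolibrary.org/obo/myont/imports/chebi_import.owl"),
                IRI.create(new File("mirror/chebi-full.owl"))));
        OWLOntology edit = manager.loadOntologyFromOntologyDocument(
                new File("myont-edit.owl"));
    }
}
```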