ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

Fixing import module woes #3

Open cmungall opened 9 years ago

cmungall commented 9 years ago

Note: this ticket is not yet well-specified, intended to serve as an area to get the ball rolling and collect requirements. Some familiarity with existing makefile-based owltools pipelines would help.

Minimal annotations

When creating import modules, I generally keep the annotations minimal: labels only (plus the necessary logical axioms for reasoning, of course). Sometimes this is not ideal; editors of the main ontology like to be able to search by synonyms and to see definitions, at least. E.g. https://code.google.com/p/envo/issues/detail?id=131

Currently the strategy is to tweak the makefile according to requests from the main ontology editors, e.g. sometimes we change the command to include synonyms/definitions. The main reason not to include these is simply to avoid VCS churn (e.g. for GO the import modules are regenerated daily, and people frequently need the set of referenced external terms close at hand).
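
As a rough illustration of the label-only strategy, here is a minimal OWLAPI sketch that strips every annotation assertion except rdfs:label from a module; the file name is hypothetical:

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

public class MinimalAnnotations {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        // Hypothetical module file; any previously extracted import module works.
        OWLOntology module = man.loadOntologyFromOntologyDocument(
                new File("imports/chebi_import.owl"));

        // Keep logical axioms and rdfs:label assertions; drop all other
        // annotation assertions (synonyms, definitions, ...) to reduce VCS churn.
        Set<OWLAxiom> drop = new HashSet<OWLAxiom>();
        for (OWLAnnotationAssertionAxiom ax
                : module.getAxioms(AxiomType.ANNOTATION_ASSERTION)) {
            if (!ax.getProperty().isLabel()) {
                drop.add(ax);
            }
        }
        man.removeAxioms(module, drop);
        man.saveOntology(module);
    }
}
```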

Extending the import module

If editors need an external term not currently in an import module, we often put them through a hellish procedure. Some README-editors files have a baroque procedure involving switching to a URI view, pasting in the desired URI, then regenerating the imports. With GO, editors add the term to an imports-requests OBO file and Jenkins includes it in the import module by the time they wake up the next morning.

Short of having this fully integrated into Protégé, it would be nice if there were an easy-to-run command-line way of extending the import module.
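
The core of such a command might look something like the following sketch, assuming a local copy of the source ontology and an existing module (all file names and the example term are placeholders); it copies the declaration and label for one requested term, leaving logical axioms to the regular module regeneration:

```java
import java.io.File;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

public class AddImportedTerm {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        OWLOntology source = man.loadOntologyFromOntologyDocument(new File("mirror/chebi.owl"));
        OWLOntology module = man.loadOntologyFromOntologyDocument(new File("imports/chebi_import.owl"));
        OWLDataFactory df = man.getOWLDataFactory();

        // The term an editor has requested (CHEBI:15377, "water", as an example).
        IRI termIri = IRI.create("http://purl.obolibrary.org/obo/CHEBI_15377");
        man.addAxiom(module, df.getOWLDeclarationAxiom(df.getOWLClass(termIri)));
        // Copy just the label; other annotations or logical axioms
        // could be copied the same way.
        for (OWLAnnotationAssertionAxiom ax : source.getAnnotationAssertionAxioms(termIri)) {
            if (ax.getProperty().isLabel()) {
                man.addAxiom(module, ax);
            }
        }
        man.saveOntology(module);
    }
}
```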

Another thing that would be useful is a way of seamlessly swapping out the import module for the full ontology (with some catalog trickery, plus handling of syncing a local copy of the full ontology). This would allow the editors unhindered search of the full external ontology. The full ontology can sometimes overwhelm Protégé (e.g. CHEBI or Uberon as externals), so an intermediate strategy may be useful (e.g. a module that includes all classes and logical axioms but excludes axiom annotations).
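
Programmatically, the swap amounts to an IRI redirection, which OWLAPI exposes directly; this sketch (IRIs and paths are hypothetical) has the same effect as a catalog-v001.xml rewrite in Protégé. For the intermediate strategy, OWLAxiom.getAxiomWithoutAnnotations() gives a copy of each axiom with its axiom annotations removed.

```java
import java.io.File;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.util.SimpleIRIMapper;

public class SwapImportForFullOntology {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        // Redirect the import module's IRI to a synced local copy
        // of the full external ontology (paths are hypothetical).
        man.addIRIMapper(new SimpleIRIMapper(
                IRI.create("http://purl.obolibrary.org/obo/go/imports/chebi_import.owl"),
                IRI.create(new File("mirror/chebi-full.owl"))));
        OWLOntology edit = man.loadOntologyFromOntologyDocument(new File("go-edit.owl"));
        System.out.println(edit.getImportsClosure().size() + " ontologies in the closure");
    }
}
```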

cmungall commented 9 years ago

OntoFox should not be out of the question. I generally find I need the full version of the external ontology (or some version of it), pre-processed by custom commands, in place first. And I like using code that relies on documented parts of the OWLAPI; I feel that this is the best target for longer-term development (ideally the whole module-extraction business should be much more dynamic and driven by different requirements, e.g. a reasoner would extract a module for one purpose, a Protégé user another, a downstream user another).

But other opinions are available

jamesaoverton commented 9 years ago

OntoFox started as a MIREOT tool, then added some tricks that make sense in SPARQL. The OntoFox import format covers a range of options, including remapping superclasses and annotation properties, and extracting partial hierarchies. We should support these features but pull them apart into separate, composable commands. We should be able to specify all the commands needed to build an ontology (and its subsets) in a build file.

OntoFox has limitations that we need to overcome:

  1. use local copies of ontologies instead of a central triplestore
  2. use specific versions of ontologies, not just whatever is in the store

I think we should fetch versioned ontologies to a local cache, then build an OWLAPI IRIMapper for the cache. The cache could be project-specific (like Ivy) or user-specific (like Maven) -- I think I prefer the former.
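
A minimal sketch of the cache idea, assuming a project-local directory named ontology-cache that holds the fetched files; OWLAPI's AutoIRIMapper scans the directory and resolves ontology IRIs to the local copies:

```java
import java.io.File;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.util.AutoIRIMapper;

public class CachedLoad {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        // Scan the cache recursively; the mapper reads each file's ontology IRI,
        // so imports resolve locally instead of over the network.
        man.addIRIMapper(new AutoIRIMapper(new File("ontology-cache"), true));
        OWLOntology ont = man.loadOntology(
                IRI.create("http://purl.obolibrary.org/obo/uberon.owl"));
        System.out.println("Loaded: " + ont.getOntologyID());
    }
}
```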

Specifying versions is really important, but people are lazy. We should include a command to update all dependencies to the latest version (and make it easy to roll back). Then we should enforce the use of versioned ontologies.

Once the source ontologies are handled, we can start by supporting two main import methods: MIREOT (isolated terms) and modules (using the Syntactic Locality Module Extractor). We can get more control over modules by stripping axioms first.
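
For the module side, OWLAPI ships the Syntactic Locality Module Extractor; this sketch (file names and the seed term are placeholders) extracts a STAR module, typically the smallest of the locality-based module types, for a one-term signature:

```java
import java.io.File;
import java.util.Collections;
import java.util.Set;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import uk.ac.manchester.cs.owlapi.modularity.ModuleType;
import uk.ac.manchester.cs.owlapi.modularity.SyntacticLocalityModuleExtractor;

public class ExtractModule {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        OWLOntology source = man.loadOntologyFromOntologyDocument(new File("mirror/chebi.owl"));
        OWLDataFactory df = man.getOWLDataFactory();

        // Seed signature: the external terms the ontology actually references.
        Set<OWLEntity> signature = Collections.<OWLEntity>singleton(
                df.getOWLClass(IRI.create("http://purl.obolibrary.org/obo/CHEBI_15377")));

        SyntacticLocalityModuleExtractor extractor =
                new SyntacticLocalityModuleExtractor(man, source, ModuleType.STAR);
        OWLOntology module = extractor.extractAsOntology(
                signature, IRI.create("http://example.org/chebi_import.owl"));
        man.saveOntology(module, IRI.create(new File("imports/chebi_import.owl")));
    }
}
```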

cmungall commented 9 years ago

Let's discuss the Ivy vs Maven thing on the next call.

ShahimEssaid commented 9 years ago

I’ve developed an approach that I use for one of our projects to deal with similar needs. I also built some tooling for it, but I’m redeveloping it in a different way, so the existing tooling will not evolve beyond bug fixes for now. My approach was the following:

  1. We have one or more input axiom sets (source ontologies).
  2. We have an OWL configuration file that describes what axioms should be copied from the input sets (i.e. the ontologies). I use OWL entity annotations to mark entities as “include”, “include-with-subs”, “exclude”, “exclude-with-subs”, etc. These are OWL annotations on the corresponding entities; they can be added like any other annotations in Protégé. They are like tags. (A rough sketch of this annotation-driven selection appears at the end of this comment.)
  3. The tool looks at the input sets and the configuration file, and then acts accordingly to generate the output axioms as a single OWL file.
  4. The configuration file also lists “builders” that should be applied by the tool against the input axioms. A builder is a named algorithm that extracts some axioms based on the configuration. You can think of them as commands, but they are represented in an ontology annotation in the configuration file. Several builders can be specified so the tool applies a chain of actions. The builders are contributed by jar files that provide a named factory for instances of the builders. See this file for example: https://open.med.harvard.edu/svn/eagle-i-dev/datamodel/trunk/src/isf/module/eaglei/eaglei-module-configuration.owl It only lists the “simple-inferred” builder in one of the ontology annotations. When the tool is run, it first runs the “simple-inferred” builder against the source closure that is specified with the eaglei-module-source.owl import. A few other ontology annotations and configuration files add the following features:
  5. While the builders run, if any of the extracted axioms are in the “exclude” set, they are not added to the output. The “exclude” set is represented in another OWL file; see the SVN directory that contains the above file.
  6. After the builders run, there is an “include” set to add additional axioms that can’t be extracted by the builders.

The tooling I wrote is called OwlCL, which stands for “owl command line”. It is based on Guice, and it is extensible by simply adding jars to the extension directory. You can see the project here: https://github.com/ShahimEssaid/OwlCL I was planning on continuing with this tool, but I’m now favoring a different approach, so I’m not evolving it beyond bug fixes for my own use. A few of the existing commands are also broken, but I will fix them as needed.

As I said above, the tool is being replaced with a more general framework based on the Java CDI framework. The above approach, or some better version of it, will be re-implemented in the new framework. I am working on this whenever I have time, but I don’t want to promise anything until it is ready for a show and tell. The new tool is basically a generic “CDI command execution framework and runtime container” that enables all the Java EE features of CDI in Java SE. This framework can then be embedded in Protégé to provide many commands (invoked from menus), or in Gradle with the commands wrapped as Gradle tasks driven by a Gradle build, or wrapped in a command-line interface such as JCommander to use it from the command line. The framework doesn’t have a specific interface; it just has a client API that can be embedded in different interfaces. I’ll hopefully have something soon that I can document and share to get some initial feedback.
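
A caricature of the annotation-driven selection described above (the marker IRI and file names are hypothetical; OwlCL's actual vocabulary and builders differ): entities tagged “include” in the configuration ontology get their referencing axioms copied from the source into the output.

```java
import java.io.File;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

public class AnnotationDrivenModule {
    // Hypothetical marker property standing in for OwlCL's real vocabulary.
    static final IRI INCLUDE = IRI.create("http://example.org/module#include");

    public static void main(String[] args) throws Exception {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        OWLOntology config = man.loadOntologyFromOntologyDocument(new File("module-configuration.owl"));
        OWLOntology source = man.loadOntologyFromOntologyDocument(new File("module-source.owl"));
        OWLOntology output = man.createOntology(IRI.create("http://example.org/module.owl"));
        OWLAnnotationProperty include =
                man.getOWLDataFactory().getOWLAnnotationProperty(INCLUDE);

        // For every entity tagged "include" in the config, copy its axioms over.
        for (OWLAnnotationAssertionAxiom ax : config.getAxioms(AxiomType.ANNOTATION_ASSERTION)) {
            if (ax.getProperty().equals(include) && ax.getSubject() instanceof IRI) {
                for (OWLEntity e : source.getEntitiesInSignature((IRI) ax.getSubject())) {
                    man.addAxioms(output, source.getReferencingAxioms(e));
                }
            }
        }
        man.saveOntology(output, IRI.create(new File("module-output.owl")));
    }
}
```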
ShahimEssaid commented 9 years ago

For the Ivy vs Maven issue, I think we need to pick a repository structure (not the specific deployment tools), and then we can reuse existing tools to work with it. For example, Ivy can work with Maven repositories, but Maven doesn't understand the Ivy repository structure unless there is a plugin that can help with that.

My approach would be to define how we will use the Maven repository structure and versioning to share ontology artifacts: define what an "ontology artifact" is, and how such an artifact would be used in Java SE, OSGi, the OWL API, direct URL imports, etc. This will require that the artifact contain multiple representations of the ontology (as a jar file, as compressed/uncompressed RDF files, other syntaxes, etc.) to accommodate these different use cases for the same ontology. I have some ideas for this, but I first need to test them in a prototype.

I would then pick an API to work with the above Maven repository. My recommendation is to use Ivy. It is more sophisticated, better documented, and is a pure dependency-resolution framework, as opposed to the Maven APIs, which also include a build API. Currently, Maven uses Eclipse's Aether framework under the hood to do the dependency resolution, but the last time I looked at the API there was not enough documentation, and I don't think there is much use of this API beyond the company that develops Maven-related tooling (Sonatype). Gradle uses Ivy for dependency resolution, so the Ivy API is part of the Gradle core.