ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

Better import configuration #212

Open jamesaoverton opened 6 years ago

jamesaoverton commented 6 years ago

ROBOT has a range of extract features that take lists of term IRIs. In practice, we often have mixed import strategies. Currently that requires multiple extract calls, in a Makefile or similar. It would be nice to have more control through a friendlier interface.

  1. OntoFox has configuration files: http://ontofox.hegroup.org/tutorial/index.php#input_format
  2. The original MIREOT implementation would accept a very basic OWL file and fill it out: https://github.com/ontodev/robot/issues/10#issuecomment-112109599
  3. OntoPilot has very nice spreadsheets with terms and import strategies: https://github.com/stuckyb/ontopilot/wiki/Managing-imports

It would be nice to have at least one of these in ROBOT, and having all of them would be even better.

Whatever we do should be integrated with the ontology-starter-kit: https://github.com/INCATools/ontology-starter-kit/blob/master/template/src/ontology/imports/MY-IMPORTED_terms.txt

beckyjackson commented 6 years ago

The OntoPilot CSVs for the terms include Related Entities and Exclude fields. Do we want to have these in an implementation for ROBOT?

Related entities

Whether to retrieve entities that are related to the target entity for inclusion in the import module. This can either be empty, in which case no related entities are explicitly retrieved, or it can be a comma-separated list of one or more of the following values: ancestors, descendants, equivalents, disjoints, domains, ranges, inverses, types, or property assertions.
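A minimal sketch of how a ROBOT port might parse this field. It assumes the nine values listed above, case-insensitive matching, and an empty cell meaning "no related entities". The names `RelatedEntitiesField`, `RelationOption`, and `parse` are hypothetical and not part of ROBOT or OntoPilot. Order is preserved so the options can later be applied one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical parser for OntoPilot-style "Related entities" cells. */
final class RelatedEntitiesField {

  /** The relation options listed in the OntoPilot documentation. */
  enum RelationOption {
    ANCESTORS("ancestors"),
    DESCENDANTS("descendants"),
    EQUIVALENTS("equivalents"),
    DISJOINTS("disjoints"),
    DOMAINS("domains"),
    RANGES("ranges"),
    INVERSES("inverses"),
    TYPES("types"),
    PROPERTY_ASSERTIONS("property assertions");

    final String label;

    RelationOption(String label) {
      this.label = label;
    }
  }

  /** Parse a comma-separated cell; a null or blank cell yields an empty list. */
  static List<RelationOption> parse(String cell) {
    List<RelationOption> options = new ArrayList<>();
    if (cell == null || cell.isBlank()) {
      return options;
    }
    for (String raw : cell.split(",")) {
      String value = raw.trim().toLowerCase(Locale.ROOT);
      if (value.isEmpty()) {
        continue; // tolerate stray or trailing commas
      }
      RelationOption match = null;
      for (RelationOption option : RelationOption.values()) {
        if (option.label.equals(value)) {
          match = option;
          break;
        }
      }
      if (match == null) {
        throw new IllegalArgumentException("Unknown related-entity option: " + raw.trim());
      }
      options.add(match);
    }
    return options;
  }
}
```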

Exclude

If the value of this field is “true”, “yes”, “t”, or “y” (not case sensitive), then this entity (and related entities, if specified in the Related entities column), will be explicitly excluded from the final import module.
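A small sketch of the Exclude rule described above, assuming surrounding whitespace in a cell should be ignored (the OntoPilot text does not say). Applying exclusions as a set difference after related entities have been collected is one plausible reading of "excluded from the final import module"; `ExcludeField`, `isExcluded`, and `applyExclusions` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for the OntoPilot-style "Exclude" column. */
final class ExcludeField {

  // Values that mark a row as excluded, per the description above.
  private static final Set<String> TRUE_VALUES = Set.of("true", "yes", "t", "y");

  /** True if the cell marks the entity for exclusion (not case sensitive). */
  static boolean isExcluded(String cell) {
    return cell != null && TRUE_VALUES.contains(cell.strip().toLowerCase(Locale.ROOT));
  }

  /** Remove excluded entities (and their expanded relatives) from the module's entity set. */
  static Set<String> applyExclusions(Set<String> included, Set<String> excluded) {
    Set<String> result = new LinkedHashSet<>(included);
    result.removeAll(excluded);
    return result;
  }
}
```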

I'm trying to think of a way to make it as user-friendly as possible. I was thinking we could have a master source file like OntoPilot's, where you specify the IRIs and entity files for each import... plus maybe a method (MIREOT, STAR, etc.). Then, for each entity file, just a simple list of CURIEs?

jamesaoverton commented 6 years ago

I think we want these things, but the first implementation can focus on operations that we already support.

beckyjackson commented 6 years ago

I got a basic implementation of this working. It definitely needs some refinement, but it gets the job done.

The user passes a CSV to --source-file and does not need to add an --input ontology. For now, I thought it would be best to keep it separate from merge. If we wanted to do a full extract & merge into an input ontology, that might make more sense as a new command? Just thinking out loud.

The source file looks something like this:

ID,IRI,method,term file,output
CHMO,http://purl.obolibrary.org/obo/chmo.owl,STAR,imports/source/chmo.txt,imports/chmo_imports.owl
OBI,http://purl.obolibrary.org/obo/obi.owl,MIREOT,imports/source/obi.txt,imports/obi_imports.owl
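A sketch of reading this source file into one record per row. It assumes the exact header shown and a plain comma-separated layout with no quoted fields, which holds for these example rows but not for arbitrary CSV; a real implementation would likely use a CSV library. `SourceFile` and `SourceRow` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the --source-file CSV sketched above. */
final class SourceFile {

  record SourceRow(String id, String iri, String method, String termFile, String output) {}

  private static final String EXPECTED_HEADER = "ID,IRI,method,term file,output";

  /** Parse the CSV text; assumes no quoted fields or embedded commas. */
  static List<SourceRow> parse(String csv) {
    String[] lines = csv.strip().split("\\R");
    if (!lines[0].strip().equals(EXPECTED_HEADER)) {
      throw new IllegalArgumentException("Unexpected header: " + lines[0]);
    }
    List<SourceRow> rows = new ArrayList<>();
    for (int i = 1; i < lines.length; i++) {
      String line = lines[i].strip();
      if (line.isEmpty()) {
        continue; // skip blank lines
      }
      String[] cells = line.split(",", -1);
      if (cells.length != 5) {
        throw new IllegalArgumentException("Line " + (i + 1) + ": expected 5 columns");
      }
      rows.add(new SourceRow(
          cells[0].strip(), cells[1].strip(), cells[2].strip(), cells[3].strip(), cells[4].strip()));
    }
    return rows;
  }
}
```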

The term file column points to a text file listing the terms to extract with STAR, TOP, or BOT. The MIREOT method also accepts a list of terms, but upper terms are separated from lower terms by a line containing only - (I'm open to better ways of doing this; I just didn't want to have to specify two separate files):

BFO:0000002
-
BFO:0000002
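A sketch of splitting this MIREOT term file on the `-` line. Two behaviours are assumptions not stated in the proposal: a file with no separator is read as lower terms only (MIREOT can run without upper terms), and a second separator is treated as an error. `MireotTermFile` and `UpperLower` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the MIREOT term-file layout sketched above. */
final class MireotTermFile {

  record UpperLower(List<String> upper, List<String> lower) {}

  /** Terms before the "-" line are upper terms; terms after it are lower terms. */
  static UpperLower parse(String text) {
    List<String> before = new ArrayList<>();
    List<String> after = new ArrayList<>();
    boolean seenSeparator = false;
    for (String raw : text.split("\\R")) {
      String line = raw.strip();
      if (line.isEmpty()) {
        continue; // ignore blank lines
      }
      if (line.equals("-")) {
        if (seenSeparator) {
          throw new IllegalArgumentException("More than one '-' separator");
        }
        seenSeparator = true;
        continue;
      }
      if (seenSeparator) {
        after.add(line);
      } else {
        before.add(line);
      }
    }
    // Assumption: with no separator, every term is a lower term.
    return seenSeparator
        ? new UpperLower(List.copyOf(before), List.copyOf(after))
        : new UpperLower(List.of(), List.copyOf(before));
  }
}
```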

Another option would be to make the term file a CSV as well, e.g.:

CURIE,level
BFO:0000002,upper
BFO:0000002,lower
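For comparison, a sketch of reading the two-column variant, grouping CURIEs by level while keeping file order. It assumes only `upper` and `lower` are valid levels (matched case-insensitively) and no quoted fields; extra OntoPilot-style columns would mean widening the row. `LevelTermFile` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical reader for the two-column CURIE,level term file. */
final class LevelTermFile {

  /** Group CURIEs by their level column ("upper" or "lower"), keeping file order. */
  static Map<String, List<String>> parse(String csv) {
    String[] lines = csv.strip().split("\\R");
    if (!lines[0].strip().equals("CURIE,level")) {
      throw new IllegalArgumentException("Unexpected header: " + lines[0]);
    }
    Map<String, List<String>> byLevel = new LinkedHashMap<>();
    byLevel.put("upper", new ArrayList<>());
    byLevel.put("lower", new ArrayList<>());
    for (int i = 1; i < lines.length; i++) {
      String line = lines[i].strip();
      if (line.isEmpty()) {
        continue; // skip blank lines
      }
      String[] cells = line.split(",", -1);
      if (cells.length != 2) {
        throw new IllegalArgumentException("Line " + (i + 1) + ": expected 2 columns");
      }
      List<String> bucket = byLevel.get(cells[1].strip().toLowerCase(Locale.ROOT));
      if (bucket == null) {
        throw new IllegalArgumentException("Line " + (i + 1) + ": unknown level " + cells[1]);
      }
      bucket.add(cells[0].strip());
    }
    return byLevel;
  }
}
```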

And then we can later include the fields that OntoPilot uses. The annoying part is that the term file for the MIREOT method would be slightly different from all the other term files... but it might still be the better way to go.

For each row in the source file, the ontology is retrieved by its IRI and extract is run on the terms in the term file. The extracted module is saved to the path given in output.
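To make the per-row behaviour concrete, here is a sketch that turns one STAR/TOP/BOT row into the equivalent command-line call, i.e. one of the extract calls a Makefile would otherwise contain. It uses ROBOT's `--input-iri`, `--method`, `--term-file`, and `--output` options; MIREOT rows are deliberately rejected because they need separate upper/lower term options. `ExtractCommand` and `build` are hypothetical names, and shelling out is only for illustration; the real feature would call the Java operation directly.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical translation of one source-file row into a ROBOT extract call. */
final class ExtractCommand {

  private static final Set<String> SLME_METHODS = Set.of("STAR", "TOP", "BOT");

  /** Build the argument list for a single STAR/TOP/BOT row. */
  static List<String> build(String iri, String method, String termFile, String output) {
    String m = method.strip().toUpperCase(Locale.ROOT);
    if (!SLME_METHODS.contains(m)) {
      // MIREOT rows need upper/lower terms, so they are out of scope here.
      throw new IllegalArgumentException("Not a STAR/TOP/BOT method: " + method);
    }
    return List.of(
        "robot", "extract",
        "--input-iri", iri,
        "--method", m,
        "--term-file", termFile,
        "--output", output);
  }
}
```

The returned list could be handed to `java.lang.ProcessBuilder`, one process per row.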

Alternatively, we could merge all the extracted modules into one master file, in which case you'd pass a single --output <arg>. Not sure which one would be more beneficial... open to discussion.

jamesaoverton commented 6 years ago

Great.

I don't like either of those upper/lower options. I just took another look at the OntoPilot docs, and the options for Related Entities look very nice. Actually, they're close to #183, and a remove command is close to excluding stuff from an import. On the other hand, OntoPilot's locality option doesn't seem quite right -- you only provide one term and can't specify STAR/TOP/BOT.

We should think harder about this.

When I wrote the MIREOT upper/lower stuff for ROBOT I was thinking about Ontofox, but we never implemented Ontofox's 'computed' option.

What about a RelatedEntities helper class with a method(s) that takes an ontology, maybe a reasoner, a set of entities, and a list of relation option strings (starting with the OntoPilot Related Entity ones), processes the relation options in order, then returns a set of entities? We could use RelatedEntities for this operation, plus remove and maybe extract.
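A sketch of the proposed helper's core loop, under stated assumptions: instead of an OWLOntology and reasoner, each relation option is modelled as a function from a set of entity IDs to the entities related to them, so the example stays dependency-free. `RelatedEntitiesSketch` and `expand` are hypothetical names. The key behaviour from the proposal is that options are processed in order and each step expands the current set.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical sketch of the proposed RelatedEntities helper. A real version
 * would take an OWLOntology (and maybe a reasoner); here each relation is a
 * function from a set of entity IDs to the entities related to them.
 */
final class RelatedEntitiesSketch {

  /**
   * Apply each relation option in order. Every step adds the entities related
   * to the current set, so later options see what earlier ones added.
   */
  static Set<String> expand(
      Set<String> entities,
      List<String> options,
      Map<String, Function<Set<String>, Set<String>>> relations) {
    Set<String> result = new LinkedHashSet<>(entities);
    for (String option : options) {
      Function<Set<String>, Set<String>> relation = relations.get(option);
      if (relation == null) {
        throw new IllegalArgumentException("Unknown relation option: " + option);
      }
      result.addAll(relation.apply(Set.copyOf(result)));
    }
    return result;
  }
}
```

Because each step feeds the next, order matters: "ancestors" followed by "descendants" pulls in siblings of the input terms, while the reverse order does not.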

beckyjackson commented 6 years ago

I like the RelatedEntities helper class idea. I'll start playing around with that, then we can move on to how we actually want to implement it with the CLI & user input...

Do we want to specify the relation options for each entity? Or have one list of relation options for the whole set?

jamesaoverton commented 6 years ago

Let's try to make it work like how (I think) OntoPilot works, row-by-row. But generalize the input terms to a set, as I sketched in my last paragraph. I want to discuss all of this with Brian, but I think the discussion will be more productive once we have a very basic Java implementation that we can all refer to.

beckyjackson commented 6 years ago

Should the ancestors & descendants include all parents/children recursively? Or just the direct parents/children?

jamesaoverton commented 6 years ago

Recursively, please.
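A sketch of the recursive (transitive) ancestor walk over asserted SubClassOf links between named classes. The map of direct parents is a hypothetical stand-in for an ontology; a visited set keeps the walk from looping on cycles, and the start class is not reported as its own ancestor. Descendants would be the same walk over the inverted map. `Ancestors.of` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: all ancestors reachable through asserted SubClassOf links. */
final class Ancestors {

  /** Collect ancestors transitively, not just direct parents. */
  static Set<String> of(String start, Map<String, Set<String>> directParents) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>(directParents.getOrDefault(start, Set.of()));
    while (!queue.isEmpty()) {
      String current = queue.pop();
      if (seen.add(current)) {
        // Only enqueue parents the first time a class is seen, so cycles terminate.
        queue.addAll(directParents.getOrDefault(current, Set.of()));
      }
    }
    seen.remove(start); // a class is not reported as its own ancestor
    return seen;
  }
}
```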

beckyjackson commented 6 years ago

I created a RelatedEntitiesHelper class that is instantiated with an IRI and an OWLOntology (I'm going to test it today, just wrote out the code yesterday). It gets the EntityType of the IRI from the ontology, and has getter methods for each relation option.

Any time a relation option is requested that doesn't apply to an EntityType (e.g. requesting types of a class), it prints a warning and returns null. I have a couple of questions about the return types, included in the outline below. Also, should the warning be printed to the console, or logged?

cmungall commented 6 years ago

Does ancestors just traverse over SubClassOf between named classes? Many bio-ontologies need to traverse over existential restrictions to get the full partonomy/developmental lineage/regulation graph etc.

beckyjackson commented 6 years ago

@cmungall I was avoiding using a reasoner to keep it optional, but wouldn't that be required to traverse over restrictions? Otherwise it's only getting any explicitly stated restrictions as anonymous classes. I could change it to require a reasoner and fall back on a default one (ELK or HermiT?) if none is specified.

Then, I guess we'd want to return axioms instead of entities. Maybe it makes the most sense to just return sets of axioms for all relations...

jamesaoverton commented 6 years ago

I want to be able to get sets of entities for SubClassOf ancestors and descendants. That's fundamental.

Handling other restrictions and returning axioms would be a great addition.

jamesaoverton commented 6 years ago

I would prefer RelatedEntitiesHelper to provide a static method that takes the arguments I laid out: an ontology (or a reasoner attached to an ontology, if that's what we need), a set of entities (or IRIs if that's better), and a list of relations. I thought that it could always return a set of entities, so that we can keep expanding the set.

beckyjackson commented 6 years ago

It can be either entities or IRIs; I've been doing it as IRIs. I guess I was thinking about it as using non-static methods and instantiating an object, but I can change it to be static.

What about when an entity can't be returned, as in the case of restrictions in equivalents or disjoints, data property ranges, or property assertions?

jamesaoverton commented 6 years ago

My current thinking is: either return an empty set, or interpret null input as an empty set.
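A sketch of both conventions, using `types` (which only applies to individuals) as the example. The entity kinds and the map of asserted types are hypothetical stand-ins for OWL API types, and `java.util.logging` is used only to keep the sketch dependency-free; it is one possible answer to the console-versus-log question, not a decision. Anonymous class expressions are assumed to be absent from the map, since they cannot be returned as entities.

```java
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical sketch of the "return an empty set" convention. */
final class SafeRelations {

  private static final Logger LOGGER = Logger.getLogger(SafeRelations.class.getName());

  /** Hypothetical entity kinds, standing in for OWL API EntityType values. */
  enum Kind { CLASS, OBJECT_PROPERTY, DATA_PROPERTY, INDIVIDUAL }

  /**
   * Asserted types of an individual. For any other kind, log a warning and
   * return an empty set instead of null, so callers can always addAll.
   */
  static Set<String> types(String entity, Kind kind, Map<String, Set<String>> assertedTypes) {
    if (kind != Kind.INDIVIDUAL) {
      LOGGER.warning("'types' does not apply to " + kind + " " + entity + "; returning empty set");
      return Set.of();
    }
    // Only named classes are stored here; anonymous class expressions contribute nothing.
    return assertedTypes.getOrDefault(entity, Set.of());
  }

  /** The alternative convention: interpret a null result as an empty set. */
  static <T> Set<T> orEmpty(Set<T> maybeNull) {
    return maybeNull == null ? Set.of() : maybeNull;
  }
}
```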

beckyjackson commented 6 years ago

Do you see this as being part of the extract command, or a new command (like import)?

jamesaoverton commented 6 years ago

I really don't know. Fewer commands is better, but this is probably different enough from extract.

jamesaoverton commented 6 years ago

@stuckyb: Hey Brian! We're discussing better ways to handle imports in ROBOT. OntoPilot has a very nice design, and we're considering implementing it.

It's your design and we want you to get full credit. If you're not happy with ROBOT implementing your design (or a close variation of it), just let us know and we'll stop.

If you're OK with a ROBOT implementation, then let's discuss how it could work. @rctauber has written a first draft in PR #224 so that we have something concrete to talk about.

beckyjackson commented 4 years ago

I have implemented a reworked version of import configuration in #605 - details are in the PR.