jamesaoverton opened 6 years ago
The OntoPilot CSVs for the terms include Related Entities and Exclude fields. Do we want to have these in an implementation for ROBOT?
Related entities
Whether to retrieve entities that are related to the target entity for inclusion in the import module. This can either be empty, in which case no related entities are explicitly retrieved, or it can be a comma-separated list of one or more of the following values: ancestors, descendants, equivalents, disjoints, domains, ranges, inverses, types, or property assertions.
Exclude
If the value of this field is “true”, “yes”, “t”, or “y” (not case sensitive), then this entity (and its related entities, if specified in the Related entities column) will be explicitly excluded from the final import module.
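A minimal sketch of parsing these two fields for one row, assuming the values arrive as plain strings. `OntoPilotRowFields` and its methods are hypothetical names. Treating the option names case-insensitively and rejecting unknown options are assumptions; the accepted Exclude values come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of ROBOT or OntoPilot): parses the
// "Related entities" and "Exclude" fields of one term row.
class OntoPilotRowFields {
  // Relation options named in the "Related entities" description.
  static final List<String> RELATION_OPTIONS =
      Arrays.asList(
          "ancestors", "descendants", "equivalents", "disjoints", "domains",
          "ranges", "inverses", "types", "property assertions");

  // Values that mark an entity as excluded; compared case-insensitively.
  static final Set<String> TRUE_VALUES = new HashSet<>(Arrays.asList("true", "yes", "t", "y"));

  // Splits the comma-separated list, keeping the user's order.
  // An empty or missing field means no related entities are retrieved.
  static List<String> parseRelatedEntities(String field) {
    List<String> options = new ArrayList<>();
    if (field == null || field.trim().isEmpty()) {
      return options;
    }
    for (String part : field.split(",")) {
      String option = part.trim().toLowerCase(Locale.ROOT);
      if (!RELATION_OPTIONS.contains(option)) {
        throw new IllegalArgumentException("Unknown related entity option: '" + option + "'");
      }
      options.add(option);
    }
    return options;
  }

  // True if the Exclude field is "true", "yes", "t", or "y" in any case.
  static boolean isExcluded(String field) {
    return field != null && TRUE_VALUES.contains(field.trim().toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(parseRelatedEntities("ancestors, Property Assertions"));
    System.out.println(isExcluded("Yes"));
  }
}
```

Running `main` prints `[ancestors, property assertions]` and `true`. Keeping the options as an ordered list (rather than a set) leaves room for processing them in the order the user wrote them.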
I'm trying to think of the most user-friendly way to do this. I was thinking we could have a master source file like OntoPilot's, where you specify the IRI and entity file for each import... plus maybe a method (MIREOT, STAR, etc.). Then each entity file would just be a simple list of CURIEs?
I think we want these things, but the first implementation can focus on operations that we already support.
I got a basic implementation of this working. It definitely needs some refinement, but it gets the job done.
The user passes a CSV to --source-file and does not need to provide an --input ontology. For now, I thought it would be best to keep this separate from merge. If we wanted to do a full extract and merge into an input ontology, that might make more sense as a new command? Just thinking out loud.
The source file looks something like this:
ID,IRI,method,term file,output
CHMO,http://purl.obolibrary.org/obo/chmo.owl,STAR,imports/source/chmo.txt,imports/chmo_imports.owl
OBI,http://purl.obolibrary.org/obo/obi.owl,MIREOT,imports/source/obi.txt,imports/obi_imports.owl
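A sketch of reading this source file into rows, assuming simple comma splitting (no quoted fields) and only the four method names mentioned in this thread. `ImportSource` is a hypothetical name, not an existing ROBOT class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical model of one row of the proposed source file:
// ID,IRI,method,term file,output
class ImportSource {
  static final List<String> METHODS = Arrays.asList("STAR", "TOP", "BOT", "MIREOT");

  final String id;
  final String iri;
  final String method;
  final String termFile;
  final String output;

  ImportSource(String id, String iri, String method, String termFile, String output) {
    this.id = id;
    this.iri = iri;
    this.method = method;
    this.termFile = termFile;
    this.output = output;
  }

  // Parses the source file lines, skipping the header row and blank lines.
  // Naive comma splitting: assumes no quoted fields or commas inside values.
  static List<ImportSource> parse(List<String> lines) {
    List<ImportSource> sources = new ArrayList<>();
    for (int i = 1; i < lines.size(); i++) {
      String line = lines.get(i).trim();
      if (line.isEmpty()) {
        continue;
      }
      String[] cols = line.split(",", -1);
      if (cols.length != 5) {
        throw new IllegalArgumentException(
            "Line " + (i + 1) + ": expected 5 columns, got " + cols.length);
      }
      String method = cols[2].trim().toUpperCase(Locale.ROOT);
      if (!METHODS.contains(method)) {
        throw new IllegalArgumentException("Line " + (i + 1) + ": unknown method " + method);
      }
      sources.add(new ImportSource(
          cols[0].trim(), cols[1].trim(), method, cols[3].trim(), cols[4].trim()));
    }
    return sources;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList(
        "ID,IRI,method,term file,output",
        "CHMO,http://purl.obolibrary.org/obo/chmo.owl,STAR,imports/source/chmo.txt,imports/chmo_imports.owl",
        "OBI,http://purl.obolibrary.org/obo/obi.owl,MIREOT,imports/source/obi.txt,imports/obi_imports.owl");
    for (ImportSource source : parse(lines)) {
      System.out.println(source.id + " -> " + source.method + " -> " + source.output);
    }
  }
}
```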
The term file column points to a .txt file that lists the terms for STAR, TOP, and BOT. The MIREOT method also accepts a list of terms, but the upper terms are separated from the lower terms by a line containing only - (I'm open to better ways of doing this; I just didn't want to have to specify two separate files):
BFO:0000002
-
BFO:0000002
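A sketch of how this separator format could be read. `MireotTermFile` is hypothetical, and treating a file with no separator as all upper terms is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for the proposed MIREOT term file: upper terms, then a
// line containing only "-", then lower terms.
class MireotTermFile {
  final List<String> upperTerms = new ArrayList<>();
  final List<String> lowerTerms = new ArrayList<>();

  static MireotTermFile parse(List<String> lines) {
    MireotTermFile terms = new MireotTermFile();
    boolean afterSeparator = false;
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty()) {
        continue;
      }
      if (line.equals("-")) {
        if (afterSeparator) {
          throw new IllegalArgumentException("More than one '-' separator");
        }
        afterSeparator = true;
      } else if (afterSeparator) {
        terms.lowerTerms.add(line);
      } else {
        // With no separator at all, every term ends up here as an upper term.
        terms.upperTerms.add(line);
      }
    }
    return terms;
  }

  public static void main(String[] args) {
    MireotTermFile terms = parse(Arrays.asList("BFO:0000002", "-", "BFO:0000002"));
    System.out.println("upper: " + terms.upperTerms + ", lower: " + terms.lowerTerms);
  }
}
```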
Another option would be to make the term file a CSV as well, e.g.:
CURIE,level
BFO:0000002,upper
BFO:0000002,lower
Later, we could also include the fields that OntoPilot uses. The annoying part is that the term file for the MIREOT method would be slightly different from all the other term files... but it might still be the better way to go.
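A sketch of the CSV alternative that looks columns up by header name, so that OntoPilot-style columns added later would not break existing files. `TermTable` is hypothetical, and grouping terms from files without a level column under an empty key is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical reader for a CSV term file such as "CURIE,level".
class TermTable {
  // Groups CURIEs by the "level" column. Columns are found by header name,
  // so any extra columns are simply ignored here.
  static Map<String, List<String>> termsByLevel(List<String> lines) {
    List<String> header = new ArrayList<>();
    for (String name : lines.get(0).split(",", -1)) {
      header.add(name.trim());
    }
    int curieCol = header.indexOf("CURIE");
    int levelCol = header.indexOf("level");
    if (curieCol < 0) {
      throw new IllegalArgumentException("Missing CURIE column");
    }
    Map<String, List<String>> byLevel = new LinkedHashMap<>();
    for (String line : lines.subList(1, lines.size())) {
      if (line.trim().isEmpty()) {
        continue;
      }
      String[] cols = line.split(",", -1);
      String curie = cols[curieCol].trim();
      // Without a level column (e.g. STAR/TOP/BOT files), everything goes under "".
      String level = levelCol >= 0 && levelCol < cols.length
          ? cols[levelCol].trim().toLowerCase(Locale.ROOT)
          : "";
      byLevel.computeIfAbsent(level, k -> new ArrayList<>()).add(curie);
    }
    return byLevel;
  }

  public static void main(String[] args) {
    System.out.println(termsByLevel(
        Arrays.asList("CURIE,level", "BFO:0000002,upper", "BFO:0000002,lower")));
  }
}
```

With this layout, a MIREOT file and a STAR file differ only in whether the level column is present, which may soften the "slightly different file" concern.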
For each row in the source file, the ontology is retrieved by its IRI and extract is run on the terms in the term file. The extracted module is saved to the file given in the output column.
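Each row maps roughly onto one existing `robot extract` call, which is what a Makefile has to spell out today. A sketch that builds those command lines, assuming ROBOT's `extract` options `--method`, `--input-iri`, `--term-file` and `--output`. `ExtractPlan` and `SourceRow` are hypothetical, and MIREOT rows are skipped because they need upper and lower terms rather than one term file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical: turns source-file rows into ROBOT extract command lines.
class ExtractPlan {
  static final class SourceRow {
    final String iri;
    final String method;
    final String termFile;
    final String output;

    SourceRow(String iri, String method, String termFile, String output) {
      this.iri = iri;
      this.method = method;
      this.termFile = termFile;
      this.output = output;
    }
  }

  // One extract call per row, each writing its own output file. MIREOT rows
  // are skipped here because they need upper and lower terms.
  static List<String> commands(List<SourceRow> rows) {
    List<String> commands = new ArrayList<>();
    for (SourceRow row : rows) {
      if (row.method.equals("MIREOT")) {
        continue;
      }
      commands.add(String.join(" ",
          "robot", "extract",
          "--method", row.method,
          "--input-iri", row.iri,
          "--term-file", row.termFile,
          "--output", row.output));
    }
    return commands;
  }

  public static void main(String[] args) {
    List<SourceRow> rows = Arrays.asList(
        new SourceRow("http://purl.obolibrary.org/obo/chmo.owl", "STAR",
            "imports/source/chmo.txt", "imports/chmo_imports.owl"),
        new SourceRow("http://purl.obolibrary.org/obo/obi.owl", "MIREOT",
            "imports/source/obi.txt", "imports/obi_imports.owl"));
    commands(rows).forEach(System.out::println);
  }
}
```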
Alternatively, we could merge them all into one master file, in which case you'd have to pass in --output <arg>. Not sure which would be more beneficial... open to discussion.
Great.
I don't like either of those upper/lower options. I just took another look at the OntoPilot docs, and the options for Related Entities look very nice. Actually, they're close to #183, and a remove command is close to excluding stuff from an import. On the other hand, OntoPilot's locality option doesn't seem quite right: you only provide one term and can't specify STAR/TOP/BOT.
We should think harder about this.
When I wrote the MIREOT upper/lower stuff for ROBOT, I was thinking about Ontofox, but we never implemented Ontofox's 'computed' option.
What about a RelatedEntities helper class with one or more methods that take an ontology, maybe a reasoner, a set of entities, and a list of relation option strings (starting with the OntoPilot Related Entities ones), process the relation options in order, and then return a set of entities? We could use RelatedEntities for this operation, plus remove and maybe extract.
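A sketch of that control flow, with the ontology and reasoner replaced by one lookup function per relation option so the ordering is visible. The class name follows the proposal, but the signature, string entities, and the cumulative semantics (each option expands the current working set) are assumptions, not the code from any PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch of the proposed helper. Entities are plain strings and each relation
// option is backed by a lookup function instead of an ontology or reasoner.
class RelatedEntities {
  // Processes relation options in order. Each lookup sees the current working
  // set, so entities added by earlier options are expanded by later ones.
  static Set<String> expand(
      Set<String> entities,
      List<String> relations,
      Map<String, Function<Set<String>, Set<String>>> lookups) {
    Set<String> result = new LinkedHashSet<>(entities);
    for (String relation : relations) {
      Function<Set<String>, Set<String>> lookup = lookups.get(relation);
      if (lookup == null) {
        throw new IllegalArgumentException("Unsupported relation option: " + relation);
      }
      result.addAll(lookup.apply(result));
    }
    return result;
  }

  // Toy data: individual i has type C, and C SubClassOf B SubClassOf A.
  static Map<String, Function<Set<String>, Set<String>>> toyLookups() {
    Map<String, String> parentOf = new HashMap<>();
    parentOf.put("C", "B");
    parentOf.put("B", "A");
    Map<String, String> typeOf = new HashMap<>();
    typeOf.put("i", "C");

    Map<String, Function<Set<String>, Set<String>>> lookups = new HashMap<>();
    lookups.put("types", set -> {
      Set<String> out = new LinkedHashSet<>();
      for (String e : set) {
        if (typeOf.containsKey(e)) {
          out.add(typeOf.get(e));
        }
      }
      return out;
    });
    lookups.put("ancestors", set -> {
      Set<String> out = new LinkedHashSet<>();
      for (String e : set) {
        // Climb the single-parent chain; stop at the root or at a repeat.
        String p = parentOf.get(e);
        while (p != null && out.add(p)) {
          p = parentOf.get(p);
        }
      }
      return out;
    });
    return lookups;
  }

  public static void main(String[] args) {
    Set<String> start = Collections.singleton("i");
    System.out.println(expand(start, Arrays.asList("types", "ancestors"), toyLookups()));
    System.out.println(expand(start, Arrays.asList("ancestors", "types"), toyLookups()));
  }
}
```

The two calls print `[i, C, B, A]` and `[i, C]`: "types, ancestors" pulls in the ancestors of the individual's type, while "ancestors, types" does not, so the order of options matters under this design.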
I like the RelatedEntities helper class idea. I'll start playing around with that, and then we can move on to how we actually want to implement it with the CLI and user input...
Do we want to specify the relation options for each entity? Or have one list of relation options for the whole set?
Let's try to make it work the way (I think) OntoPilot does, row by row, but generalize the input terms to a set, as I sketched in my last paragraph. I want to discuss all of this with Brian, but I think the discussion will be more productive once we have a very basic Java implementation that we can all refer to.
Should the ancestors & descendants include all parents/children recursively? Or just the direct parents/children?
Recursively, please.
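A sketch of the recursive version over asserted SubClassOf edges between named classes, with plain maps standing in for the ontology: direct parents are one step, ancestors are everything reachable by repeating that step. The `Hierarchy` class is hypothetical; the visited set keeps the walk finite even if the input contains a cycle.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy SubClassOf graph (multiple parents allowed); stands in for asserted
// axioms between named classes, with no reasoner involved.
class Hierarchy {
  private final Map<String, Set<String>> parents = new HashMap<>();
  private final Map<String, Set<String>> children = new HashMap<>();

  void addSubClassOf(String child, String parent) {
    parents.computeIfAbsent(child, k -> new LinkedHashSet<>()).add(parent);
    children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
  }

  Set<String> ancestors(String entity) {
    return closure(entity, parents);
  }

  Set<String> descendants(String entity) {
    return closure(entity, children);
  }

  // Breadth-first walk over direct edges. The start entity is not included,
  // even if a cycle leads back to it.
  private static Set<String> closure(String start, Map<String, Set<String>> edges) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(start);
    while (!queue.isEmpty()) {
      String current = queue.poll();
      for (String next : edges.getOrDefault(current, Collections.emptySet())) {
        if (seen.add(next)) {
          queue.add(next);
        }
      }
    }
    seen.remove(start);
    return seen;
  }

  public static void main(String[] args) {
    Hierarchy h = new Hierarchy();
    h.addSubClassOf("C", "B");
    h.addSubClassOf("C", "D");
    h.addSubClassOf("B", "A");
    System.out.println(h.ancestors("C"));
    System.out.println(h.descendants("A"));
  }
}
```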
I created a RelatedEntitiesHelper class that is instantiated with an IRI and an OWLOntology (I'm going to test it today; I just wrote the code yesterday). It gets the EntityType of the IRI from the ontology and has getter methods for each relation option.
Any time a relation option is requested that doesn't work for an EntityType (e.g. requesting types of a class), it prints a warning and returns null. (Should the warning be printed to the console, or logged?) I have a couple of questions about the return types, marked [?] in the outline below:
- ancestors and descendants: Set<OWLClass>, Set<OWLDataProperty>, or Set<OWLObjectProperty> [?]
- equivalents: Set<OWLEquivalentClassesAxiom>, Set<OWLEquivalentDataPropertiesAxiom>, or Set<OWLEquivalentObjectPropertiesAxiom> [?]
- disjoints: OWLDisjoint*Axiom [?]
- domains: Set<OWLClass> for an object or datatype property
- ranges: Set<OWLClass> for an object property, Set<OWLDataPropertyRangeAxiom> for a datatype property
- inverses: Set<OWLObjectProperty> for an object property [?] getInverseObjectPropertyAxioms?
- types: Set<OWLClass> for an OWLNamedIndividual
- property assertions: Set<OWLPropertyAssertionAxiom<?,?>> for an OWLNamedIndividual [?]
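The outline implies which relation options make sense for which entity type. A hypothetical sketch of that table, using `java.util.logging` as one possible answer to the console-or-log question and returning an empty set instead of null for an inapplicable option. The enum names are assumptions, and the actual OWL API lookups are left out.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical sketch of which relation options apply to which entity type,
// following the return-type outline.
class RelationApplicability {
  enum Kind { CLASS, OBJECT_PROPERTY, DATA_PROPERTY, NAMED_INDIVIDUAL }

  enum Relation {
    ANCESTORS, DESCENDANTS, EQUIVALENTS, DISJOINTS, DOMAINS, RANGES, INVERSES, TYPES,
    PROPERTY_ASSERTIONS
  }

  private static final Logger LOGGER = Logger.getLogger(RelationApplicability.class.getName());
  private static final Map<Kind, Set<Relation>> APPLICABLE = new EnumMap<>(Kind.class);

  static {
    Set<Relation> common = EnumSet.of(
        Relation.ANCESTORS, Relation.DESCENDANTS, Relation.EQUIVALENTS, Relation.DISJOINTS);
    // Data properties add domains and ranges; object properties also add inverses.
    Set<Relation> dataProperty = EnumSet.copyOf(common);
    dataProperty.add(Relation.DOMAINS);
    dataProperty.add(Relation.RANGES);
    Set<Relation> objectProperty = EnumSet.copyOf(dataProperty);
    objectProperty.add(Relation.INVERSES);
    APPLICABLE.put(Kind.CLASS, common);
    APPLICABLE.put(Kind.DATA_PROPERTY, dataProperty);
    APPLICABLE.put(Kind.OBJECT_PROPERTY, objectProperty);
    APPLICABLE.put(Kind.NAMED_INDIVIDUAL, EnumSet.of(Relation.TYPES, Relation.PROPERTY_ASSERTIONS));
  }

  static boolean applies(Kind kind, Relation relation) {
    return APPLICABLE.get(kind).contains(relation);
  }

  // Runs the lookup only if the option applies to this kind of entity;
  // otherwise logs a warning and returns an empty set rather than null.
  static Set<String> checked(Kind kind, Relation relation, Supplier<Set<String>> lookup) {
    if (!applies(kind, relation)) {
      LOGGER.warning(relation + " does not apply to " + kind + "; returning an empty set");
      return Collections.emptySet();
    }
    return lookup.get();
  }

  public static void main(String[] args) {
    Set<String> types = checked(Kind.CLASS, Relation.TYPES, () -> Collections.singleton("ex:T"));
    System.out.println(types.isEmpty());
  }
}
```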
Does ancestors just traverse over SubClassOf between named classes? Many bio-ontologies need to traverse over existential restrictions to get the full partonomy/developmental lineage/regulation graph etc.
@cmungall I was avoiding using a reasoner to keep it optional, but wouldn't one be required to traverse over restrictions? Otherwise, it only picks up explicitly stated restrictions, as anonymous classes. I could change it to require a reasoner and fall back on a default one (ELK or HermiT?) if none is specified.
Then, I guess we'd want to return axioms instead of entities. Maybe it makes the most sense to just return sets of axioms for all relations...
I want to be able to get sets of entities for SubClassOf ancestors and descendants. That's fundamental.
Handling other restrictions and returning axioms would be a great addition.
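Without a reasoner, one way to follow asserted existential restrictions is to treat `A SubClassOf B` and `A SubClassOf R some B` as labelled edges and walk only the labels the user asks for, still returning entities. A sketch over a toy edge list rather than the OWL API; `RestrictionWalk`, `Edge`, the `subClassOf` label, and the `part_of` example data are illustrative assumptions. It treats every selected label as transitive and chainable, which is a simplification, and inferred relationships would still need a reasoner.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model: each edge is an asserted "child SubClassOf parent" (label
// "subClassOf") or "child SubClassOf label some parent" (e.g. "part_of").
class RestrictionWalk {
  static final class Edge {
    final String child;
    final String label;
    final String parent;

    Edge(String child, String label, String parent) {
      this.child = child;
      this.label = label;
      this.parent = parent;
    }
  }

  // Ancestors reachable through any of the selected edge labels, as entities.
  static Set<String> ancestors(String start, List<Edge> edges, Set<String> labels) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(start);
    while (!queue.isEmpty()) {
      String current = queue.poll();
      for (Edge edge : edges) {
        if (edge.child.equals(current) && labels.contains(edge.label) && seen.add(edge.parent)) {
          queue.add(edge.parent);
        }
      }
    }
    seen.remove(start);
    return seen;
  }

  static List<Edge> example() {
    return Arrays.asList(
        new Edge("finger", "subClassOf", "digit"),
        new Edge("finger", "part_of", "hand"),
        new Edge("hand", "part_of", "forelimb"),
        new Edge("digit", "subClassOf", "anatomical structure"));
  }

  public static void main(String[] args) {
    System.out.println(ancestors("finger", example(), Collections.singleton("subClassOf")));
    System.out.println(ancestors("finger", example(),
        new HashSet<>(Arrays.asList("subClassOf", "part_of"))));
  }
}
```

Restricting the walk to `subClassOf` gives the plain named-class ancestors; adding `part_of` also pulls in the partonomy.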
I would prefer RelatedEntitiesHelper to provide a static method that takes the arguments I laid out: an ontology (or a reasoner attached to an ontology, if that's what we need), a set of entities (or IRIs, if that's better), and a list of relations. I thought that it could always return a set of entities, so that we can keep expanding the set.
It can be either entities or IRIs; I've been using IRIs. I was thinking of it as using non-static methods and instantiating an object, but I can change it to be static.
What about when an entity can't be returned, as in the case of restrictions in equivalents or disjoints, data property ranges, or property assertions?
My current thinking is: either return an empty set, or interpret null input as an empty set.
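A sketch of that convention for relations whose results mix named classes with anonymous expressions (for example, a restriction in an equivalence axiom): anonymous expressions contribute nothing, and a null input is treated as empty, so the caller always gets a set it can keep expanding. `NamedOnly` and `ClassExpr` are hypothetical stand-ins, not OWL API types.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy stand-in for class expressions returned by equivalents or disjoints:
// either a named class (has an IRI) or an anonymous expression.
class NamedOnly {
  static final class ClassExpr {
    final String iri; // null for anonymous expressions
    final String description;

    private ClassExpr(String iri, String description) {
      this.iri = iri;
      this.description = description;
    }

    static ClassExpr named(String iri) {
      return new ClassExpr(iri, iri);
    }

    static ClassExpr anonymous(String description) {
      return new ClassExpr(null, description);
    }

    boolean isAnonymous() {
      return iri == null;
    }
  }

  // Returns only the named classes. A null input is treated as empty, so the
  // result is never null and callers can keep adding to it.
  static Set<String> namedEntities(Collection<ClassExpr> expressions) {
    Set<String> named = new LinkedHashSet<>();
    if (expressions == null) {
      return named;
    }
    for (ClassExpr expression : expressions) {
      if (!expression.isAnonymous()) {
        named.add(expression.iri);
      }
    }
    return named;
  }

  public static void main(String[] args) {
    Set<String> result = namedEntities(Arrays.asList(
        ClassExpr.named("ex:A"), ClassExpr.anonymous("part_of some ex:B")));
    result.addAll(namedEntities(null));
    System.out.println(result);
  }
}
```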
Do you see this as being part of the extract command, or a new command (like import)?
I really don't know. Fewer commands is better, but this is probably different enough from extract to warrant a new one.
@stuckyb: Hey Brian! We're discussing better ways to handle imports in ROBOT. OntoPilot has a very nice design, and we're considering implementing it.
It's your design and we want you to get full credit. If you're not happy with ROBOT implementing your design (or a close variation of it), just let us know and we'll stop.
If you're OK with a ROBOT implementation, then let's discuss how it could work. @rctauber has written a first draft in PR #224 so that we have something concrete to talk about.
I have implemented a reworked version of import configuration in #605 - details are in the PR.
ROBOT has a range of extract features that take lists of term IRIs. In practice, we often have mixed import strategies. Currently that requires multiple extract calls in a Makefile, or something similar. It would be nice to have more control through a friendlier interface. It would be nice to have at least one of these in ROBOT, and having all of them would be even better.
Whatever we do should be integrated with the ontology-starter-kit: https://github.com/INCATools/ontology-starter-kit/blob/master/template/src/ontology/imports/MY-IMPORTED_terms.txt