ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

new command: subset (filter option with ability to traverse all edges when preserve-structure is set) #497

Closed: cmungall closed this issue 1 year ago

cmungall commented 5 years ago

Carried on from #263

Given:

ontology: test
subsetdef: foo "foo"

[Term]
id: X:1
subset: foo

[Term]
id: X:2
relationship: part_of X:1

[Term]
id: X:3
relationship: part_of X:2

[Term]
id: X:4
relationship: part_of X:3

[Term]
id: X:5
is_a: X:4
subset: foo

[Typedef]
id: part_of
is_transitive: true
xref: BFO:0000050

Currently we have:

$ robot filter -i robot-core/src/test/resources/subset_test.obo --select annotations --prefix "subset: http://purl.obolibrary.org/obo/test#" --select "oboInOwl:inSubset=subset:foo" -o z.obo && cat z.obo
format-version: 1.2
ontology: test

[Term]
id: X:1
subset: foo

[Term]
id: X:5
subset: foo

We would like to see a part_of between X:5 and X:1.

This is pretty fundamental for all subset use cases involving GO, any anatomy ontology, CL, ENVO, any ontology that has part-of.

There are two ways this could be implemented. One is an ad-hoc graph walking procedure with rules for edge label composition (this is how the equivalent command in owltools worked).

The other is to use materialize. A special-purpose version of this code could be written that would be more efficient for this use case: for every selected class C, we test for all inferred subclasses of R some C, for all structure-preserving relations R.
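
The first approach (graph walking with edge-label composition) can be illustrated with a minimal Python sketch on the example above. This is not the owltools code; the `compose` rules (is_a acts as identity, part_of is transitive) and all function names are assumptions made for illustration.

```python
from collections import deque

# Toy version of the example ontology as (subject, predicate, object) edges.
EDGES = [
    ("X:2", "part_of", "X:1"),
    ("X:3", "part_of", "X:2"),
    ("X:4", "part_of", "X:3"),
    ("X:5", "is_a", "X:4"),
]
SUBSET = {"X:1", "X:5"}  # terms tagged with "subset: foo"


def compose(first, second):
    """Assumed composition rules: is_a is the identity, part_of is transitive."""
    if first == "is_a":
        return second
    if second == "is_a":
        return first
    if first == second == "part_of":
        return "part_of"
    return None  # no rule: the path cannot be summarised by a single label


def subset_edges(edges, subset):
    """Walk up from each subset term and stop at the nearest subset terms."""
    outgoing = {}
    for s, p, o in edges:
        outgoing.setdefault(s, []).append((p, o))

    result = set()
    for start in subset:
        queue = deque(outgoing.get(start, []))
        seen = set(queue)
        while queue:
            label, node = queue.popleft()
            if node in subset:
                result.add((start, label, node))
                continue  # do not walk past another subset term
            for p, o in outgoing.get(node, []):
                combined = compose(label, p)
                if combined is not None and (combined, o) not in seen:
                    seen.add((combined, o))
                    queue.append((combined, o))
    return result


print(sorted(subset_edges(EDGES, SUBSET)))
```

On the toy graph this yields the single edge X:5 part_of X:1, which is the relationship missing from the current filter output.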

To preserve backwards compatibility we should have the user specify the list of object properties (OPs) they want to include. However, this is such a common use case that it would be nice to have a command subset, as originally specified in #263, that is a shorthand for filter --preserve-structure true --use-all-relations true --select annotations --select "oboInOwl:inSubset=subset:$SUBSET"
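
To make the proposed shorthand concrete, here is a small Python sketch that expands a subset name into the longer filter invocation. Note the assumptions: `--use-all-relations` and the `subset` command are proposals from this issue, not existing ROBOT options, the function name is hypothetical, and the `--prefix` value is copied from the test command earlier in this issue.

```python
import shlex


def subset_as_filter_args(subset, prefix="http://purl.obolibrary.org/obo/test#"):
    """Expand the proposed `subset` shorthand into `robot filter` arguments."""
    return [
        "robot", "filter",
        "--preserve-structure", "true",
        "--use-all-relations", "true",  # proposed in this issue, not an existing flag
        "--prefix", f"subset: {prefix}",
        "--select", "annotations",
        "--select", f"oboInOwl:inSubset=subset:{subset}",
    ]


# Print the expanded command line with shell-safe quoting.
print(shlex.join(subset_as_filter_args("foo")))
```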

jamesaoverton commented 5 years ago

Sounds good. A new subset command is fine by me, and I like the idea in the final paragraph about enhancing filter then making subset a shorthand. (Shorthands could be good solutions in the many cases where we've ended up with suboptimal defaults.)

cmungall commented 2 years ago

Here is an alternative algorithm. This uses relation-graph which is not yet a robot dependency (but it could be). This strategy may still be useful to readers of this ticket as a post-processing step.

  1. run relation-graph on O to make G
  2. use robot filter to make filtered ontology O'
  3. remove all edges from G if either subject or object is not on O', yielding G'
  4. remove redundant edges from G'
  5. add edges from G' back to O' (using standard axiom->edge translation)
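
The five steps above can be sketched in Python on the toy ontology from this issue. This is only an illustration: `relation_closure` is a hand-rolled stand-in for relation-graph (which uses real OWL inference), it only knows is_a and part_of, it assumes an acyclic graph, and every function name here is hypothetical.

```python
EDGES = {
    ("X:2", "part_of", "X:1"),
    ("X:3", "part_of", "X:2"),
    ("X:4", "part_of", "X:3"),
    ("X:5", "is_a", "X:4"),
}


def compose(p1, p2):
    """Toy rule: is_a chains stay is_a; any chain involving part_of is part_of."""
    return "is_a" if p1 == p2 == "is_a" else "part_of"


def relation_closure(edges):
    """Step 1 stand-in: saturate the edge set to obtain G."""
    inferred = set(edges)
    changed = True
    while changed:
        changed = False
        for s1, p1, o1 in list(inferred):
            for s2, p2, o2 in list(inferred):
                if o1 == s2 and (s1, compose(p1, p2), o2) not in inferred:
                    inferred.add((s1, compose(p1, p2), o2))
                    changed = True
    return inferred


def restrict(graph, kept):
    """Step 3: drop edges whose subject or object is not in O'."""
    return {(s, p, o) for s, p, o in graph if s in kept and o in kept}


def remove_redundant(graph):
    """Step 4: drop an edge if a two-step path through another term implies it."""
    redundant = set()
    for s, p, o in graph:
        for s1, p1, mid in graph:
            if s1 != s or mid in (s, o):
                continue
            for s2, p2, o2 in graph:
                if s2 == mid and o2 == o and compose(p1, p2) == p:
                    redundant.add((s, p, o))
    return graph - redundant


kept_terms = {"X:1", "X:3", "X:5"}  # step 2: terms surviving the filter (O')
g = relation_closure(EDGES)                     # step 1
g_prime = remove_redundant(restrict(g, kept_terms))  # steps 3 and 4
print(sorted(g_prime))                          # step 5 would add these to O'
```

With X:3 kept as well, the direct edge X:5 part_of X:1 is implied by the other two edges and is removed in step 4, leaving X:5 part_of X:3 and X:3 part_of X:1.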

matentzn commented 2 years ago

Can we rename this ticket to "--preserve-structure true should fill simple existential gaps"?

It sounds to me like we have two issues in one here:

  1. A subset command that aliases a complex other command
  2. A strategy for filling gaps across existential restrictions

dosumis commented 2 years ago

We discussed this on a strategy call with @cmungall & @matentzn. We agreed that this is a high priority and that @cmungall's suggested solution would work well (in fact, we've successfully tested the strategy elsewhere). @hkir-dev from my group can implement it.

@jamesaoverton, are you happy with the proposal to fold in @balhoff's relation-graph as a dependency?

jamesaoverton commented 2 years ago

I trust @balhoff and his code. Two concerns:

  1. Specific: backwards compatibility of existing ROBOT options and commands. I'm not clear on what is being proposed in this issue anymore.
  2. General/Vague: ROBOT is all about OWL. If the idea is to add a bunch of Knowledge Graph operations to ROBOT, I think it will be a bad fit and unsatisfactory for everybody.

matentzn commented 2 years ago

I think we agree that all changes to ROBOT that are not clearly bug fixes must be backwards compatible. As far as I understand this thread, we assume from now on that --preserve-structure only preserves subclass structure, and we introduce a new parameter, --use-all-relations true, that will additionally try to "preserve the structure across existentials". This is sound but incomplete, and it is clearly an important use case for ROBOT users who do not want to lose X:5 part_of X:1 in the example Chris gives.

The OWL solution to this problem is really complex. As we have no way to constrain the traversal space effectively, we would have to:

  1. run the reasoner on the original ontology
  2. for each pair of classes, check whether an existential restriction between them is implied

or alternatively

  1. build the existential closure of all A sub R some C
  2. and somehow remove the redundant ones.

We tried this, and it does not work the way it should (not to mention that it requires a lot of memory and costly reasoning).

I agree that ROBOT is unsuitable for KG operations, but we do need to be able to:

in these two issues, KG and Ontology Use Cases just intersect a bit I think.

dosumis commented 2 years ago

> I trust @balhoff and his code. Two concerns:
>
>   1. Specific: backwards compatibility of existing ROBOT options and commands. I'm not clear on what is being proposed in this issue anymore.
>   2. General/Vague: ROBOT is all about OWL. If the idea is to add a bunch of Knowledge Graph operations to ROBOT, I think it will be a bad fit and unsatisfactory for everybody.

@jamesaoverton - I think these are understandable concerns, but not warranted in this case:

RelationGraph construction uses only OWL inference and gives us a short cut to a functionality that can be defined entirely in terms of OWL semantics, but which is not currently supported by ROBOT:

This will fulfil a pressing need for simple mechanisms to produce simplified but correct views of our increasingly complex ontologies containing only terms that are needed/understood by specific user communities.

I think @matentzn's suggestion for an extension to the filter command options deals with the backwards compatibility issue. As the filter command has become increasingly complicated and cluttered with options, having a subset command that hides this complexity would greatly aid usability.

dosumis commented 2 years ago

Hi @jamesaoverton - did you get a chance to give this more thought? I'm happy to assign @hkir-dev to do this work if you're happy to give it the thumbs up.

jamesaoverton commented 2 years ago

Yes. @matentzn and I had a long conversation about this on Tuesday, and he addressed my concerns. We should move forward with adding this functionality to ROBOT.

I think this would fit best as another method for robot extract. You probably want a lot of the same command-line options that we provide for robot extract --method MIREOT: upper/lower term(s), branch from term(s), intermediates.

If I'm wrong about that, a new command is ok.

matentzn commented 2 years ago

Alright, after another week of thinking about this, @hkir-dev will try the following:

  1. Add the relation-graph dependency and ensure this does not increase the JAR file size (it should not).
  2. Extend robot extract with a new method, --method subset.
  3. The method will run the following algorithm (described above; repeated here for completeness):
    • RG = RelationGraph(O)
    • ORG = TOOWL(RG)
    • OF = RobotRemove(T, ORG)
    • OR = RobotReduce(OF)
  4. For now, let's make a very simple command that only deals with the -T/--term-file parameter.

Then we will see whether Reduce() really works the way we hope!

cmungall commented 2 years ago

> Alright, after another week of thinking about this, @hkir-dev will try the following:
>
>   1. Add the relation-graph dependency and ensure this does not increase the JAR file size (it should not).
>   2. Extend robot extract with a new method, --method subset.
>   3. The method will run the following algorithm (described above; repeated here for completeness):
>     • RG = RelationGraph(O)
>     • ORG = TOOWL(RG)
>     • OF = RobotRemove(T, ORG)
>     • OR = RobotReduce(OF)
>   4. For now, let's make a very simple command that only deals with the -T/--term-file parameter.

Is T expected to be both the actual subset plus the set of object properties used?

> Then we will see whether Reduce() really works the way we hope!

I think this is equivalent to what I wrote. I think I wrote the reduce operation, and I think it essentially implements an efficient RG in order to calculate redundancy, so we should test it on decent-sized ontologies.

hkir-dev commented 2 years ago

> Is T expected to be both the actual subset plus the set of object properties used?

Yes
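
To illustrate what that answer implies, here is a minimal Python sketch in which T lists both the subset classes and the object properties to keep, and a relation-graph edge survives only if all three of its parts are in T. The edge data, the `adjacent_to` property, and the function name are hypothetical; whether subclass edges would also need an entry in T is not settled in this thread, so the sketch simply treats every predicate the same way.

```python
# Hypothetical relation-graph output as (subject, predicate, object) triples.
RG_EDGES = {
    ("X:5", "part_of", "X:1"),
    ("X:5", "part_of", "X:2"),      # X:2 is not in T
    ("X:5", "adjacent_to", "X:1"),  # adjacent_to is not in T
}

# T mixes the subset classes with the object properties to preserve.
T = {"X:1", "X:5", "part_of"}


def restrict_to_terms(edges, terms):
    """Keep an edge only if its subject, predicate and object are all in T."""
    return {(s, p, o) for s, p, o in edges if {s, p, o} <= terms}


print(sorted(restrict_to_terms(RG_EDGES, T)))
```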