allenbaron closed this 6 months ago.
All tests passed, but there doesn't appear to be a test specifically for the output of the duplicate_exact_synonym query.
After confirming that ROBOT built correctly with this PR, I ran some ROBOT report tests (default profile) comparing the current release (1.9.5) with the build from this PR. I ran them on the DO edit file (with and without acronym annotations) and on uberon.owl.
All the results came out as expected: some new case-insensitive matches were identified in the DO (oops 😅), and the acronym exclusion worked as intended. The execution times for 5 runs of each can be seen in the plot and summaries below (cpu = user + sys). I'm not showing the run against the DO with acronym annotations added, but its timing was the same as the one shown below for doid-edit.owl before I added them. I think the increased time is reasonable and expect most people won't really notice it.
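For anyone wanting to repeat this kind of timing comparison, here is a minimal Python sketch, assuming a Unix-like system where `os.times()` reports CPU time for finished child processes. It runs a command several times and records wall-clock time plus cpu = user + sys for the child, matching the definition above. The helper names `time_runs` and `summarize` are hypothetical and not part of ROBOT.

```python
import os
import statistics
import subprocess
import sys
import time


def time_runs(cmd, runs=5):
    """Run `cmd` `runs` times; return a list of (wall_seconds, cpu_seconds).

    cpu_seconds = user + sys time of the child process, taken from the
    change in os.times() children_* counters around each run.
    """
    results = []
    for _ in range(runs):
        before = os.times()
        start = time.perf_counter()
        # check=False: ROBOT report may exit non-zero when it finds
        # violations, which should not abort a timing run.
        subprocess.run(cmd, check=False, stdout=subprocess.DEVNULL)
        wall = time.perf_counter() - start
        after = os.times()
        cpu = (after.children_user - before.children_user) + (
            after.children_system - before.children_system
        )
        results.append((wall, cpu))
    return results


def summarize(results):
    """Mean and minimum of wall and cpu times across runs."""
    walls = [w for w, _ in results]
    cpus = [c for _, c in results]
    return {
        "wall_mean": statistics.mean(walls),
        "wall_min": min(walls),
        "cpu_mean": statistics.mean(cpus),
        "cpu_min": min(cpus),
    }


if __name__ == "__main__":
    # Time the command given on the command line; fall back to a trivial
    # Python process so the sketch runs without Java installed.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(summarize(time_runs(cmd)))
```

For example, `python time_runs.py java -jar bin/robot.jar report -i <path_to_owl_file> -o DEL.tsv`, run once per jar, gives comparable summaries for the two builds.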
There was one problem I can't explain or fix: the value column of the report output was empty for the duplicate_exact_synonym check when using the robot.jar built from this PR (executed from the top-level directory of this repo as java -jar bin/robot.jar report -i <path_to_owl_file> -o DEL.tsv). It's not a problem with the query itself; running it through the query command with the same jar file produced the expected output.
I agree that it was a pretty smart way to solve the problem, but it wasn't mine. The query is essentially the one @anitacaron shared from Uberon, with only slight modifications to improve performance and to also exclude acronyms. Having her review it is a good idea.
In this query, I decided to pass the property up from the subquery to improve performance slightly. While thinking about this just now, as I worked on another query, it occurred to me that this might not behave as intended in one specific scenario.
Here's the scenario:
An ontology uses both IAO:0000118 and oio:hasExactSynonym, and has the same synonym once with each of these properties on different terms. This query would not flag these synonyms as errors, because they use different properties (a false negative). Essentially, passing the property up prevents duplicate synonyms from being detected across properties.
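To make the false negative concrete, here is a rough Python analogy (not the SPARQL itself), assuming synonyms are held as hypothetical (term, property, synonym) tuples with made-up terms EX:1 and EX:2. Grouping on the property plus the lowercased synonym, which is effectively what passing the property up does, misses the pair. Grouping on the synonym alone catches it.

```python
from collections import defaultdict

# Hypothetical annotations: (term, property, synonym). The property CURIEs
# come from the scenario above; the terms and synonym text are made up.
ANNOTATIONS = [
    ("EX:1", "IAO:0000118", "heart attack"),
    ("EX:2", "oio:hasExactSynonym", "heart attack"),
]


def duplicates_per_property(annotations):
    """Group by (property, lowercased synonym), analogous to passing the property up."""
    groups = defaultdict(set)
    for term, prop, syn in annotations:
        groups[(prop, syn.lower())].add(term)
    return {key: sorted(terms) for key, terms in groups.items() if len(terms) > 1}


def duplicates_across_properties(annotations):
    """Group by lowercased synonym only, regardless of which property holds it."""
    groups = defaultdict(set)
    for term, _prop, syn in annotations:
        groups[syn.lower()].add(term)
    return {key: sorted(terms) for key, terms in groups.items() if len(terms) > 1}


print(duplicates_per_property(ANNOTATIONS))       # {}
print(duplicates_across_properties(ANNOTATIONS))  # {'heart attack': ['EX:1', 'EX:2']}
```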
Does this matter?
I've actually been wondering about the inclusion of IAO:0000118. I left it in because it was in the original duplicate_exact_synonym query, but I've wondered whether it should be removed. IAO:0000118 appears to be the grouping (i.e., parent) annotation property for all of the oboInOwl synonym properties, which include the broad, narrow, and related types. Based on comments I've heard about annotation properties, my assumption is that the annotation property hierarchy doesn't carry any meaning, so maybe it's more about how IAO:0000118 is used in practice?
DO and Uberon don't use IAO:0000118, but RO, BFO, SO, GENO, and a number of other ontologies do appear to use it.
@allenbaron Very good thinking; I totally missed this.
Here is my position:
The case is not very frequent in the OBO world:
```sparql
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

# Classes that carry both an IAO:0000118 annotation and an exact synonym
SELECT DISTINCT ?cls ?x ?y WHERE {
  ?cls obo:IAO_0000118 ?x .
  ?cls oboInOwl:hasExactSynonym ?y .
}
LIMIT 1000
```
This yields 25 results on Ubergraph and about 5,128 on Ontobee, the vast majority of which are stray EFO classes, along with some CHEBI and NCBITaxon classes.
My position is that this "scenario" you identified actually makes the situation better, not worse.
Resolves #1175
- docs/ have been added/updated
- mvn verify says all tests pass
- mvn site says all JavaDocs correct
- CHANGELOG.md has been updated
As discussed in issue #1175, this PR excludes exact synonyms annotated with the abbreviation (OMO:0003000) or acronym (OMO:0003012) synonym type from the duplicate_exact_synonym test of ROBOT report. It also now ignores case when making comparisons. Excluding abbreviations and acronyms should be backward compatible and only incur a time hit for those not using them, while ignoring case may bring up new errors.
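The intended behaviour can be modelled in Python as below. This is a simplified sketch, not the SPARQL in this PR. It assumes each exact synonym has already been paired with its synonym type (or None), and the helper name duplicate_exact_synonyms and the EX: terms are hypothetical. Synonyms typed as abbreviation or acronym are skipped entirely, and the remaining texts are compared case-insensitively.

```python
from collections import defaultdict

# Synonym type CURIEs excluded by this PR: abbreviation and acronym.
EXCLUDED_TYPES = {"OMO:0003000", "OMO:0003012"}


def duplicate_exact_synonyms(synonyms):
    """synonyms: iterable of (term, text, synonym_type or None).

    Returns {lowercased text: sorted terms} for texts shared by two or more
    terms, ignoring case and skipping abbreviation/acronym-typed synonyms.
    """
    groups = defaultdict(set)
    for term, text, syn_type in synonyms:
        if syn_type in EXCLUDED_TYPES:
            continue  # abbreviations and acronyms may legitimately repeat
        groups[text.lower()].add(term)  # case-insensitive comparison
    return {text: sorted(terms) for text, terms in groups.items() if len(terms) > 1}


example = [
    ("EX:1", "MS", "OMO:0003012"),  # acronym: skipped
    ("EX:2", "MS", "OMO:0003012"),  # acronym: skipped
    ("EX:3", "Lung Cancer", None),
    ("EX:4", "lung cancer", None),  # differs only in case: now a duplicate
]
print(duplicate_exact_synonyms(example))  # {'lung cancer': ['EX:3', 'EX:4']}
```

Skipping typed synonyms before grouping keeps the exclusion independent of case handling, which mirrors the two separate changes described above.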
I'm happy to change this PR as needed and will attempt to update the documentation and tests if I can (I have no experience with Java).