monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Bug: `sync_subclass`: ancestors discrepancy #525

Open joeflack4 opened 6 months ago

joeflack4 commented 6 months ago

Overview

I'm surprised that this occurred only when running the debugger in my normal development environment, and I have not seen it happen during builds. That makes me a lot less worried about this. Still, this issue should never happen.

Basically, in sync_subclassof.py, Nico and I set up an error to be thrown in cases where OAK's .ancestors() is returning a different set of relationships than its "direct" + "indirect" parents. That is unexpected, because ancestors are supposed to be nothing more than direct + indirect parents. In the code:

    rels_indirect_mondo_mondo = ancestors_mondo_mondo.difference(rels_direct_mondo_mondo)
    missing_ancestor_rels = rels_indirect_mondo_mondo.union(rels_direct_mondo_mondo).difference(ancestors_mondo_mondo)

Related to this Slack thread in mondo-ingest

matentzn commented 6 months ago

Could this be due to self-subsumptions you removed elsewhere?

joeflack4 commented 6 months ago

Hmm, I think probably not.

twhetzel commented 5 months ago

I created a feature branch from develop to work on "DO mappings: unable to import (OMIM(PS) --> MIM", then created a build branch from that to do a test build run (`sh run.sh make build-mondo-ingest -B`), and received this error:

ValueError: FATAL BUILD ERROR: Ancestors discrepancy
Detected error in consistency of sets of terms gathered from Mondo.

 1. Mondo SCR ancestors: 256763
 2. Mondo direct SCR relationships: 33797
 3. Mondo indirect SCR relationships: 223284
 Intersection (Top 5): [('MONDO:0800478', 'rdfs:subClassOf', 'MONDO:0002254'), ('MONDO:0800310', 'rdfs:subClassOf', 'MONDO:0005047'), ('MONDO:0024888', 'rdfs:subClassOf', 'MONDO:0005626'), ('MONDO:0800367', 'rdfs:subClassOf', 'MONDO:0016333'), ('MONDO:0017557', 'rdfs:subClassOf', 'MONDO:0018154')]
 "1" should be same as "2" + "3", but instead it has n less rels: 318
See also: https://github.com/monarch-initiative/mondo-ingest/issues/525

Exiting.
make[1]: *** [mondo-ingest.Makefile:543: reports/doid.subclass.confirmed.robot.tsv] Error 1
rm imports/ro_terms_combined.txt
make[1]: Leaving directory '/work/src/ontology'
make: *** [mondo-ingest.Makefile:332: build-mondo-ingest] Error 2
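
As a quick sanity check on the numbers in the log above, the reported shortfall can be recomputed from the three counts. This is only arithmetic on the logged values; the variable names are illustrative and not taken from the script:

```python
# Counts copied from the error message above.
n_ancestors = 256763  # 1. Mondo SCR ancestors
n_direct = 33797      # 2. Mondo direct SCR relationships
n_indirect = 223284   # 3. Mondo indirect SCR relationships

# "1" should equal "2" + "3"; the difference is the reported discrepancy.
shortfall = n_direct + n_indirect - n_ancestors
print(shortfall)  # 318
```

The result matches the "n less rels: 318" line, so the counts in the message are at least internally consistent.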

@joeflack4 @matentzn how have you run the mondo-ingest pipeline recently without getting this error?

matentzn commented 5 months ago

I ran a data build yesterday from the main branch:

https://github.com/monarch-initiative/mondo-ingest/pull/566

And all worked fine.

joeflack4 commented 5 months ago

@matentzn Curious, how did you run this? (a) on a fresh clone, or (b) on an existing repo, using -B?

I ask only because I always do (a) and never encounter this problem, while Trish always does (b) and often or always encounters it.

Especially if Nico's answer to the above is (a), I suspect the issue might have to do with some caching I set up in one of my Python scripts that was erroneously left on. So far I've checked sync_subclassof.py, but that file seems to be fine. I may look further into this.

matentzn commented 5 months ago

Always (b). Very strange!

twhetzel commented 5 months ago

To update here, oddly when I ran the build a second time I did not encounter this error. However, this issue came up again for Nico (#582) and is something that definitely needs to be investigated further!

joeflack4 commented 5 months ago

We could also set up an environment variable, STRICT, such that when it is set to false, the script skips this error and prints a warning instead.
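
A minimal sketch of how such a switch might look. The function names and the accepted false-like values are assumptions made for illustration; none of this exists in sync_subclassof.py today:

```python
import os
import warnings


def strict_mode(env=os.environ) -> bool:
    """Hypothetical helper: strict unless STRICT is set to a false-like value."""
    return env.get("STRICT", "true").strip().lower() not in {"false", "0", "no"}


def report_discrepancy(n_missing: int, env=os.environ) -> None:
    """Hypothetical helper: raise in strict mode, otherwise only warn."""
    if n_missing == 0:
        return
    msg = f"Ancestors discrepancy: {n_missing} relationships unaccounted for"
    if strict_mode(env):
        raise ValueError(msg)
    # Non-strict mode: surface the problem but let the build continue.
    warnings.warn(msg)
```

With this shape, a run with `STRICT=false` in the environment would continue past the discrepancy, while the default behaviour stays the same as now.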

joeflack4 commented 5 months ago

Nico thinks:

I also think now the error in the OP of this PR comes from a disconnect between the mondo.owl generated from the tmp/mondo.owl pipeline through the mondo repo and the state of mondo-ingest.owl (different sets of subclass axioms?). When I run the full build including imports with -B, the error does not seem to happen.