Open joeflack4 opened 6 months ago
Could this be due to self-subsumptions you removed elsewhere?
Hmm, I think probably not.
I created a feature branch from develop to work on DO mappings: unable to import (OMIM(PS) --> MIM, then created a build branch from that for a test build run (sh run.sh make build-mondo-ingest -B), and received the error:
ValueError: FATAL BUILD ERROR: Ancestors discrepancy
Detected error in consistency of sets of terms gathered from Mondo.
1. Mondo SCR ancestors: 256763
2. Mondo direct SCR relationships: 33797
3. Mondo indirect SCR relationships: 223284
Intersection (Top 5): [('MONDO:0800478', 'rdfs:subClassOf', 'MONDO:0002254'), ('MONDO:0800310', 'rdfs:subClassOf', 'MONDO:0005047'), ('MONDO:0024888', 'rdfs:subClassOf', 'MONDO:0005626'), ('MONDO:0800367', 'rdfs:subClassOf', 'MONDO:0016333'), ('MONDO:0017557', 'rdfs:subClassOf', 'MONDO:0018154')]
"1" should be same as "2" + "3", but instead it has n less rels: 318
See also: https://github.com/monarch-initiative/mondo-ingest/issues/525
Exiting.
make[1]: *** [mondo-ingest.Makefile:543: reports/doid.subclass.confirmed.robot.tsv] Error 1
rm imports/ro_terms_combined.txt
make[1]: Leaving directory '/work/src/ontology'
make: *** [mondo-ingest.Makefile:332: build-mondo-ingest] Error 2
@joeflack4 @matentzn how have you run the mondo-ingest pipeline recently without getting this error?
I ran a data build yesterday from the main branch:
https://github.com/monarch-initiative/mondo-ingest/pull/566
And all worked fine.
@matentzn Curious, how did you run this? (a) on a fresh clone, or (b) on an existing repo, using -B?
I only ask because I always do (a) and never encounter this problem, while Trish always does (b) and often/always encounters it.
Especially if Nico's answer to the above is (a), I suspect the issue might have to do with some caching I set up in one of my Python scripts that was erroneously left on. So far I've checked sync_subclassof.py, but that file seems to be fine. I may look further into this.
Always (b). Very strange!
To update here, oddly when I ran the build a second time I did not encounter this error. However, this issue came up again for Nico (#582) and is something that definitely needs to be investigated further!
We could also set up an environment variable STRICT so that, if it is set to false, the build skips this error and prints a warning instead.
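A minimal sketch of how such a switch might look, assuming the variable is read from the process environment and that anything other than an explicit falsy string keeps the current fatal behaviour. The helper names `strict_mode` and `report_discrepancy` are hypothetical and are not taken from the mondo-ingest code:

```python
import os
import warnings


def strict_mode(env=None, var="STRICT"):
    """Return False only when the variable is explicitly set to a falsy string.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    value = env.get(var, "true").strip().lower()
    return value not in {"false", "0", "no", "off"}


def report_discrepancy(message, env=None):
    """Raise in strict mode; otherwise warn and let the build continue."""
    if strict_mode(env):
        raise ValueError(f"FATAL BUILD ERROR: {message}")
    warnings.warn(f"Non-fatal (STRICT=false): {message}")
```

Defaulting to strict when the variable is unset would keep existing builds failing loudly, so the warning-only path would have to be opted into explicitly, e.g. by exporting STRICT=false before the build.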
I also now think the error in the OP of this PR comes from a disconnect between the mondo.owl generated by the tmp/mondo.owl pipeline through the mondo repo and the state of mondo-ingest.owl (different sets of subclass axioms?). When I run the full build including imports with -B, the error does not seem to happen.
Overview
I'm surprised that this occurred only when running the debugger in my normal development environment, and I have not seen it happen during builds. That makes me a lot less worried about this. Still, this issue should never happen.
Basically, in sync_subclassof.py, Nico and I set up an error to be thrown in cases where OAK's .ancestors() is returning a different set of relationships than its "direct" + "indirect" parents. That is unexpected, because ancestors are supposed to be nothing more than direct + indirect parents. In the code:
Related to this Slack thread in mondo-ingest
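For illustration only, here is a rough sketch of this kind of consistency check on plain Python sets of (subject, predicate, object) triples. It is not the code from sync_subclassof.py and does not call OAK. The function name `check_ancestor_consistency` and the toy triples are hypothetical. It also shows one general way a count-based comparison can disagree even when the sets themselves agree: a triple that appears in both the direct and the indirect set is counted twice in "2" + "3" but only once in "1". The figures in the error log above are at least arithmetically consistent with a gap of that kind (33797 + 223284 - 256763 = 318), though whether that is the actual cause here is not established.

```python
def check_ancestor_consistency(ancestors, direct, indirect):
    """Compare an 'ancestors' triple set against direct + indirect triple sets.

    Returns the count gap plus the set differences that could explain it.
    """
    ancestors, direct, indirect = set(ancestors), set(direct), set(indirect)
    union = direct | indirect
    return {
        # Gap seen when comparing sizes rather than set contents.
        "count_gap": len(direct) + len(indirect) - len(ancestors),
        # Triples counted twice because they are both direct and indirect.
        "overlap": direct & indirect,
        # Genuine inconsistencies between the two views.
        "missing_from_ancestors": union - ancestors,
        "extra_in_ancestors": ancestors - union,
    }


# Toy hierarchy: A -> B -> C, plus a redundant direct edge A -> C.
SCR = "rdfs:subClassOf"
direct = {("A", SCR, "B"), ("B", SCR, "C"), ("A", SCR, "C")}
indirect = {("A", SCR, "C")}  # also reachable via B
ancestors = direct | indirect

report = check_ancestor_consistency(ancestors, direct, indirect)
print(report["count_gap"], sorted(report["overlap"]))
```

In this toy case the sets are fully consistent (nothing missing, nothing extra), yet the count gap is non-zero because of the one overlapping triple. Reporting the set differences alongside the counts would make it easier to tell that situation apart from a real mismatch.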