Open matentzn opened 1 year ago
I think those other sets of ancestors may have been from a previous version of Phenio (the following is from v2023-07-11).
$ runoak -i sqlite:obo:phenio ancestors -p i,p MP:0000001
id label
BFO:0000001 entity
BFO:0000002 continuant
BFO:0000020 specifically dependent continuant
MP:0000001 mammalian phenotype (MPO)
PATO:0000001 quality
UPHENO:0001001 Phenotype
UPHENO:0001001 phenotype
UPHENO:0001003 phenotype by ontology source
$ runoak -i sqlite:obo:phenio ancestors -p i,p HP:0000118
id label
BFO:0000001 entity
BFO:0000002 continuant
BFO:0000020 specifically dependent continuant
HP:0000001 All
HP:0000001 All (HPO)
HP:0000118 Phenotypic abnormality
HP:0000118 Phenotypic abnormality (HPO)
PATO:0000001 quality
UPHENO:0001001 Phenotype
UPHENO:0001001 phenotype
UPHENO:0001002 Phenotypic abnormality
UPHENO:0001003 phenotype by ontology source
UPHENO:0001005 abnormal phenotype by ontology source
None of those pesky CARO or UBERON terms. Still not completely parallel, since HP:0000118 -> BFO:0000001 has to traverse UPHENO:0001005 and UPHENO:0001002. The tree view:
$ runoak -i sqlite:obo:phenio tree -p i,p MP:0000001
* [] BFO:0000001 ! entity
    * [i] BFO:0000002 ! continuant
        * [i] BFO:0000020 ! specifically dependent continuant
            * [i] PATO:0000001 ! quality
                * [i] UPHENO:0001001 ! phenotype
                    * [i] UPHENO:0001003 ! phenotype by ontology source
                        * [i] **MP:0000001 ! mammalian phenotype (MPO)**
$ runoak -i sqlite:obo:phenio tree -p i,p HP:0000118
* [] BFO:0000001 ! entity
    * [i] BFO:0000002 ! continuant
        * [i] BFO:0000020 ! specifically dependent continuant
            * [i] PATO:0000001 ! quality
                * [i] UPHENO:0001001 ! phenotype
                    * [i] UPHENO:0001002 ! Phenotypic abnormality
                        * [i] UPHENO:0001005 ! abnormal phenotype by ontology source
                            * [i] **HP:0000118 ! Phenotypic abnormality (HPO)**
                    * [i] UPHENO:0001003 ! phenotype by ontology source
                        * [i] UPHENO:0001005 ! abnormal phenotype by ontology source
                            * [i] **HP:0000118 ! Phenotypic abnormality (HPO)**
* [] HP:0000001 ! All (HPO)
    * [i] **HP:0000118 ! Phenotypic abnormality (HPO)**
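For anyone who wants to check the "not completely parallel" point without eyeballing the listings, here is a minimal sketch that diffs the two ancestor sets. The IDs are copied by hand from the two `ancestors -p i,p` outputs above, with duplicate label rows collapsed to one ID each. The variable names are mine, and nothing here calls OAK itself.

```python
# Ancestor IDs transcribed from the two `ancestors -p i,p` listings
# in this comment (PHENIO v2023-07-11). Duplicate label rows such as
# "Phenotype"/"phenotype" are collapsed to a single ID.
MP_ANCESTORS = {
    "BFO:0000001", "BFO:0000002", "BFO:0000020", "MP:0000001",
    "PATO:0000001", "UPHENO:0001001", "UPHENO:0001003",
}
HP_ANCESTORS = {
    "BFO:0000001", "BFO:0000002", "BFO:0000020", "HP:0000001",
    "HP:0000118", "PATO:0000001", "UPHENO:0001001", "UPHENO:0001002",
    "UPHENO:0001003", "UPHENO:0001005",
}


def prefix(curie: str) -> str:
    """Return the ontology prefix of a CURIE, e.g. 'HP' for 'HP:0000118'."""
    return curie.split(":", 1)[0]


shared = MP_ANCESTORS & HP_ANCESTORS
only_hp = HP_ANCESTORS - MP_ANCESTORS
only_mp = MP_ANCESTORS - HP_ANCESTORS

# Any anatomy terms (CARO or UBERON) among the ancestors of either term.
anatomy = {c for c in MP_ANCESTORS | HP_ANCESTORS
           if prefix(c) in {"CARO", "UBERON"}}

print("shared: ", sorted(shared))
print("HP only:", sorted(only_hp))
print("MP only:", sorted(only_mp))
print("anatomy:", sorted(anatomy))
```

The HP-only set is where the extra distance comes from: besides the term itself it holds HP:0000001, UPHENO:0001002 and UPHENO:0001005, while the MP-only set is just MP:0000001.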
The shortest path between the two is still short: ['MP:0000001', 'UPHENO:0001003', 'UPHENO:0001005', 'HP:0000118'] but this is also part of the path between all HP and MP nodes without other shared UPHENO terms in PHENIO. My point being that there is additional distance to cover in the cross-phenotype comparisons, semsim or otherwise.
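The path quoted above can be reproduced with a plain breadth-first search. This is only a sketch under two assumptions: the child-to-parent edges are the ones I read off the two tree outputs in this comment, and the graph is treated as undirected so that a path may go up from one term and down to the other. I don't know exactly how the original path was computed, so read this as a consistency check rather than the actual method.

```python
from collections import deque

# Child -> parents, as read from the two `tree -p i,p` outputs above.
# The nesting is my reading of that output, not an export from PHENIO.
PARENTS = {
    "MP:0000001": {"UPHENO:0001003"},
    "HP:0000118": {"UPHENO:0001005", "HP:0000001"},
    "UPHENO:0001005": {"UPHENO:0001002", "UPHENO:0001003"},
    "UPHENO:0001002": {"UPHENO:0001001"},
    "UPHENO:0001003": {"UPHENO:0001001"},
    "UPHENO:0001001": {"PATO:0000001"},
    "PATO:0000001": {"BFO:0000020"},
    "BFO:0000020": {"BFO:0000002"},
    "BFO:0000002": {"BFO:0000001"},
}


def undirected(parents):
    """Build a symmetric adjacency map from child -> parents edges."""
    adj = {}
    for child, ps in parents.items():
        for parent in ps:
            adj.setdefault(child, set()).add(parent)
            adj.setdefault(parent, set()).add(child)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(adj.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path(undirected(PARENTS), "MP:0000001", "HP:0000118")
print(path)
```

On these edges the search returns the same four-node path, three hops through UPHENO:0001003 and UPHENO:0001005.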
Excellent @caufieldjh. Sorry, I didn't realise you had already done that analysis in this ticket. I guess CARO/UBERON was not really taken into account then, right?
Let's stick with this example for a minute.
OK, to restate the problem: when using a lattice, we create a lot of parallel hierarchies; for example, HPO:All is a distinct parent of HP:PA and not of MP:PA. When using the equivalence model, MP:PA would automatically get HP:PA as a parent.
Now the question is: is that desirable?
I don't know either way right now. If the polyhierarchies are not harmonised (which they are not), the lattice model would put a large amount of distance between terms that would be virtually identical under the equivalence model.
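To make the question a bit more concrete, here is a toy sketch of how ancestor-based similarity could shift between the two models. The upper-level edges are my reading of the tree output earlier in this thread; `HP:leaf` and `MP:leaf` are made-up placeholder terms, and the equivalence model is approximated by simply adding mutual parent edges between HP:0000118 and MP:0000001. Plain Jaccard over reflexive ancestor sets stands in for whatever measure we end up using, so only the direction of the change is meaningful, not the numbers.

```python
# Child -> parents under the lattice model. Upper-level edges follow the
# tree output in this thread; "HP:leaf" and "MP:leaf" are hypothetical
# placeholder terms added purely for illustration.
LATTICE = {
    "MP:leaf": {"MP:0000001"},
    "HP:leaf": {"HP:0000118"},
    "MP:0000001": {"UPHENO:0001003"},
    "HP:0000118": {"UPHENO:0001005", "HP:0000001"},
    "UPHENO:0001005": {"UPHENO:0001002", "UPHENO:0001003"},
    "UPHENO:0001002": {"UPHENO:0001001"},
    "UPHENO:0001003": {"UPHENO:0001001"},
}


def with_equivalence(parents, a, b):
    """Copy the graph and make a and b parents of each other.

    This is a simplification of an equivalence axiom, not how a
    reasoner would materialise it.
    """
    merged = {k: set(v) for k, v in parents.items()}
    merged.setdefault(a, set()).add(b)
    merged.setdefault(b, set()).add(a)
    return merged


def ancestors(parents, term):
    """Reflexive ancestor closure; the seen-set makes cycles safe."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(parents, a, b):
    """Jaccard similarity of the two reflexive ancestor sets."""
    x, y = ancestors(parents, a), ancestors(parents, b)
    return len(x & y) / len(x | y)


EQUIVALENCE = with_equivalence(LATTICE, "HP:0000118", "MP:0000001")

lattice_sim = jaccard(LATTICE, "HP:leaf", "MP:leaf")
equivalence_sim = jaccard(EQUIVALENCE, "HP:leaf", "MP:leaf")
print(f"lattice:     {lattice_sim:.3f}")
print(f"equivalence: {equivalence_sim:.3f}")
```

In this toy graph the two leaves share only the two UPHENO grouping terms under the lattice model, whereas under the equivalence approximation they share everything except themselves.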
cc @souzadevinicius this is going to be the first scientific question we will have to answer!
In https://github.com/monarch-initiative/semsimian/issues/82#issuecomment-1658950359
@caufieldjh showed us that HP:phenotypic abnormality has very different parents than MP:phenotypic abnormality.
Can we determine why? In particular, why does the HP term have Uberon ancestors?
@caufieldjh I will assign you for now, but feel free to talk to Chris and assign someone else. It is easier for me to work if I can assign while creating the ticket, so I am sure it's not dropping off the radar.