kltm closed this issue 5 years ago.
I had run into this earlier. I think this is a dipper / ingest error.
Feel free to move it to whatever tracker is appropriate.
Confirmed: GO:0022008PHENOTYPE comes via the dipper GO ingest,
most likely here: https://github.com/monarch-initiative/dipper/blob/master/dipper/sources/GeneOntology.py#L323
I can't speak to why.
This issue could be moved to dipper.
These are uPheno grouping classes, but I believe they will be deprecated.
I am very interested in this going away as well, and I think we discussed it in one of the previous data calls. There should be 'real' phenotype terms for most, if not all, of these cases.
Can someone make a list of the ingest scripts where these are generated, or of the sources they come from?
Here is what I find. The blank nodes are neither here nor there, but the links to nowhere at OBO would perhaps be irritating to me if I were running OBO.
##############################
ntriples/flybase.nt:273649
RDF_SUBJECTS
------------
6416 <http://purl.obolibrary.org/obo
5798 FBbt_
616 GO_
2 SO_
5294 <https://monarchinitiative.org/.well-known
RDF_OBJECTS
-----------
248215 <http://purl.obolibrary.org/obo
228550 FBbt
19465 GO
200 SO
13724 <https://monarchinitiative.org/.well-known
########################
ntriples/go.nt:82180
RDF_SUBJECTS
------------
0 (zero)
RDF_OBJECTS
-----------
82180 <http://purl.obolibrary.org/obo
82180 GO_
###########################
ntriples/monarch.nt:76
RDF_SUBJECTS
------------
0 (zero)
RDF_OBJECTS
-----------
76 <http://purl.obolibrary.org/obo
6 CL
22 GO_
2 MPATH_
10 NBO_
36 UBERON_
Thanks Tom, could you tell me what exactly you ran to create this output? Does this mean that there are 5798 FBBT123PHENOTYPE classes? It would make some sense (flybase curates anatomical phenotypes against their anatomy ontology only), but just wanted to be sure! Thanks.
Sure, shell commands. Here is some history:
cd Dev/NTriples_201901/
fgrep -c "PHENOTYPE> " ntriples/*.nt
cut -f1 -d' ' ntriples/flybase.nt |fgrep "PHENOTYPE>"| head
cut -f1 -d' ' ntriples/flybase.nt |fgrep "PHENOTYPE>"|cut -f1-4 -d '/'|sort|uniq -c|sort -nr
cut -f3 -d' ' ntriples/flybase.nt |fgrep "PHENOTYPE>"|cut -f1-4 -d '/'|sort|uniq -c|sort -nr
cut -f1 -d' ' ntriples/go.nt |fgrep "PHENOTYPE>"|cut -f1-4 -d '/'|sort|uniq -c|sort -nr
cut -f3 -d' ' ntriples/go.nt |fgrep "PHENOTYPE>"|cut -f1-4 -d '/'|sort|uniq -c|sort -nr
cut -f3 -d' ' ntriples/monarch.nt |fgrep "PHENOTYPE>"|cut -f1-4 -d '/'|sort|uniq -c|sort -nr
cut -f1 -d' ' ntriples/monarch.nt |fgrep "PHENOTYPE>"|cut -f1-4 -d '/'|sort|uniq -c|sort -nr
cut -f1 -d' ' ntriples/flybase.nt|grep "<http://purl.obolibrary.org/obo/.*PHENOTYPE>"|less
cut -f1 -d' ' ntriples/flybase.nt|grep "<http://purl.obolibrary.org/obo/.*PHENOTYPE>"|cut -f5 -d \/|head
cut -f1 -d' ' ntriples/flybase.nt|grep "<http://purl.obolibrary.org/obo/.*PHENOTYPE>"|cut -f5 -d \/|cut -f1 -d \_|head
cut -f1 -d' ' ntriples/flybase.nt|grep "<http://purl.obolibrary.org/obo/.*PHENOTYPE>"|cut -f5 -d \/|cut -f1 -d \_|sort|uniq -c
cut -f3 -d' ' ntriples/go.nt |grep "<http://purl.obolibrary.org/obo/.*PHENOTYPE>"|cut -f5 -d \/|cut -f1 -d \_|sort|uniq -c
cut -f3 -d' ' ntriples/monarch.nt|grep "<http://purl.obolibrary.org/obo/.*PHENOTYPE>"|cut -f5 -d \/|cut -f1 -d \_|sort|uniq -c
cut -f3 -d' ' ntriples/flybase.nt|grep "<http://purl.obolibrary.org/obo/.*PHENOTYPE>"|cut -f5 -d \/|cut -f1 -d \_|sort|uniq -c
Then I pasted the less crufty bits into the lovingly crafted ticket.
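The history above boils down to counting PHENOTYPE-suffixed IRIs by position (subject or object), namespace, and ID prefix. A minimal one-pass sketch of that tally, assuming plain N-Triples where the subject and object IRIs are the first and third space-separated fields (the same assumption the `cut -d' '` commands make); `phenotype_report` is a hypothetical helper name, not part of dipper or the Monarch pipeline:

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally IRIs ending in "PHENOTYPE>" in an N-Triples file,
# grouped by position (subject/object), namespace, and ID prefix.
phenotype_report() {
  awk '
    # Map "<http://purl.obolibrary.org/obo/GO_0022008PHENOTYPE>" to
    # "<http://purl.obolibrary.org/obo GO", mirroring the cut commands above.
    function bucket(term,   n, parts, name, id) {
      n = split(term, parts, "/")
      name = parts[n]
      sub(/>$/, "", name)
      split(name, id, "_")
      return parts[1] "/" parts[2] "/" parts[3] "/" parts[4] " " id[1]
    }
    $1 ~ /PHENOTYPE>$/ { subj[bucket($1)]++ }
    $3 ~ /PHENOTYPE>$/ { obj[bucket($3)]++ }
    END {
      for (k in subj) print "subject", k, subj[k]
      for (k in obj) print "object", k, obj[k]
    }
  ' "$1" | LC_ALL=C sort
}
```

Running `phenotype_report ntriples/flybase.nt` would print one line per group in the form `subject <http://purl.obolibrary.org/obo FBbt <count>`, which covers the separate subject/object `cut | sort | uniq -c` passes in a single read of the file.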
@kshefchek
Do these generated classes, like FBBT123PHENOTYPE, somehow make it into OWLSIM? Would Monarch (Phenogrid) recognise FBBT:HEADPHENOTYPE as similar to HP:abnormal HEAD?
They do make it into owlsim and can be compared, for example:
https://monarchinitiative.org/owlsim/compareAttributeSets?a=FBbt:00000004PHENOTYPE&b=HP:0000234
This gives 0 for Phenodigm and 0.28 for Jaccard similarity.
https://monarchinitiative.org/owlsim/searchByAttributeSet?a=FBbt:00000004PHENOTYPE
There's not much in terms of connections outside of fly genes.
Wow, this is surprising. To get a Jaccard similarity of 0.28, the two classes must share some superclasses, and I can't see at the moment where owlsim would get them from. Is there any way to see the shared superclasses behind the Jaccard result?
The following classes are in common:
p.iri | p.label |
---|---|
"http://purl.obolibrary.org/obo/UBERON_0000033PHENOTYPE" | "head phenotype" |
"http://purl.obolibrary.org/obo/UBERON_0007811PHENOTYPE" | "craniocervical region phenotype" |
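For context on the 0.28 figure: Jaccard similarity over two classes' ancestor sets is |A ∩ B| / |A ∪ B|, so any nonzero score implies at least one shared superclass. A minimal sketch, assuming the ancestors are available as plain text files with one class per line; `jaccard` is a hypothetical helper, and owlsim's actual implementation (for example, whether it counts the query classes themselves or owl:Thing) may differ:

```shell
#!/usr/bin/env bash
# Minimal sketch: Jaccard similarity of two ancestor sets, |A ∩ B| / |A ∪ B|.
# Each argument is a file listing one class per line (hypothetical input
# format; this is not how owlsim or SciGraph store ancestors).
jaccard() {
  local a b inter union
  a=$(mktemp); b=$(mktemp)
  # comm needs both inputs sorted under the same collation.
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  inter=$(LC_ALL=C comm -12 "$a" "$b" | wc -l)
  union=$(LC_ALL=C sort -u "$a" "$b" | wc -l)
  rm -f "$a" "$b"
  # awk does the floating-point division that bash arithmetic cannot.
  awk -v i="$inter" -v u="$union" 'BEGIN { if (u == 0) print 0; else printf "%.2f\n", i / u }'
}
```

With the two shared classes from the table plus one fly-only and two HP-only placeholder ancestors, the sketch returns 2/5 = 0.40; adding one more shared parent class to both files would raise the score, which is the effect suspected below.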
Thanks! :) No rush, but if you could point me to the code that makes both FBbt:00000004PHENOTYPE and HP:0000234 subclasses of http://purl.obolibrary.org/obo/UBERON_0007811PHENOTYPE, that would help. This is really a big surprise to me!
This is coming from scigraph which creates convenience edges during load time. It may not be reflective of what is in owlsim. The owlsim makefile uses http://archive.monarchinitiative.org/201902/owl/metazoa.owl, see https://github.com/monarch-initiative/monarch-owlsim-data/blob/master/server/Makefile
It's also possible that the Jaccard similarity implementation includes the parent UPHENO phenotype class, and that's what we're seeing.
... aaaaand? I'm at the edge of my seat. What happens next? 😃
From my standpoint, we will do the following: whenever Monarch is slurping post-composed phenotypes, we will have a separate repo (as in the case of ZP) that maps the post-composed annotations into a pre-coordinated vocabulary in a standard, transparent way. Ideally, this would happen in collaboration with the MODs, but where this is not possible, we need to think about IRIs and membership in uPheno.
For example, FlyBase will curate an abnormal head as FBBT:001 (Head). This is not a phenotype term; it is an anatomical entity. So, to stay clean with our conceptual model, we have so far generated these ominous FBBT:001PHENOTYPE classes. To provide the classification, someone, probably @cmungall, created this ontology here. This is roughly what should have happened, but the above ontology does not use EQ, so it will not neatly fall into our general framework. So the questions we need to answer now are:
1) Let's say FlyBase has no interest in integrating a vocabulary of abnormal anatomical entities. Will we provide a stable ID that we intend people to use? For example, a single-cell atlas may want to record an abnormal head (fly) phenotype; is it our intention that they use our term?
2) Our PHENOTYPE classes have been out in the wild for a while now. :/ Should we keep this URI scheme (appending the word PHENOTYPE) to stay backwards compatible? Do we mass-deprecate? Or do we just silently make them disappear (as they were never meant to be used in the first place)?
We determined that this is coming from uPheno and not dipper, so I propose we close or move the ticket.
OK. But the problem persists, so I'd still like to have an open ticket in the UI project. Dear @kshefchek @matentzn @kltm, could one of you please open a new ticket in uPheno that includes this information, and link to it here as well? Thank you!
Sure thing, I've made a ticket here - https://github.com/obophenotype/upheno/issues/521
From a UI perspective, we could keep a blacklist of CURIE prefixes whose PURLs we know go nowhere, and not link out to them.
EDIT: actually, this wouldn't work here, because the prefix is GO.
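The EDIT's point can be made concrete. The sketch below contrasts a prefix blacklist with a check on the generated `PHENOTYPE` suffix; it is written in shell only to stay in one language with the commands above, the function names and the placeholder prefixes `EXAMPLE1`/`EXAMPLE2` are hypothetical, and it is not Monarch UI code:

```shell
#!/usr/bin/env bash
# Hypothetical illustration (not Monarch UI code): decide whether a CURIE
# should get an OBO PURL link-out.
BLACKLISTED_PREFIXES="EXAMPLE1 EXAMPLE2"   # placeholder prefixes only

# Prefix blacklist as proposed: looks only at the part before the colon,
# so GO:0030424PHENOTYPE and GO:0030424 are treated identically.
prefix_allows_link() {
  local prefix=${1%%:*}
  case " $BLACKLISTED_PREFIXES " in
    *" $prefix "*) return 1 ;;
    *) return 0 ;;
  esac
}

# Assumed alternative: skip generated classes by their PHENOTYPE suffix.
suffix_allows_link() {
  case $1 in
    *PHENOTYPE) return 1 ;;
    *) return 0 ;;
  esac
}
```

Because `GO:0030424PHENOTYPE` shares the `GO` prefix with real GO terms, the prefix rule has to either block all of GO or none of it, while the suffix rule separates the two.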
We're actually doing better on beta than on production, which has two broken URLs, one of which 404s: https://monarchinitiative.org/phenotype/GO:0030424PHENOTYPE
How to reproduce
Expected results: I'm guessing that there should be no "PHENOTYPE" parallel construction of GO terms available to the user?