monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

HPOA onset and frequencyOfPhenotype predicates #372

Open TomConlin opened 8 years ago

TomConlin commented 8 years ago

@mbrush

hpoa.ttl (exclusively) uses this unresolvable @base iri as a predicate 49,148 times:

https://monarchinitiative.org/frequencyOfPhenotype

followed by object literals such as:

  17,124 "hallmark" .
  13,350 "occasional" .
  13,249 "typical" .
   3,289 "rare" .
     163 "common" .
     159 "frequent" .
     157 "Occasional" .
      91 "Rare" .
      70 "7.5000 %" .
      65 "obligate" .
      51 "2/2" .
      42 "3/3" .
      40 "1/3" .
...
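
A tally like the one above could be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the triples have already been parsed into (subject, predicate, object) tuples, and the `tally_objects` helper and the sample rows are hypothetical, not taken from hpoa.ttl or from dipper.

```python
from collections import Counter

FREQ = "https://monarchinitiative.org/frequencyOfPhenotype"
ONSET = "https://monarchinitiative.org/onset"


def tally_objects(triples, predicate):
    """Count how often each object appears with the given predicate."""
    return Counter(o for _, p, o in triples if p == predicate)


# Hypothetical sample triples for illustration only.
sample = [
    ("_:assoc1", FREQ, "hallmark"),
    ("_:assoc2", FREQ, "occasional"),
    ("_:assoc3", FREQ, "Occasional"),
    ("_:assoc4", FREQ, "hallmark"),
    ("_:assoc5", ONSET, "http://purl.obolibrary.org/obo/HP_0003577"),
]

counts = tally_objects(sample, FREQ)
for literal, n in counts.most_common():
    print(f"{n:8,} {literal!r}")
```

Note that the count is case sensitive, which is why "occasional" and "Occasional" show up as separate rows.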

there is a term RO_0003306 "contributes to frequency of condition" which may suffice instead.

hpoa.ttl also (exclusively) uses this unresolvable @base iri as a predicate 438 times

https://monarchinitiative.org/onset

in the form:

<monarch:association>  <:onset> <HP:term> .

where the HP: OBJECT terms are

    107 <http://purl.obolibrary.org/obo/HP_0003577> ! Congenital onset
     86 <http://purl.obolibrary.org/obo/HP_0003593> ! Infantile onset
     65 <http://purl.obolibrary.org/obo/HP_0011463> ! Childhood onset
     64 <http://purl.obolibrary.org/obo/HP_0003623> ! Neonatal onset
     39 <http://purl.obolibrary.org/obo/HP_0003621> ! Juvenile onset
     37 <http://purl.obolibrary.org/obo/HP_0003581>
     32 <http://purl.obolibrary.org/obo/HP_0003584>
      7 <http://purl.obolibrary.org/obo/HP_0011462>
      1 <http://purl.obolibrary.org/obo/HP_0003596>

There are nearly 400 terms including the word "onset" in Ontobee. Perhaps one of them could work here.

the <OBJECT> in these cases seems to me to be doing a better job of describing the type of relationship than the predicate

maybe

<monarch:association>  <HP:term>  <disease/phenotype> 

would be closer to the data.

drseb commented 8 years ago

Thanks for pointing these out.

I am going to change the data over the next few months. I would like to make sure you have some mechanism to detect when, e.g., 'hallmark' is no longer called that.
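
One possible detection mechanism is a small vocabulary check run against the frequency column on each ingest. The sketch below is an assumption, not something dipper provides: `KNOWN_WORDS` holds only the words visible in the (truncated) listing earlier in this thread, and the function names are hypothetical.

```python
import re

# Only the words visible in the truncated listing; the full set may be larger.
KNOWN_WORDS = {
    "hallmark", "occasional", "typical", "rare",
    "common", "frequent", "obligate",
}
PERCENT = re.compile(r"^\d+(?:\.\d+)?\s*%$")  # e.g. "7.5000 %"
RATIO = re.compile(r"^\d+/\d+$")              # e.g. "1/3"


def classify_frequency(literal):
    """Return 'word', 'percent', 'ratio', or 'unexpected' for one literal."""
    value = literal.strip()
    if value.lower() in KNOWN_WORDS:
        return "word"
    if PERCENT.match(value):
        return "percent"
    if RATIO.match(value):
        return "ratio"
    return "unexpected"


def unexpected_values(literals):
    """Collect the distinct literals that match none of the known shapes."""
    return sorted({v for v in literals if classify_frequency(v) == "unexpected"})
```

An ingest could log or fail when `unexpected_values(...)` is non-empty, which would surface a renamed 'hallmark' or a switch to identifiers as soon as the source data changes.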

Also, where do these come from:

 37 <http://purl.obolibrary.org/obo/HP_0003581>
 32 <http://purl.obolibrary.org/obo/HP_0003584>
  7 <http://purl.obolibrary.org/obo/HP_0011462>
  1 <http://purl.obolibrary.org/obo/HP_0003596>

TomConlin commented 8 years ago

If you are changing "hallmark" to a different word, it should not matter, since we pass it along as a literal. If you are changing the column that "hallmark" is currently in to a persistent resolvable identifier, it would be good to get a heads-up and samples sooner rather than later.

where do these (objects) come from...
taking the last one, the identifier comes from phenotype_annotation.tab file at:
http://compbio.charite.de/jenkins/job/hpo.annotations/lastStableBuild/artifact/misc/

grep  0003596 phenotype_annotation.tab 
OMIM    103200  %103200 ADIPOSIS DOLOROSA;;DERCUM DISEASE       HP:0003596  OMIM:103200 IEA             C   ADIPOSALGIA|ADIPOSE TISSUE RHEUMATISM|ADIPOSIS DOLOROSA|DERCUM'S DISEASE|LIPOMATOSIS DOLOROSA|NEUROLIPOMATOSIS|http://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=en&Expert=36397   2009.02.17  HPO
OMIM    175800  POROKERATOSIS OF MIBELLI        HP:0003596  OMIM:175800 TAS             C       2009.02.17  HPO:probinson
OMIM    605543  PARKINSON DISEASE 4, AUTOSOMAL DOMINANT LEWY BODY       HP:0003596  OMIM:605543 IEA             C       2009.02.17  HPO:probinson
OMIM    606798  BLEPHAROSPASM, BENIGN ESSENTIAL     HP:0003596  OMIM:606798 TAS             C       2009.02.17  HPO:probinson
OMIM    606889  ALZHEIMER DISEASE 4     HP:0003596  OMIM:606889 TAS             C       2012.07.16  HPO:probinson
OMIM    615780  #615780 RETINITIS PIGMENTOSA 69; RP69       HP:0000550  OMIM:615780 TAS HP:0003596          O       2015.07.19  HPO:probinson

the prefix HP: maps to http://purl.obolibrary.org/obo/HP_ in dipper/curie_map.yaml https://github.com/monarch-initiative/dipper/blob/master/dipper/curie_map.yaml#L37
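
To illustrate the prefix expansion just described, here is a minimal sketch. It is not dipper's actual implementation: `expand_curie` is a hypothetical helper, and the map holds only the single HP entry quoted above, whereas the real curie_map.yaml contains many more prefixes.

```python
# Only the mapping quoted in this comment; the real YAML file has more entries.
CURIE_MAP = {"HP": "http://purl.obolibrary.org/obo/HP_"}


def expand_curie(curie, curie_map=CURIE_MAP):
    """Expand a CURIE such as 'HP:0003596' into a full IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise KeyError(f"no mapping for prefix in {curie!r}")
    return curie_map[prefix] + local_id


print(expand_curie("HP:0003596"))
```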

dipper assembles a triple asserting that a subject g2p association has predicate "onset" to a HP: term.

I do not see a cmap for hpoa in the docs folder https://github.com/monarch-initiative/dipper/tree/master/docs but there is a generated rendering of the model the ingest produces in http://data.monarchinitiative.org/dot/hpoa.dot

(the dot file is from the previous release as i am still working on the ones for the current release which is what surfaces these sorts of things)

drseb commented 8 years ago

where do these (objects) come from...

I was wondering why they don't have labels as the others do. Do you know why?

drseb commented 8 years ago

I do not see a cmap for hpoa in the docs folder https://github.com/monarch-initiative/dipper/tree/master/docs

Can you give me some background on these cmap files? Is there documentation about cmap and the usage in monarch?

TomConlin commented 8 years ago

TC> where do these (objects) come from...
SK> I was wondering why they don't have labels as the other have. Do you know that?

likely red herring.
they would, I just did not look them up and add them beyond the top few in the list

TomConlin commented 8 years ago

cmap files are how @mbrush introduced me to the monarch semantic models and (currently) serve as my primary reference when I am writing or editing an ingest script.

Although cmaps appear to be adequate for ontologist communication, they have some deficiencies from a development perspective and alternatives are welcome. One suggestion from @cmungall is SHACL https://github.com/monarch-initiative/dipper/issues/265

in general
cmaps are Concept Maps https://en.wikipedia.org/wiki/Concept_map

in specific
cmap is the software needed to view our files http://cmap.ihmc.us/

kshefchek commented 5 years ago

@TomConlin can we close?

TomConlin commented 5 years ago

No. The only change is that coughsomeonecough added yet another pseudo predicate that requires a proper ontological term that no one seems ready to make.

  79454 <https://monarchinitiative.org/frequencyOfPhenotype>
    135 <https://monarchinitiative.org/has_sex_specificity>
    729 <https://monarchinitiative.org/onset>

kshefchek commented 5 years ago

At least for the original ticket we could consider:

frequency of phenotype: http://semanticscience.org/resource/SIO_000900
age of onset: http://purl.obolibrary.org/obo/mondo#has_onset
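
If those two candidates were adopted, the swap could look roughly like the sketch below. It assumes triples are available as (subject, predicate, object) tuples; `remap_predicate` is a hypothetical helper, and has_sex_specificity is left untouched because no replacement has been proposed for it here.

```python
# Candidate replacements from this comment; suggestions, not agreed decisions.
SUGGESTED = {
    "https://monarchinitiative.org/frequencyOfPhenotype":
        "http://semanticscience.org/resource/SIO_000900",
    "https://monarchinitiative.org/onset":
        "http://purl.obolibrary.org/obo/mondo#has_onset",
}


def remap_predicate(triple, mapping=SUGGESTED):
    """Return the triple with its predicate swapped if a replacement exists."""
    s, p, o = triple
    return (s, mapping.get(p, p), o)


# Hypothetical association node for illustration only.
before = ("_:assoc1", "https://monarchinitiative.org/onset",
          "http://purl.obolibrary.org/obo/HP_0003577")
print(remap_predicate(before))
```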