phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

OntologyClass for ICD-O? #305

Open mbaudis opened 3 years ago

mbaudis commented 3 years ago

The Disease object references OntologyClass for ICD-O values used for primary_site annotations.

Which ICD-O ontology implementation does this refer to? While the codes themselves are well defined and widely known, it is not clear which (public) source for CURIEs would be recommended, or whether one exists at all.

pnrobinson commented 3 years ago

Thanks for pointing this out. The documentation in the protobuf file needs to be revised to say "such as"; in no place do we want to prescribe a specific ontology. I agree it is hard to access ICD-O terms, but this is one option (https://apps.who.int/iris/handle/10665/96612). It would also be acceptable to use UBERON or NCIT terms here. @julesjacobsen

mbaudis commented 3 years ago

@pnrobinson We have coded to ICD-O for ~20 years, and my original interest in ontologies with CURIEs started from the lack of those for ICD-O, when writing the OntologyTerm usage into the GA4GH metadata schema.

There is a representation in SNOMED, but it isn't correct and is also not open access (OA). So ICD-O became the driver for us to code the ICD-O (morphology + topography) doublets to NCIt, while using a modified code representation:

We worked on getting this into MONDO last year (with @cmungall and @nicolevasilevsky).

Still, for practical purposes (e.g., talking to pathologists) we code ICD-O and NCIt in parallel.

mbaudis commented 3 years ago

@pnrobinson @julesjacobsen As much as I love ICD-O, I would drop it here, since it requires coding of two arms (morphology and topography) and does not exist in an "ontologized" form. You can document that, when using standards like the current ICD-O, the codes should be converted to a suitable OntologyClass.
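
A minimal sketch of what "converting the codes to a suitable OntologyClass" could look like, assuming a lookup table keyed on the ICD-O (morphology, topography) doublet. The table name `ICDO_TO_NCIT` and the function `icdo_to_ontology_class` are hypothetical; the two rows reproduce the cervical examples posted later in this thread, and a real table would have to come from a curated mapping resource.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup table: (ICD-O-3 morphology, ICD-O-3 topography) -> NCIt class.
# Both rows are taken from the cervical examples in this thread; a real table
# would have to come from a curated mapping resource.
ICDO_TO_NCIT: Dict[Tuple[str, str], Dict[str, str]] = {
    ("8070/3", "C53.9"): {
        "id": "NCIT:C4028",
        "label": "Cervical Squamous Cell Carcinoma, Not Otherwise Specified",
    },
    ("8140/3", "C53.9"): {
        "id": "NCIT:C4029",
        "label": "Cervical Adenocarcinoma",
    },
}


def icdo_to_ontology_class(morphology: str, topography: str) -> Optional[Dict[str, str]]:
    """Map an ICD-O doublet to one OntologyClass-shaped dict ({"id", "label"}).

    Returns None when the pair is not in the lookup table, so the caller can
    decide whether to fall back to a broader term.
    """
    key = (morphology.strip(), topography.strip().upper())
    entry = ICDO_TO_NCIT.get(key)
    # Return a copy so callers cannot mutate the shared table.
    return dict(entry) if entry is not None else None
```

For example, `icdo_to_ontology_class("8070/3", "C53.9")` yields the `NCIT:C4028` class. Keying on the pair, rather than on either arm alone, reflects the point above: only the combination of morphology and topography determines a single NCIt neoplasm class.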

julesjacobsen commented 3 years ago

@pnrobinson @mbaudis Can you recommend an NCIT root term for this? NCIT:C12219?

mbaudis commented 3 years ago

@julesjacobsen For cancer it is NCIT:C3262 (I don't have to look this up :-)

i.e., the "Neoplasm" root term, which also covers benign neoplasms.

julesjacobsen commented 3 years ago

Thanks, @mbaudis. However, isn't Neoplasm better placed in Biosample.histological_diagnosis? Disease.primary_site ought to be an anatomy term. In the cases below this should be cervix uteri: NCIT:C12311 == UBERON:0000002.

```json
[
  [
    {
      "id": "NCIT:C4028",
      "label": "Cervical Squamous Cell Carcinoma, Not Otherwise Specified"
    },
    {
      "id": "icdom-80703",
      "label": "Squamous cell carcinoma, NOS"
    },
    {
      "id": "icdot-C53.9",
      "label": "cervix uteri"
    }
  ],
  [
    {
      "id": "NCIT:C4029",
      "label": "Cervical Adenocarcinoma"
    },
    {
      "id": "icdom-81403",
      "label": "Adenocarcinoma, NOS"
    },
    {
      "id": "icdot-C53.9",
      "label": "cervix uteri"
    }
  ]
]
```
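
The ICD-O identifiers in the example above use a prefixed form: morphology `8070/3` appears as `icdom-80703` (slash dropped) and topography `C53.9` as `icdot-C53.9` (dot kept). A small sketch of that conversion, assuming the standard ICD-O-3 layouts (four histology digits plus one behaviour digit; topography `C` + two digits + `.` + one digit). The function names are hypothetical, and the `icdom-`/`icdot-` prefixes are local identifiers inferred from this example, not registered CURIE prefixes.

```python
import re

# Assumed ICD-O-3 layouts: morphology "NNNN/B" (histology + behaviour digit)
# and topography "CNN.N". The "icdom-"/"icdot-" prefixes mirror the example
# in this thread; they are not registered CURIE prefixes.
_MORPHOLOGY = re.compile(r"^(\d{4})/(\d)$")
_TOPOGRAPHY = re.compile(r"^C\d{2}\.\d$")


def icdo_morphology_id(code: str) -> str:
    """Convert '8070/3' to 'icdom-80703' by dropping the slash."""
    match = _MORPHOLOGY.match(code.strip())
    if match is None:
        raise ValueError(f"not an ICD-O morphology code: {code!r}")
    return "icdom-" + match.group(1) + match.group(2)


def icdo_topography_id(code: str) -> str:
    """Convert 'C53.9' to 'icdot-C53.9', keeping the dot."""
    normalized = code.strip().upper()
    if _TOPOGRAPHY.match(normalized) is None:
        raise ValueError(f"not an ICD-O topography code: {code!r}")
    return "icdot-" + normalized
```

Validating the input before prefixing keeps malformed codes (for example `80703` without the slash) from silently producing identifiers that look valid.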
mbaudis commented 3 years ago

@julesjacobsen Correct; my original comment then led to a more general drift. We use one NCIt neoplasm code in parallel with two ICD-O codes (morphology and topography), and recode the ICD-O topography to UBERON. So, yes: the NCIt Neoplasm subtree for Biosample.histological_diagnosis (we actually use it this way), and UBERON or a corresponding code for primary_site.

Just for emphasis: for cancer use cases, I still think that ICD-O topography coding is in principle the best match, and it is widely used. It is just that I'm not aware of a representation in a well-structured ontology with CURIEs to point to. This may have changed (or may change); I would be glad if so.
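
The placement agreed on above (NCIt neoplasm class in Biosample.histological_diagnosis, recoded anatomy term in Disease.primary_site) can be sketched as plain dicts. This is an illustration under stated assumptions, not the schema's own API: the recoding table `ICDO_TOPOGRAPHY_TO_UBERON` and the function `place_terms` are hypothetical, the single table row uses the C53.9 == UBERON:0000002 equivalence stated in this thread, putting the same NCIt class in `Disease.term` is an assumption, and field names follow the protobuf snake_case spelling used in the discussion.

```python
from typing import Any, Dict

# Hypothetical ICD-O topography -> UBERON recoding table. The single row uses
# the equivalence stated in this thread (C53.9 "cervix uteri" == UBERON:0000002).
ICDO_TOPOGRAPHY_TO_UBERON: Dict[str, Dict[str, str]] = {
    "C53.9": {"id": "UBERON:0000002", "label": "cervix uteri"},
}


def place_terms(ncit_neoplasm: Dict[str, str], icdo_topography: str) -> Dict[str, Any]:
    """Split one tumour annotation across Biosample and Disease.

    The NCIt neoplasm class goes to Biosample.histological_diagnosis (and, as an
    assumption, to Disease.term); the recoded anatomy term goes to
    Disease.primary_site.
    """
    site = ICDO_TOPOGRAPHY_TO_UBERON.get(icdo_topography.strip().upper())
    if site is None:
        raise KeyError(f"no UBERON mapping for topography {icdo_topography!r}")
    return {
        "biosample": {"histological_diagnosis": dict(ncit_neoplasm)},
        "disease": {"term": dict(ncit_neoplasm), "primary_site": dict(site)},
    }
```

Raising on an unmapped topography code, instead of falling back to the raw ICD-O value, keeps `primary_site` restricted to anatomy terms from an ontology with resolvable CURIEs.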

mellybelly commented 3 years ago

@balhoff, can you comment?