monarch-initiative / mondo

Mondo Disease Ontology
http://obofoundry.org/ontology/mondo
Creative Commons Attribution 4.0 International

Incorporate ICDO mappings from progenetix #1148

Open cmungall opened 4 years ago

cmungall commented 4 years ago

Currently we have ICDO mappings sourced from NCIT, of the form `ICDO:nnnn/n`.

progenetix has more complete mappings https://github.com/progenetix/ICDOntologies/tree/master/current

TODO: determine consistency of these two

We will likely want to map each cancer term in Mondo to an icdot/icdom pair, as in the files above.

E.g.

https://github.com/progenetix/ICDOntologies/blob/master/current/icdom-84303%2Cicdot-C34.9.yaml

```yaml
equivalents:
- label: Lung mucoepidermoid carcinoma
  id: ncit:C45544
examples:
- labels: Lung mucoepidermoid carcinoma [cell line EPLC-272H]
input:
- label: Mucoepidermoid carcinoma
  id: icdom-84303
- label: Lung and bronchus
  id: icdot-C34.9
```

This could be formalized by an equivalence axiom between the precomposed Mondo class and the class expression `icdom AND disease-has-location some icdot`.
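As a minimal sketch, the axiom above can be rendered as a Manchester-syntax class frame by a small Python helper. The Mondo ID placeholder, the use of the progenetix codes as bare local names, and the property name `disease_has_location` are all assumptions for illustration; a real axiom would use proper IRIs or declared prefixes and whatever disease-has-location relation Mondo actually uses.

```python
def equivalence_axiom(mondo_id: str, icdom: str, icdot: str,
                      location_property: str = "disease_has_location") -> str:
    """Render 'mondo_id EquivalentTo icdom and (location_property some icdot)'
    as a Manchester-syntax class frame.

    Identifiers are passed through unchanged; the default property name is a
    hypothetical placeholder, not a real ontology term.
    """
    return (
        f"Class: {mondo_id}\n"
        f"    EquivalentTo: {icdom} and ({location_property} some {icdot})"
    )


if __name__ == "__main__":
    # MONDO:nnnnnnn is a placeholder, not a real Mondo ID
    print(equivalence_axiom("MONDO:nnnnnnn", "icdom-84303", "icdot-C34.9"))
```

For the lung mucoepidermoid carcinoma example, this prints a frame whose `EquivalentTo` line reads `icdom-84303 and (disease_has_location some icdot-C34.9)`.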

However, for convenience we may want to make simple xrefs to a conjoined string and have this resolve to URLs like https://progenetix.org/api/ncitcodes/icdom-85003,icdot-C50/
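A minimal sketch of the conjoined-string xref, assuming the string is simply the morphology code, a comma, and the topography code (as in the example URL and the progenetix file names, where `%2C` is the URL-encoded comma). The helper names are hypothetical, and this does not check whether the progenetix endpoint resolves every such pair.

```python
# Base URL taken from the example above; no other endpoints are assumed.
PROGENETIX_NCITCODES = "https://progenetix.org/api/ncitcodes/"


def conjoined_key(icdom: str, icdot: str) -> str:
    """Join a morphology and a topography code into one xref string,
    morphology first, as in the progenetix file names and URLs."""
    if not icdom.startswith("icdom-"):
        raise ValueError(f"expected an icdom- code, got {icdom!r}")
    if not icdot.startswith("icdot-"):
        raise ValueError(f"expected an icdot- code, got {icdot!r}")
    return f"{icdom},{icdot}"


def resolve_url(icdom: str, icdot: str) -> str:
    """Build the lookup URL for a morphology/topography pair."""
    return f"{PROGENETIX_NCITCODES}{conjoined_key(icdom, icdot)}/"


def split_key(key: str) -> tuple[str, str]:
    """Inverse of conjoined_key: recover the (icdom, icdot) pair."""
    icdom, icdot = key.split(",")
    return icdom, icdot
```

For example, `resolve_url("icdom-85003", "icdot-C50")` yields the URL quoted above. Validating the prefixes guards against swapping the two codes, which would otherwise produce a key that silently fails to resolve.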

TODO: determine license of progenetix mappings

cc @mbaudis

Map each Mondo class with an NCIT equivalent to an icdot/icdom combo; see https://github.com/progenetix/ICDOntologies/tree/master/current
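The mapping step above amounts to a join: index the progenetix files by their NCIT equivalent, then look up each Mondo class's NCIT xref. A minimal sketch, assuming the YAML files are already parsed into dicts with the `equivalents`/`input` structure of the example, that each Mondo class has a single NCIT equivalent, and that prefix case may differ between sources (the example uses `ncit:`). All function names are hypothetical.

```python
from collections import defaultdict


def normalize_ncit(curie: str) -> str:
    """Upper-case the prefix so 'ncit:C45544' and 'NCIT:C45544' compare equal."""
    prefix, _, local = curie.partition(":")
    return f"{prefix.upper()}:{local}"


def index_by_ncit(progenetix_records):
    """Map each NCIT ID to the (icdom, icdot) pairs of the files that list it
    as an equivalent. Records are parsed YAML dicts with 'equivalents'
    and 'input' lists."""
    index = defaultdict(set)
    for rec in progenetix_records:
        ids = [i["id"] for i in rec["input"]]
        icdom = next((i for i in ids if i.startswith("icdom-")), None)
        icdot = next((i for i in ids if i.startswith("icdot-")), None)
        if icdom is None or icdot is None:
            continue  # skip files without a complete morphology/topography pair
        for eq in rec["equivalents"]:
            index[normalize_ncit(eq["id"])].add((icdom, icdot))
    return index


def mondo_to_icdo_pairs(mondo_to_ncit, progenetix_records):
    """Attach ICD-O pairs to every Mondo class with an NCIT equivalent.
    Classes whose NCIT term has no progenetix pair are omitted."""
    index = index_by_ncit(progenetix_records)
    result = {}
    for mondo_id, ncit_id in mondo_to_ncit.items():
        pairs = index.get(normalize_ncit(ncit_id))
        if pairs:
            result[mondo_id] = sorted(pairs)
    return result
```

Keeping the pairs as a sorted list per Mondo class makes it visible when one NCIT term is covered by several icdom/icdot combinations, which is exactly the case a curator would need to review.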

What is the license of these mappings?

mbaudis commented 4 years ago

Nice summary! Regarding the licensing: the primary icdo[mt]-.... codes are derived from, and correspond to, ICD-O-3 terms (current link: IACR). Additional information (disease interpretation / ICD-O code assignments) comes from the WHO Classification of Tumours (WHO Blue Books).

Our mappings can be released under any open license terms.

mbaudis commented 4 years ago

Pinging @paulacarrio who did most of the mappings.

cmungall commented 4 years ago

We have identified cases where the mapping is too general, e.g. https://github.com/progenetix/ICDOntologies/blob/master/current/icdom-80003%2Cicdot-C44.9.yaml

@nicolevasilevsky will examine all 512 mappings. We will first need to turn these into a Google Sheet. @paulacarrio has scripts in the tools folder to convert them to ODS, but we could also do a quick yaml2tsv and load that into a sheet.

nicolevasilevsky commented 4 years ago

Note mappings as:

cmungall commented 4 years ago

I made a spreadsheet:

https://docs.google.com/spreadsheets/d/1_6ZX715m3A0Iy9pmPjD_7OtmAAIuxAutFZEuA5XXNdY/edit#gid=356020133

Can you move it to the Mondo drive, Nicole?

This was the script I used:

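```python
import click
import yaml


@click.command()
@click.argument('files', nargs=-1, type=click.Path(exists=True))
def create_tsv(files):
    """Print one TSV row per progenetix YAML file:
    icdom id, icdom label, icdot id, icdot label, NCIT id, NCIT label."""
    for path in files:
        with open(path, 'r') as fh:
            obj = yaml.safe_load(fh)
        # Each file is expected to have exactly one NCIT equivalent
        if len(obj['equivalents']) > 1:
            raise ValueError(f"{path}: too many equivalents")
        x = obj['equivalents'][0]
        # Placeholders in case a morphology or topography code is missing
        icdom = ('_', '_')
        icdot = ('_', '_')
        for i in obj['input']:
            curie, label = i['id'], i['label']
            if curie.startswith('icdom'):
                icdom = (curie, label)
            elif curie.startswith('icdot'):
                icdot = (curie, label)
            else:
                raise ValueError(f"{path}: unknown ID type {curie}")
        # Build the row once per file, after both codes have been collected
        vals = [icdom[0], icdom[1], icdot[0], icdot[1], x['id'], x['label']]
        print("\t".join(vals))


if __name__ == "__main__":
    create_tsv()
```

nicolevasilevsky commented 4 years ago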

@mbaudis I am halfway done reviewing the mappings. I'll keep going, but do you want to review what I have done so far?

You could highlight cells with issues or add a new column with notes on them.

mbaudis commented 4 years ago

@nicolevasilevsky Great - I had done some "random" annotations but will switch to systematic notes.

mbaudis commented 4 years ago

@nicolevasilevsky I'm at line 140 ... stopping for today; please have a look.

nicolevasilevsky commented 4 years ago

@cmungall said that reviewing these mappings should be a lower priority for me. @mbaudis, do you have enough feedback from me for the moment?

Do you know how to make new term requests to NCIt?

mbaudis commented 3 years ago

@nicolevasilevsky AFAIK @paulacarrio & @qingyao have submitted a list of term requests. But we now have an icdom + icdot <-> NCIt service online, w/ GH repo etc.:

There is also some UBERON <-> ICD-O topography mapping.

We would actually much appreciate it if:


nicolevasilevsky commented 3 years ago

Is there any action still needed on this?

nicolevasilevsky commented 2 years ago

I am going to close this as there hasn't been any response in a couple months. Please reopen if still needed, thanks!

mbaudis commented 2 years ago

@nicolevasilevsky @cmungall Stale item, but the issue still remains that there is no good representation of ICDO T+M pairs w/ the corresponding NCIT terms.

NCIT now covers most of our combinations (+1), but a direct mapping does not exist anywhere beyond our resource (?). So I have no idea how to go about this; as indicated, I'm happy to provide/extend the mappings if someone has a way to integrate them into a lookup (?) service or annotation for term equivalence.

nicolevasilevsky commented 2 years ago

I'll bring this up with Chris at one of our Mondo calls in the new year.

mbaudis commented 2 years ago

Great - thanks! Happy to get looped in if needed.

nicolevasilevsky commented 2 years ago

Hi @mbaudis, I talked to Chris about this and he said we could probably build an OWL version of ICD-O with Koza or LinkML (similar to the way we did this with monochrome) and host it on OLS. We have a lot of other competing priorities though, so I'll come back to this in a couple of months and see if our development team can work on this. Thanks!

mbaudis commented 2 years ago

@nicolevasilevsky Great - please keep me posted; I'd like to help... And preferably LinkML :-)

mellybelly commented 2 years ago

see also https://github.com/mapping-commons/sssom/issues/222 - can we bump the priority on this?

nicolevasilevsky commented 2 years ago

@mellybelly I'll bring this up with Nico on a future call.

nicolevasilevsky commented 2 years ago

@mellybelly should we prioritize this over other work, like ICD10 mappings, NCIT mappings, MedGen mappings, etc.?

If @mbaudis can provide an SSSOM mapping file instead of the spreadsheet, we can easily add these into Mondo. However, if we need to review all the mappings and create a file ourselves, it will be a big lift and we'll need to deprioritize other work.

mbaudis commented 2 years ago

@nicolevasilevsky I don't have the resources to provide an SSSOM version; but more than that, I wouldn't know how to express the ICD-O pairs correctly. Internally we just concatenate them to get unique keys (icdom-85032::icdot-C50.9 ...), but is there a way to do this in the SSSOM schema? (Really not my area...)
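One possible shape for such a file, sketched under clear assumptions: SSSOM's `subject_id` and `object_id` each hold a single identifier, so this workaround puts the concatenated `icdom-...::icdot-...` key in `subject_id`. That is not an SSSOM convention, and the key is not a registered CURIE, so a real file would need a `curie_map` entry or another agreed representation. The predicate `skos:exactMatch`, the justification `semapv:ManualMappingCuration`, and all helper names are assumptions, and the YAML metadata header is omitted.

```python
import csv
import io

# A subset of SSSOM mapping-table columns; mapping-set metadata (curie_map,
# license, mapping_set_id) would normally go in a YAML header, omitted here.
COLUMNS = ["subject_id", "subject_label", "predicate_id",
           "object_id", "object_label", "mapping_justification"]


def pair_key(icdom: str, icdot: str) -> str:
    """Concatenate the codes into one key, e.g. icdom-85032::icdot-C50.9."""
    return f"{icdom}::{icdot}"


def split_pair_key(key: str) -> tuple[str, str]:
    """Recover the (icdom, icdot) pair from a concatenated key."""
    icdom, icdot = key.split("::")
    return icdom, icdot


def to_sssom_tsv(rows) -> str:
    """rows: iterable of (icdom, icdot, pair_label, ncit_id, ncit_label).
    Predicate and justification are fixed placeholders for this sketch."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for icdom, icdot, pair_label, ncit_id, ncit_label in rows:
        writer.writerow([pair_key(icdom, icdot), pair_label, "skos:exactMatch",
                         ncit_id, ncit_label, "semapv:ManualMappingCuration"])
    return buf.getvalue()
```

Because the key is reversible with `split_pair_key`, a consumer could still recover the morphology and topography codes separately, for instance to build the equivalence axiom discussed earlier in the thread.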

Also: IMO many of the mappings could be done better w/ the current version of NCIt.

So:

nicolevasilevsky commented 2 years ago

Thanks @mbaudis. Let me discuss with Nico and we can come up with a plan to move forward. :)