Open cmungall opened 4 years ago
Nice summary! Regarding the licensing: The primary icdo[mt]-....
codes are derived from/correspond to ICD-O 3 terms. Current link (IACR). Additional information (disease interpretation / ICD-O code assignments) comes from the WHO Classification of Tumours (WHO Blue Books).
Our mappings can get any open terms.
Pinging @paulacarrio who did most of the mappings.
We have identified cases where the mapping is too general, e.g. https://github.com/progenetix/ICDOntologies/blob/master/current/icdom-80003%2Cicdot-C44.9.yaml
@nicolevasilevsky will examine all 512 mappings. We will first need to turn this into a google sheet. @paulacarrio has scripts in the tools folder to turn into ods, but we could also do a quick yaml2tsv and load that into a sheet
Note mappings as:
I made a spreadsheet:
can you move to mondo drive nicole?
this was the script I used:
import click
import yaml


@click.command()
@click.argument('paths', nargs=-1)
def create_tsv(paths):
    """Print one TSV row per mapping YAML file."""
    for path in paths:
        with open(path, 'r') as stream:
            obj = yaml.load(stream, Loader=yaml.SafeLoader)
        if len(obj['equivalents']) > 1:
            raise ValueError(f"too many equivalents in {path}")
        x = obj['equivalents'][0]
        # Placeholders used when a file lacks a morphology or topography entry
        icdom = ('_', '_')
        icdot = ('_', '_')
        for i in obj['input']:
            label = i['label']
            term_id = i['id']
            if term_id.startswith('icdom'):
                icdom = (term_id, label)
            elif term_id.startswith('icdot'):
                icdot = (term_id, label)
            else:
                raise ValueError(f"unknown ID type: {term_id}")
        vals = [icdom[0], icdom[1], icdot[0], icdot[1], x['id'], x['label']]
        print("\t".join(vals))


if __name__ == "__main__":
    create_tsv()
@mbaudis I am half way done with reviewing the mappings. I'll keep going but do you want to review what I have done so far?
You could highlight cells with any issues or create a new column with notes for issues
@nicolevasilevsky Great - I had done some "random" annotations but will switch to systematic notes.
@nicolevasilevsky line 140 ... break for today; pls. have a look.
@cmungall said this should be lower priority for me to review these mappings. @mbaudis do you have enough feedback from me, at the moment?
Do you know how to make new term requests to NCIt?
@nicolevasilevsky AFAIK @paulacarrio & @qingyao have submitted a list of term requests. But we have now a icdom + icdot <-> NCIt service online, w/ GH repo etc.:
There is also some UBERON <-> ICD-O topography.
We actually would much appreciate if:
is there any action still needed for this?
I am going to close this as there hasn't been any response in a couple months. Please reopen if still needed, thanks!
@nicolevasilevsky @cmungall Stale item, but the issue still remains that there is no good representation of ICDO T+M pairs w/ the corresponding NCIT terms.
NCIT now covers most of our combinations (+1), but a direct mapping does not exist anywhere beyond our resource (?). So no idea how to go about this; as indicated, happy to provide/extend the mappings if someone has a way to integrate them in a lookup (?) service or annotation for term equivalence.
I'll bring this up with Chris at one of our Mondo calls in the new year.
Great - thanks! Happy to get looped in if needed.
Hi @mbaudis I talked to Chris about this and he said we could probably make an OWL version of ICD-O with Koza or LinkML (similar to the way we did this with monochrome) and host it on OLS. We have a lot of other competing priorities though, so I'll come back to this in a couple of months and see if our development team can work on this. Thanks!
@nicolevasilevsky Great - please keep me posted; I'd like to help... And preferably LinkML :-)
see also https://github.com/mapping-commons/sssom/issues/222 - can we bump the priority on this?
@mellybelly I'll bring this up with Nico on a future call.
@mellybelly should we bump the priority on this over other work, like ICD10 mappings, NCIT mappings, MedGen mappings, etc?
If @mbaudis can provide a sssom mapping file instead of the spreadsheet, we can easily add these into Mondo. However, if we need to review all the mappings and create a file ourselves, it will be a big lift and we'll need to deprioritize other work.
@nicolevasilevsky I don't have the resources to provide a sssom'd version; but more than that I wouldn't know how to express the ICD-O pairs correctly. Internally we just concatenate them to get unique keys (icdom-85032::icdot-C50.9 ...) but is there a way to do this in the sssom schema? (really not my area...)
Also: IMO many of the mappings could be done better w/ the current version of NCIt.
So:
¯\_(ツ)_/¯
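For illustration only, the concatenated keys described in the comment above could be built and split with something like the following. This is a sketch, not the progenetix implementation: the helper names are hypothetical, and the "::" separator and the code patterns are inferred from the single example key quoted above. It does not answer whether SSSOM can represent such pairs.

```python
import re

# Assumption: keys look like "icdom-85032::icdot-C50.9", as in the example
# quoted in the thread; the exact code patterns are a guess from that example.
KEY_PATTERN = re.compile(r"^(icdom-\d{5})::(icdot-C\d{2}(?:\.\d)?)$")


def make_key(icdom, icdot):
    """Join a morphology ID and a topography ID into one composite key."""
    return f"{icdom}::{icdot}"


def split_key(key):
    """Split a composite key back into (icdom, icdot), or raise ValueError."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a recognised ICD-O pair key: {key!r}")
    return match.group(1), match.group(2)
```

A round trip such as split_key(make_key("icdom-85032", "icdot-C50.9")) returns the original pair, which is the property a lookup service would rely on.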
Thanks @mbaudis. Let me discuss with Nico and we can come up with a plan to move forward. :)
Currently we have ICDO mappings sourced from NCIT, of the form `ICDO:nnnn/n`
progenetix has more complete mappings https://github.com/progenetix/ICDOntologies/tree/master/current
TODO: determine consistency of these two
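A rough sketch of how the consistency check could start, on the morphology side only. It assumes that an xref such as ICDO:8503/2 corresponds to the progenetix ID icdom-85032 (slash dropped, prefix swapped); that correspondence is an assumption based on the ID shapes seen in this thread and should be confirmed before use. The function names are hypothetical.

```python
import re

# Assumption: "ICDO:nnnn/n" maps to "icdom-nnnnn" by dropping the slash.
XREF_PATTERN = re.compile(r"^ICDO:(\d{4})/(\d)$")


def xref_to_icdom(xref):
    """Convert an NCIT-sourced xref like 'ICDO:8503/2' to 'icdom-85032'."""
    match = XREF_PATTERN.match(xref)
    if match is None:
        raise ValueError(f"unexpected ICDO xref: {xref!r}")
    return f"icdom-{match.group(1)}{match.group(2)}"


def compare_morphology_sets(mondo_xrefs, progenetix_icdom_ids):
    """Report which morphology codes appear in both sources or only one."""
    from_xrefs = {xref_to_icdom(x) for x in mondo_xrefs}
    from_progenetix = set(progenetix_icdom_ids)
    return {
        "shared": from_xrefs & from_progenetix,
        "only_in_xrefs": from_xrefs - from_progenetix,
        "only_in_progenetix": from_progenetix - from_xrefs,
    }
```

This only compares which morphology codes occur; checking that both sources map a code to the same disease term would need the topography half of each pair as well.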
We will likely want to map each cancer term in mondo to a pair icdot/icdom as in the above.
E.g.
https://github.com/progenetix/ICDOntologies/blob/master/current/icdom-84303%2Cicdot-C34.9.yaml
This could be formalized by an equivalence axiom between precomposed mondo class and class expression
icdom AND disease-has-location some icdot
However, for convenience we may want to make simple xrefs to a conjoined string and have this resolve to URLs like https://progenetix.org/api/ncitcodes/icdom-85003,icdot-C50/
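As a sketch of the two options above, the pair could be rendered either as the class expression or as a conjoined string that resolves to a URL. The URL pattern is copied from the example in this comment; whether that endpoint is stable is not verified here, and the function names are hypothetical.

```python
# Assumption: the base URL and "icdom,icdot/" layout follow the single
# example URL given in the thread.
PROGENETIX_BASE = "https://progenetix.org/api/ncitcodes/"


def pair_to_xref(icdom, icdot):
    """Conjoin a morphology/topography pair into one xref string."""
    return f"{icdom},{icdot}"


def pair_to_url(icdom, icdot, base=PROGENETIX_BASE):
    """Build the lookup URL for a conjoined pair."""
    return f"{base}{pair_to_xref(icdom, icdot)}/"


def pair_to_class_expression(icdom, icdot):
    """Render the pair as the class expression proposed in the thread."""
    return f"{icdom} AND disease-has-location some {icdot}"
```

For the pair icdom-85003 and icdot-C50, pair_to_url reproduces the URL shown above.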
todo: determine license of progenetix mappings
cc @mbaudis
Map each mondo class with an ncit equivalent to an icdot/icdom combo, see https://github.com/progenetix/ICDOntologies/tree/master/current
What is the license of these mappings?