monarch-initiative / oncoexporter

Cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

adding cda mapping code #67

Closed pnrobinson closed 8 months ago

pnrobinson commented 8 months ago

@justaddcoffee @ielis @msierk Here is a prototype of code to map the disease diagnoses that we have in the CDA: https://github.com/monarch-initiative/oncoexporter/blob/ncit_map/notebooks/map_cda_data_to_canonical_ncit.ipynb It is not very pretty, but it should work and let us encode several cohorts without a lot of biocuration. There is also a fallback for rare or new diagnoses. If we like this, I think I can map at least ten cancer types for the prototype; it should not take more than one hour's work per type.
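For readers who have not opened the notebook, the idea of a table-driven mapping with a fallback can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the column names (`cda_term`, `ncit_id`, `ncit_label`), the function names, and the `NCIT:PLACEHOLDER_*` identifiers are all assumptions made for the example, and the real file layout may differ.

```python
import csv
import io
from typing import Dict, Optional, Tuple

Term = Tuple[str, str]  # (identifier, label)

# Hypothetical fallback used for rare or new diagnoses that have no curated mapping.
# The identifier is a placeholder, not a real NCIT code.
FALLBACK_TERM: Term = ("NCIT:PLACEHOLDER_0", "Neoplasm (generic fallback)")


def load_mapping(tsv_text: str) -> Dict[str, Term]:
    """Parse tab-separated text into a dict keyed by the normalized diagnosis string.

    Assumes a header row with the columns cda_term, ncit_id and ncit_label.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mapping: Dict[str, Term] = {}
    for row in reader:
        key = row["cda_term"].strip().lower()
        mapping[key] = (row["ncit_id"].strip(), row["ncit_label"].strip())
    return mapping


def map_diagnosis(raw: Optional[str],
                  mapping: Dict[str, Term],
                  fallback: Term = FALLBACK_TERM) -> Term:
    """Return the curated term for a raw diagnosis string, or the fallback."""
    if raw is None:
        return fallback
    return mapping.get(raw.strip().lower(), fallback)


# Small in-memory stand-in for a mapping file; the identifiers are placeholders.
EXAMPLE_TSV = (
    "cda_term\tncit_id\tncit_label\n"
    "Lung Adenocarcinoma\tNCIT:PLACEHOLDER_1\tLung Adenocarcinoma\n"
    "Squamous Cell Carcinoma, NOS\tNCIT:PLACEHOLDER_2\tSquamous Cell Carcinoma\n"
)

if __name__ == "__main__":
    table = load_mapping(EXAMPLE_TSV)
    for diagnosis in ["lung adenocarcinoma", "Some Very Rare Tumor", None]:
        print(diagnosis, "->", map_diagnosis(diagnosis, table))
```

Normalizing the key (strip and lowercase) keeps the lookup robust to the casing and whitespace differences that tend to show up across data sources, and a single fallback term means every record still gets encoded even before a curated mapping exists.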

pnrobinson commented 8 months ago

I think this is a duplicate of #54

msierk commented 8 months ago

@pnrobinson The file that op_diagnosis_mapper.py refers to does not exist in the repository. It is expected to live at src/oncoexporter/ncit_mapping_files/cda_to_ncit_map.tsv

pnrobinson commented 8 months ago

@msierk Sorry, my mistake. I think *.tsv was in the .gitignore, so I did not notice that the file was missing. Added now.

pnrobinson commented 8 months ago

@ielis @justaddcoffee I have added a few fields to the OpDiseaseMapper and think this is ready to merge. There are some test failures, but I think it would be better to merge first and fix them afterwards. I am not sure what else is going on, but I think the main cause is that the API has changed a bit. For now, I am omitting the ICD-O code, because it seems to be a field that would actually belong in the biosample, and I think we need to ask the CDA team.