monarch-initiative / oncoexporter

Cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

Expand CDA to NCIT Mappings for 12 tissues & Enhance OpUberonMapper #79

Closed rajdeepmondaldotcom closed 5 months ago

rajdeepmondaldotcom commented 5 months ago

This PR expands our CDA to NCIT mappings by adding twelve CSV files, one per tissue: bone, brain, breast, cervix, colon, heart, kidney, liver, lung, pancreas, skin, and thyroid. The mapping files are in src/oncoexporter/ncit_mapping_files/cda_to_ncit_tissue_wise_mappings.
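For reviewers who want to inspect the new files, here is a minimal sketch of reading every CSV in that directory into memory. It assumes each file has a header row and that the file name identifies the tissue; `load_tissue_mappings` is a hypothetical helper, not part of oncoexporter.

```python
import csv
from pathlib import Path


def load_tissue_mappings(mapping_dir):
    """Read every *.csv in mapping_dir into {file_stem: [row_dict, ...]}.

    Hypothetical helper for inspection only. It assumes a header row in
    each file and makes no assumption about the column names.
    """
    mappings = {}
    for csv_path in sorted(Path(mapping_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # The file stem is used as the tissue key (assumed naming scheme).
            mappings[csv_path.stem] = list(csv.DictReader(handle))
    return mappings
```

Calling `load_tissue_mappings("src/oncoexporter/ncit_mapping_files/cda_to_ncit_tissue_wise_mappings")` from the repository root would then return one list of rows per file.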

In addition, OpUberonMapper in src/oncoexporter/cda/mapper/op_uberon_mapper.py was updated with new terms and mappings that translate the string representations of anatomical locations into their corresponding UBERON terms.
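To illustrate the idea of mapping a location string to an UBERON term, here is a minimal sketch. It is not the actual OpUberonMapper code: the function name, the table, and the normalization (trim plus lowercase) are assumptions made for illustration, and only three well-known UBERON terms are included.

```python
from typing import Optional, Tuple

# Illustrative lookup only; the real table lives in op_uberon_mapper.py
# and is not reproduced here.
_SITE_TO_UBERON = {
    "lung": ("UBERON:0002048", "lung"),
    "brain": ("UBERON:0000955", "brain"),
    "liver": ("UBERON:0002107", "liver"),
}


def map_site_to_uberon(site: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (UBERON id, label) for a site string, or None if unmapped."""
    if site is None:
        return None
    # Assumed normalization: ignore surrounding whitespace and letter case.
    key = site.strip().lower()
    return _SITE_TO_UBERON.get(key)
```

With this sketch, `map_site_to_uberon(" Lung ")` resolves to the lung term, while an unmapped string such as `"Not Reported"` yields `None`, leaving the caller to decide how to handle missing sites.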

Please take a look.

Thanks a lot, Rajdeep

justaddcoffee commented 5 months ago

Great, @rajdeepmondal-el! Maybe we can review tomorrow?

justaddcoffee commented 5 months ago

cc: @pnrobinson

rajdeepmondaldotcom commented 5 months ago

Thanks a lot for pointing it out. I agree with you, @justaddcoffee. I can explain why the result came out this way and suggest some ways to make the predictions more accurate. In this specific example, primary_diagnosis_site contains a lot of spurious information that may confuse the model, which ends up trying to be more context-aware than necessary. I can address this by assigning weights to the primary diagnosis part.

There are also some edge cases, which I will share as well.

rajdeepmondaldotcom commented 5 months ago

Thank you very much @ielis, I have merged it.