Linking to MedDRA Ontology

kunalr97 commented 11 months ago

I am already using your tool to link German clinical text to ICD10, OPS, ATC, etc., and it has been a great help so far. I would now like to explore more ontologies, one of which is MedDRA. I am aware that it is part of the UMLS Metathesaurus: https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/MDR/sourcerepresentation.html. I just couldn't figure out how to create a config file for it from scratch. I am a bit confused about what to fill in for the semantic_types and sabs sections of the config file. I hope this is clear enough. In the bigger picture, I want to feed in clinical text in German and link the medical entities to the MedDRA ontology.

Thanks in advance for your answer.

phlobo commented 11 months ago

For your use case, you could adapt the GGPONC sample config (remove all sabs apart from MDR and MDRGER): https://github.com/hpi-dhc/xmen/blob/main/examples/conf/ggponc.yaml

You might also start by removing semantic_groups altogether; in your case, I don't think they will have a big impact.

If you don't have gold-standard annotations for MedDRA, some other steps highlighted in the GGPONC example notebook could be helpful:

Depending on whether your input entities have semantic type information, the part about Semantic Type Filtering may or may not be helpful (I would suggest skipping it at first and checking whether any errors arise from a mismatch in semantic types).
Also, depending on your type of data, a pre-trained re-ranker could help (it can be downloaded from Hugging Face, as shown in the notebook).

kunalr97 commented 11 months ago

Thanks for your help and answer. I tried it and it works, but it's not quite what I was looking for. I think the image below will make it clearer. I want it to predict the MedDRA code (in blue), but right now it predicts the CUI (red arrow) of the UMLS concept. I don't know if this is even possible using xmen, but I'm asking anyway.

phlobo commented 11 months ago

I see!

There is currently no built-in functionality to map from UMLS CUIs to identifiers in a UMLS source vocabulary like MedDRA.

I can suggest two ways to achieve that:

Use the UMLS Metathesaurus (MRCONSO.RRF) to map from CUIs to IDs in the source vocabulary, as a post-processing step after the pipeline has run

or (what I would prefer):

Customize the xmen dict command through a parsing script (--code my_parser.py) that constructs the KB / jsonl file with MedDRA concept IDs. This way, you have a lot of control over the target KB. For instance, you could have all concept IDs from MedDRA, but still use aliases from other vocabularies in the UMLS that map to the same CUI.
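For the first option, a minimal post-processing sketch could look like the following. It assumes the standard pipe-delimited MRCONSO.RRF column layout (CUI in column 0, SAB in column 11, CODE in column 13). The function names build_cui_to_codes and map_predictions, the placeholder CUIs, and the toy rows are made up for illustration and are not part of xmen.

```python
from collections import defaultdict

# Column positions in MRCONSO.RRF (pipe-delimited, 0-based)
CUI, SAB, CODE = 0, 11, 13


def build_cui_to_codes(lines, sabs=("MDR", "MDRGER")):
    """Map each UMLS CUI to the set of codes it has in the given source vocabularies."""
    cui_to_codes = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CODE:
            continue  # skip malformed or empty rows
        if fields[SAB] in sabs:
            cui_to_codes[fields[CUI]].add(fields[CODE])
    return dict(cui_to_codes)


def map_predictions(predicted_cuis, cui_to_codes):
    """Replace each predicted CUI by its sorted source codes (empty list if it has none)."""
    return [sorted(cui_to_codes.get(cui, ())) for cui in predicted_cuis]


# Toy rows with placeholder CUIs (hypothetical, not real UMLS content).
# With the real file, pass an open file handle instead, e.g.
#   with open("/path/to/META/MRCONSO.RRF", encoding="utf-8") as f:
#       cui_to_codes = build_cui_to_codes(f)
TOY_ROWS = [
    "C9999991|ENG|P|L0000001|PF|S0000001|Y|A0000001||10062194||MDR|LLT|10062194|Metastasis|3|N||",
    "C9999991|GER|P|L0000002|PF|S0000002|Y|A0000002||10062194||MDRGER|LLT|10062194|Metastase|3|N||",
    "C9999992|ENG|P|L0000003|PF|S0000003|Y|A0000003||||MSH|MH|D000000|Unrelated concept|0|N||",
]

cui_to_codes = build_cui_to_codes(TOY_ROWS)
print(map_predictions(["C9999991", "C9999992"], cui_to_codes))
```

Note that the mapping is kept one-to-many, since a single CUI can carry more than one code from the same source vocabulary; how to pick among several codes is left open here.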

We have done something similar for the DisTEMIST benchmark, where we link against SNOMEDCT_US, but still want to incorporate all available UMLS aliases. See: https://github.com/hpi-dhc/xmen/blob/main/examples/dicts/distemist.py

You would basically need to read MRCONSO.RRF, filter by MDR and MDRGER, and create a concept dictionary where your keys are based on the source concept ID rather than the UMLS CUI (similar to read_snomed2cui_mapping in the DisTEMIST script). As in the DisTEMIST example, you may then (optionally) further extend this dictionary with additional aliases.
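The steps just described can be sketched roughly as follows. This is not the DisTEMIST script or any xmen code: build_source_kb and to_jsonl are hypothetical helpers, the entry fields (concept_id, canonical_name, aliases, types) are an assumed output shape, and the rows are toy data with placeholder CUIs. The sketch also simply takes the first name it sees as the canonical name, which a real parser would likely refine.

```python
import json
from collections import defaultdict

# Column positions in MRCONSO.RRF (pipe-delimited, 0-based)
CUI, LAT, SAB, CODE, STR = 0, 1, 11, 13, 14


def build_source_kb(lines, target_sabs=("MDR", "MDRGER"), langs=("ENG", "GER")):
    """Build a concept dictionary keyed by source code instead of UMLS CUI."""
    # For the full MRCONSO.RRF one would stream the file twice instead of
    # holding all rows in memory; a list keeps this sketch short.
    rows = [line.rstrip("\n").split("|") for line in lines]
    rows = [r for r in rows if len(r) > STR]

    concepts = {}
    codes_by_cui = defaultdict(set)

    # Pass 1: one entry per source code in the target vocabularies
    for r in rows:
        if r[SAB] in target_sabs:
            entry = concepts.setdefault(
                r[CODE],
                {"concept_id": r[CODE], "canonical_name": r[STR], "aliases": [], "types": []},
            )
            if r[STR] != entry["canonical_name"] and r[STR] not in entry["aliases"]:
                entry["aliases"].append(r[STR])
            codes_by_cui[r[CUI]].add(r[CODE])

    # Pass 2 (optional): add aliases from other vocabularies that share the CUI
    for r in rows:
        if r[SAB] not in target_sabs and r[LAT] in langs:
            for code in codes_by_cui.get(r[CUI], ()):
                entry = concepts[code]
                if r[STR] != entry["canonical_name"] and r[STR] not in entry["aliases"]:
                    entry["aliases"].append(r[STR])
    return concepts


def to_jsonl(concepts):
    """Serialize the concept dictionary as one JSON object per line."""
    return "\n".join(json.dumps(c, ensure_ascii=False) for c in concepts.values())


# Toy rows with placeholder CUIs (hypothetical, not real UMLS content)
TOY_ROWS = [
    "C9999991|ENG|P|L0000001|PF|S0000001|Y|A0000001||10062194||MDR|LLT|10062194|Metastasis|3|N||",
    "C9999991|GER|P|L0000002|PF|S0000002|Y|A0000002||10062194||MDRGER|LLT|10062194|Metastase|3|N||",
    "C9999991|ENG|P|L0000003|PF|S0000003|Y|A0000003||||MSH|MH|D000000|Neoplasm Metastasis|0|N||",
    "C9999992|ENG|P|L0000004|PF|S0000004|Y|A0000004||||MSH|MH|D000001|Unrelated concept|0|N||",
]

kb_entries = build_source_kb(TOY_ROWS)
print(to_jsonl(kb_entries))
```

Keying on the source code while collecting aliases by CUI is what lets every MedDRA concept keep its own ID and still benefit from synonyms found elsewhere in the UMLS.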

phlobo commented 11 months ago

Since I figured it might indeed be a rather common use case, I added a very simple implementation for this (in branch meddra):

https://github.com/hpi-dhc/xmen/blob/meddra/examples/dicts/umls_source.py https://github.com/hpi-dhc/xmen/blob/meddra/examples/conf/meddra_german.yaml

Then you can run:

xmen dict examples/conf/meddra_german.yaml --code examples/dicts/umls_source.py

and:

from xmen import load_kb

kb = load_kb('/path/to/my/meddra_german.jsonl')
kb.cui_to_entity['10062194']

CUI: 10062194, Name: Metastasis
Definition: None
TUI(s): 
Aliases (abbreviated, total: 15): 
     Metastasis, Metastases NOS, Metastase, Metastase, Metastasen NNB, Secondary carcinoma, Sekundaeres Karzinom, Secondary malignant neoplasm of other specified sites, Sekundaere boesartige Neubildung sonstiger spezifischer Stellen, Secondary carcinoma (known primary)
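A quick sanity check on the generated file can confirm that the keys are now MedDRA codes rather than CUIs. The sketch below uses only the standard library and assumes that each line of the .jsonl file is a JSON object with a concept_id field; that field name, the classify_ids helper, and the toy lines are assumptions for illustration, not confirmed details of the xmen file format.

```python
import json
import re

CUI_PATTERN = re.compile(r"^C\d{7}$")    # UMLS CUIs: "C" followed by seven digits
MEDDRA_PATTERN = re.compile(r"^\d{8}$")  # MedDRA codes: eight digits


def classify_ids(jsonl_lines, id_field="concept_id"):
    """Count how many concept IDs look like UMLS CUIs, MedDRA codes, or neither."""
    counts = {"cui": 0, "meddra": 0, "other": 0}
    for line in jsonl_lines:
        if not line.strip():
            continue  # ignore blank lines
        concept_id = str(json.loads(line)[id_field])
        if CUI_PATTERN.match(concept_id):
            counts["cui"] += 1
        elif MEDDRA_PATTERN.match(concept_id):
            counts["meddra"] += 1
        else:
            counts["other"] += 1
    return counts


# Toy lines (hypothetical); with a real file, pass an open file handle instead
toy_lines = [
    '{"concept_id": "10062194", "canonical_name": "Metastasis"}',
    '{"concept_id": "C9999991", "canonical_name": "Placeholder concept"}',
]
id_counts = classify_ids(toy_lines)
print(id_counts)
```

If the dictionary was built with the source-vocabulary parser, one would expect the "cui" count to be zero.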

Please let me know if this works for you.

kunalr97 commented 11 months ago

Incredible! Thanks a ton, I tried it and it works. I'm slowly getting to know the features of xmen one by one.

phlobo commented 11 months ago

Thank you for the feedback, the scripts are on the main branch now.
