Closed kunalr97 closed 11 months ago
For your use case, you could adapt the GGPONC sample config (remove all sabs apart from MDR and MDRGER):
https://github.com/hpi-dhc/xmen/blob/main/examples/conf/ggponc.yaml
You might also start by removing semantic_groups altogether in your case; I don't think they will have a big impact.
If you don't have gold-standard annotations for MedDRA, some other steps highlighted in the GGPONC example notebook could be helpful:
Thanks for your help and answer. I tried it and it works, but it's not quite what I was looking for. I think the image below will make it clearer. I want it to predict the MedDRA code (in blue), and right now it's predicting the CUI (red arrow) for UMLS concepts. I don't know if that's even possible using xmen, but I'm asking anyway.
I see!
There is currently no built-in functionality to map from UMLS CUIs to identifiers in a UMLS source vocabulary like MedDRA.
I can suggest two ways to achieve that:
1. Use the UMLS (MRCONSO.RRF) to map from CUIs to IDs in the source vocabulary as a post-processing step after the pipeline has run, or (what I would prefer):
2. Customize the xmen dict command through a parsing script (--code my_parser.py) that constructs the KB / jsonl file with MedDRA concept IDs. This way, you have a lot of control over the target KB. For instance, you could have all concept IDs from MedDRA, but still use aliases from other vocabularies in the UMLS that map to the same CUI. We have done something similar for the DisTEMIST benchmark, where we link against SNOMEDCT_US, but still want to incorporate all available UMLS aliases. See: https://github.com/hpi-dhc/xmen/blob/main/examples/dicts/distemist.py
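The post-processing route (the first option) could look roughly like the sketch below. It is not part of xmen: the function name cui_to_source_codes is made up for illustration, and the sample rows are synthetic (placeholder CUI, AUI and non-MedDRA code), only shaped like MRCONSO.RRF lines. The column positions follow the documented MRCONSO.RRF layout (CUI first, SAB and CODE as the 12th and 14th fields).

```python
from collections import defaultdict

# Zero-based column positions in the pipe-delimited MRCONSO.RRF file
CUI, SAB, CODE, STR = 0, 11, 13, 14


def cui_to_source_codes(lines, sabs=("MDR", "MDRGER")):
    """Map each UMLS CUI to the sorted source codes found in the given vocabularies."""
    mapping = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= STR:
            continue  # skip malformed or truncated rows
        if fields[SAB] in sabs:
            mapping[fields[CUI]].add(fields[CODE])
    return {cui: sorted(codes) for cui, codes in mapping.items()}


# Synthetic rows for illustration only (not copied from a UMLS release)
SAMPLE_ROWS = [
    "C0000000|ENG|P|L1|PF|S1|Y|A1||||MDR|PT|10062194|Metastasis|3|N||",
    "C0000000|GER|P|L2|PF|S2|Y|A2||||MDRGER|PT|10062194|Metastase|3|N||",
    "C0000000|ENG|P|L3|PF|S3|Y|A3||||MSH|MH|D000000|Neoplasm Metastasis|0|N||",
]

mapping = cui_to_source_codes(SAMPLE_ROWS)

# Translate CUIs predicted by the linker into MedDRA codes (empty list if unmapped)
predicted_cuis = ["C0000000", "C9999999"]
meddra_codes = [mapping.get(cui, []) for cui in predicted_cuis]
```

With a real UMLS release you would pass an open file handle (for example open('MRCONSO.RRF', encoding='utf-8')) instead of SAMPLE_ROWS. Note that one CUI may map to several source codes, so the result is a list per CUI rather than a single ID.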
You would basically need to read MRCONSO.RRF, filter by MDR and MDRGER, and create a concept dictionary where your keys are based on the source concept ID rather than the UMLS CUI (similar to read_snomed2cui_mapping in the DisTEMIST script). As in the DisTEMIST example, you may then (optionally) further extend this dictionary with additional aliases.
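To make these steps concrete, here is a minimal sketch of such a dictionary builder. It is an illustration under assumptions, not the script from the repository: build_source_dictionary is a hypothetical name, the entry fields (concept_id, canonical_name, types, aliases) are assumed to match what the KB loader expects, and the sample rows are synthetic.

```python
import json

# Zero-based column positions in the pipe-delimited MRCONSO.RRF file
CUI, SAB, CODE, STR = 0, 11, 13, 14


def build_source_dictionary(lines, target_sabs=("MDR", "MDRGER"), extend_aliases=True):
    """Build a concept dictionary keyed by source code instead of UMLS CUI."""
    rows = [line.rstrip("\n").split("|") for line in lines]
    rows = [r for r in rows if len(r) > STR]  # drop malformed rows

    concepts = {}
    codes_by_cui = {}

    def add_alias(entry, name):
        if name != entry["canonical_name"] and name not in entry["aliases"]:
            entry["aliases"].append(name)

    # Pass 1: one entry per source code; the first name seen becomes canonical
    for r in rows:
        if r[SAB] in target_sabs:
            entry = concepts.setdefault(
                r[CODE],
                {"concept_id": r[CODE], "canonical_name": r[STR], "types": [], "aliases": []},
            )
            add_alias(entry, r[STR])
            codes_by_cui.setdefault(r[CUI], set()).add(r[CODE])

    # Pass 2 (optional): borrow aliases from other vocabularies sharing the CUI
    if extend_aliases:
        for r in rows:
            if r[SAB] not in target_sabs:
                for code in codes_by_cui.get(r[CUI], ()):
                    add_alias(concepts[code], r[STR])
    return concepts


# Synthetic rows for illustration only (not copied from a UMLS release)
SAMPLE_ROWS = [
    "C0000000|ENG|P|L1|PF|S1|Y|A1||||MDR|PT|10062194|Metastasis|3|N||",
    "C0000000|GER|P|L2|PF|S2|Y|A2||||MDRGER|PT|10062194|Metastase|3|N||",
    "C0000000|ENG|P|L3|PF|S3|Y|A3||||MSH|MH|D000000|Neoplasm Metastasis|0|N||",
]

concepts = build_source_dictionary(SAMPLE_ROWS)
# One JSON object per line, i.e. the jsonl layout
jsonl = "\n".join(json.dumps(c, ensure_ascii=False) for c in concepts.values())
```

Picking the first name encountered as the canonical name is a simplification; with real data you would probably prefer a specific term type or language.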
Since I figured it might indeed be a rather common use case, I added a very simple implementation for this (in branch meddra):
https://github.com/hpi-dhc/xmen/blob/meddra/examples/dicts/umls_source.py
https://github.com/hpi-dhc/xmen/blob/meddra/examples/conf/meddra_german.yaml
Then you can run xmen dict examples/conf/meddra_german.yaml --code examples/dicts/umls_source.py and:
from xmen import load_kb
# Load the generated dictionary; entries are now keyed by MedDRA code
kb = load_kb('/path/to/my/meddra_german.jsonl')
kb.cui_to_entity['10062194']
CUI: 10062194, Name: Metastasis
Definition: None
TUI(s):
Aliases (abbreviated, total: 15):
Metastasis, Metastases NOS, Metastase, Metastase, Metastasen NNB, Secondary carcinoma, Sekundaeres Karzinom, Secondary malignant neoplasm of other specified sites, Sekundaere boesartige Neubildung sonstiger spezifischer Stellen, Secondary carcinoma (known primary)
Please let me know if this works for you.
Incredible! Thanks a ton, I tried it and it works. I'm slowly getting to know the features of xmen one by one.
Thank you for the feedback, the scripts are on the main branch now.
I am already using your tool to link German clinical text to ICD10, OPS, ATC, etc., and it has been of great help so far. I would like to explore more ontologies, and one of them is MedDRA. I am aware that it is part of the UMLS Metathesaurus: https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/MDR/sourcerepresentation.html. I just couldn't figure out how to create a config file for it from scratch. I am a bit confused about what to fill in for the semantic_types and sabs sections of the config file. I hope I was clear enough. In the bigger picture, I want to feed it clinical text in German and link the medical entities to the MedDRA ontology.
Thanks in advance for your answer.