hpi-dhc / xmen

✖️MEN - A Modular Toolkit for Cross-Lingual Medical Entity Normalization
Apache License 2.0

LLM-based entity simplification for BRONCO Data #36

Open kunalr97 opened 5 days ago

kunalr97 commented 5 days ago

Hi,

I wanted to extend my candidate generation approach on the BRONCO dataset (as shown in the example https://github.com/hpi-dhc/xmen/blob/main/examples/01_BRONCO_German.ipynb) with the LLM-based entity simplification described in your extended paper.

I came across the repo here, but it only shows an example for the SympTEMIST task: https://github.com/hpi-dhc/symptemist/blob/main/1_LLM_Simplification.ipynb

It would be interesting to see how this approach works on the BRONCO dataset and how it compares to the previous baseline.

Thanks in advance.

Best, Kunal

phlobo commented 4 days ago

Thank you for your question!

Basically https://github.com/hpi-dhc/symptemist/blob/main/1_LLM_Simplification.ipynb inserts an additional step after candidate generation.
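A minimal, framework-agnostic sketch of such an extra step, assuming it works by re-running candidate generation on an LLM rephrasing of each mention and merging the result with the original candidates. `generate` and `simplify` are hypothetical callables standing in for the xmen candidate generator and the LLM call (they are not xmen APIs), and keeping each concept's maximum score is an assumed merge strategy, not necessarily what the notebook does.

```python
from typing import Callable, Dict, List, Tuple

# A ranked candidate list of (concept_id, score) pairs; higher score = better.
CandidateList = List[Tuple[str, float]]


def merge_candidates(*lists: CandidateList, k: int = 10) -> CandidateList:
    """Union candidate lists, keeping each concept's best score (assumed merge strategy)."""
    best: Dict[str, float] = {}
    for candidates in lists:
        for concept_id, score in candidates:
            if score > best.get(concept_id, float("-inf")):
                best[concept_id] = score
    # Sort by descending score; break ties by concept ID so the output is deterministic.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def candidates_with_simplification(
    mention: str,
    generate: Callable[[str], CandidateList],  # hypothetical candidate generator
    simplify: Callable[[str], str],  # hypothetical LLM rephrasing call
    k: int = 10,
) -> CandidateList:
    """Extra step after candidate generation: retry retrieval with a simplified mention."""
    original = generate(mention)
    simplified = simplify(mention).strip()
    if not simplified or simplified == mention:
        # Nothing new to search for; keep the original candidates.
        return merge_candidates(original, k=k)
    return merge_candidates(original, generate(simplified), k=k)


if __name__ == "__main__":
    # Toy lookup table with hypothetical concept IDs, for illustration only.
    toy_index = {
        "zustand nach hemikolektomie rechts": [("X1", 0.4)],
        "hemikolektomie rechts": [("X2", 0.9), ("X1", 0.6)],
    }

    def toy_generate(text: str) -> CandidateList:
        return toy_index.get(text.lower(), [])

    def toy_simplify(text: str) -> str:
        return "Hemikolektomie rechts"

    print(candidates_with_simplification(
        "Zustand nach Hemikolektomie rechts", toy_generate, toy_simplify
    ))
    # [('X2', 0.9), ('X1', 0.6)]
```

Merging rather than replacing keeps the original candidates available, so a poor rephrasing can only add noise to the list, not remove a correct concept that was already retrieved.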

So you should be able to easily adapt the SympTEMIST notebook by:

I would like to point out that SympTEMIST benefited a lot from this approach, as it has many very long mention spans that are hard to link. Mentions in BRONCO are much shorter on average, so you might have to think about ways in which rephrasing could improve candidate generation performance (and adapt the few-shot examples accordingly). I assume there is quite a lot of potential for treatments in BRONCO (rephrasing mentions to make them more similar to terms in OPS), but maybe less so for diagnoses and medications, where candidate generation recall is already quite high.
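To focus rephrasing where it is most likely to help, one option is to simplify only multi-token treatment mentions and to build the few-shot prompt from BRONCO-specific pairs. A minimal sketch, assuming the BRONCO entity labels `DIAG`, `TREAT` and `MED`; the `Mention` class, the token-count threshold, the prompt wording, and the example pairs in `FEW_SHOT_TREAT` are all hypothetical and would need to be replaced with curated examples.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Mention:
    text: str
    label: str  # assumed BRONCO labels: "DIAG", "TREAT" or "MED"


# Hypothetical few-shot pairs (raw mention -> OPS-like term); replace with curated ones.
FEW_SHOT_TREAT: List[Tuple[str, str]] = [
    ("Z.n. Hemikolektomie re.", "Hemikolektomie rechts"),
    ("adjuvante Chemo", "Chemotherapie"),
]


def should_simplify(
    mention: Mention, labels: Tuple[str, ...] = ("TREAT",), min_tokens: int = 2
) -> bool:
    """Assumed heuristic: only rephrase multi-token mentions of the selected types."""
    return mention.label in labels and len(mention.text.split()) >= min_tokens


def build_prompt(mention: str, examples: Sequence[Tuple[str, str]]) -> str:
    """Assemble a plain-text few-shot prompt ending where the LLM should continue."""
    blocks = ["Rewrite the clinical mention as a standard OPS procedure term."]
    for source, target in examples:
        blocks.append(f"Mention: {source}\nTerm: {target}")
    blocks.append(f"Mention: {mention}\nTerm:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    mentions = [
        Mention("Radiatio der Mamma li.", "TREAT"),
        Mention("Tamoxifen", "MED"),
    ]
    for m in mentions:
        # Only the multi-token treatment mention gets a prompt; the medication is skipped.
        if should_simplify(m):
            print(build_prompt(m.text, FEW_SHOT_TREAT))
```

With this filter, diagnosis and medication mentions never reach the LLM, which matches the expectation that recall for those types is already high.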

kunalr97 commented 4 days ago

Thank you so much for your reply and insights! I will try it out ASAP.