Closed kunalr97 closed 1 week ago
Thank you for your question!
Basically https://github.com/hpi-dhc/symptemist/blob/main/1_LLM_Simplification.ipynb inserts an additional step after candidate generation.
So you should be able to easily adapt the SympTEMIST notebook by:
- Changing `table_file` to something like bronco[...].pkl
- Adapting the `fixed_few_shot_examples` argument of the GPTSimplifier. You can probably start with an empty list to have no examples at all. This might actually be the biggest lever you have for improving performance; we didn't really optimize this for SympTEMIST and it worked very well out of the box.
- Skipping "Determine Optimal Cutoff" for now and just working with the default cutoff (0.85), which works well in most cases

I would like to point out that SympTEMIST benefited a lot from this approach, as it has many very long mention spans that are hard to link. Mentions in BRONCO are much shorter on average, so you might have to think about ways in which rephrasing would benefit candidate generation performance (and adapt the few-shot examples accordingly). I assume there is quite a lot of potential for treatments in BRONCO (rephrasing mentions to make them more similar to terms in OPS), but maybe less so for diagnoses and medications, where candidate generation recall is already quite high.
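To check where rephrasing could pay off, it may help to look at recall@k and mention length per entity type before touching the few-shot examples. The following is only a rough sketch in plain Python: the dict layout, the function name `mention_stats`, and the toy codes are assumptions made for illustration and are not the xMEN data schema or API, so you would have to map your candidate generation output to this shape yourself.

```python
from collections import defaultdict
from statistics import mean


def mention_stats(mentions, k):
    """Per-entity-type recall@k and mean mention length in tokens.

    Each mention is a plain dict (hypothetical layout, NOT the xMEN schema):
      {"text": str, "type": str, "gold": str, "candidates": [str, ...]}
    where "candidates" is the ranked candidate list, best first.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    lengths = defaultdict(list)
    for m in mentions:
        t = m["type"]
        totals[t] += 1
        # A hit means the gold concept appears among the top-k candidates.
        hits[t] += m["gold"] in m["candidates"][:k]
        # Whitespace tokens are a crude but sufficient proxy for span length.
        lengths[t].append(len(m["text"].split()))
    return {
        t: {
            "recall": hits[t] / totals[t],
            "mean_tokens": mean(lengths[t]),
            "n": totals[t],
        }
        for t in totals
    }


# Toy input with placeholder codes, only to show the expected shape.
toy = [
    {"text": "Resektion des Tumors", "type": "TREATMENT",
     "gold": "T1", "candidates": ["T9", "T3"]},
    {"text": "Chemotherapie", "type": "TREATMENT",
     "gold": "T2", "candidates": ["T2", "T5"]},
    {"text": "Karzinom", "type": "DIAGNOSIS",
     "gold": "D1", "candidates": ["D1"]},
]

stats = mention_stats(toy, k=2)
for entity_type, values in stats.items():
    print(entity_type, values)
```

Entity types with low recall are the ones where simplification has the most room to help, which is why I would expect treatments to be the most promising place to start.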
Thank you so much for your reply and insights! I will try it out ASAP.
Hi,
I wanted to extend my candidate generation approach on the BRONCO dataset (as shown in the example https://github.com/hpi-dhc/xmen/blob/main/examples/01_BRONCO_German.ipynb) with the LLM-based entity simplification shown in your extended paper.
I came across the repo here, but it only shows an example for the SympTEMIST task: https://github.com/hpi-dhc/symptemist/blob/main/1_LLM_Simplification.ipynb
It would be interesting to see how this approach works for the BRONCO dataset and how it compares to the previous baseline.
Thanks in advance.
Best, Kunal