BRONCO notebook - GitHub Issues

All dict parsers have been reviewed:

ICD10GM: the parser now takes all concepts from the alphabetic file (which contains aliases, etc.) and all codes from the Systematik file that are not in the alphabetic file. Joining the two covers all codes present in BRONCO. I also handled the "s. a." references and the abbreviations in square brackets, as described in the supplementary material of the paper.
OPS2017: same as above, using the alphabetic and systematic files. These aliases also contain "- s." and "-s.a." patterns, although they seem to work a bit differently from ICD10GM. Since this is not mentioned in the supplementary material of BRONCO, I did not handle them. There are no abbreviations in brackets.
ATC2017de: I corrected some null codes that prevented the SapBERTLinker from running.
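
For illustration, the merge and the null-code cleanup described above could look roughly like the sketch below. This is not the actual parser code: the function names, the input shapes (a dict of code to alias list, a dict of code to preferred name, a list of code/name pairs) and the toy codes are all assumptions made up for this example.

```python
from typing import Dict, List, Optional, Tuple


def merge_concepts(
    alphabetic: Dict[str, List[str]],
    systematic: Dict[str, str],
) -> Dict[str, List[str]]:
    """Keep every alphabetic entry and add systematic codes it lacks.

    Hypothetical helper: the real parsers may structure this differently.
    """
    # Copy the alias lists so the input is not mutated.
    merged = {code: list(names) for code, names in alphabetic.items()}
    for code, name in systematic.items():
        if code not in merged:
            merged[code] = [name]
    return merged


def drop_null_codes(
    rows: List[Tuple[Optional[str], str]],
) -> List[Tuple[str, str]]:
    """Discard rows whose code is None or blank (hypothetical helper)."""
    return [
        (code, name)
        for code, name in rows
        if code is not None and str(code).strip()
    ]


# Toy data, not real ICD10GM / OPS / ATC entries.
alphabetic = {"X00.0": ["alias one", "alias two"]}
systematic = {"X00.0": "preferred name", "X01.0": "only in systematic"}
merged = merge_concepts(alphabetic, systematic)

rows = [("Y01", "drug a"), (None, "drug b"), ("  ", "drug c")]
cleaned = drop_null_codes(rows)
```

In this sketch the alphabetic aliases win whenever a code appears in both files, so the systematic file only fills in codes that would otherwise be missing.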

The notebook has a flag at the beginning where you can choose which entity type to run for. The results seem to make sense. With evaluate_at_k(ds['k5'], sapbert_linker.predict_batch(ds, batch_size=128)['k5']) we get:

TREATMENT: Perf@1 0.16420118343195267
DIAGNOSIS: Perf@1 0.6658624849215923
MEDICATION: Perf@1 0.4375

hpi-dhc / xmen

BRONCO notebook #16