ICD10GM: the parser now takes all concepts from the alphabetic file (which contains aliases, etc.) and all codes from the systematic (Systematik) file that are not in the alphabetic file. Joining these two together covers all codes present in BRONCO. I also handled the "s. a." references and the abbreviations in square brackets as described in the supplementary material of the paper.
OPS2017: same as above, using the alphabetic and systematic files. These aliases also contain "- s." and "-s.a." patterns, although they seem to be used a bit differently than in ICD10GM. Since this is not mentioned in the supplementary material of BRONCO, I did not handle them. There are no abbreviations in brackets.
ATC2017de: I corrected some null codes that prevented the SapBERTLinker from running.
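The merge used for ICD10GM and OPS2017 can be sketched roughly as follows. This is a minimal illustration, not the actual parser code: the function name `merge_alphabetic_and_systematic`, the `(code, term)` tuple layout, and the sample rows are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical entry layout: (code, term). The real parsers may use a different structure.
Entry = Tuple[str, str]


def merge_alphabetic_and_systematic(
    alphabetic: Iterable[Entry], systematic: Iterable[Entry]
) -> List[Entry]:
    """Keep every alphabetic entry (aliases included) and add only those
    systematic entries whose code does not appear in the alphabetic file."""
    merged = list(alphabetic)
    seen_codes = {code for code, _ in merged}
    for code, term in systematic:
        if code not in seen_codes:
            merged.append((code, term))
            seen_codes.add(code)  # add each missing systematic code only once
    return merged


# Toy rows for illustration only, not taken from the real files.
alphabetic_rows = [("A01", "alias one"), ("A01", "alias two"), ("B02", "alias three")]
systematic_rows = [("A01", "official name A01"), ("C03", "official name C03")]

merged_rows = merge_alphabetic_and_systematic(alphabetic_rows, systematic_rows)
```

In this sketch the alphabetic aliases take precedence, and the systematic file only fills in codes that would otherwise be missing.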
All dict parsers listed above have been reviewed.
The notebook has a flag at the beginning where you can choose which entity to run it for. The results seem to make sense.
evaluate_at_k(ds['k5'], sapbert_linker.predict_batch(ds, batch_size=128)['k5'])
we get: