tigerchen52 / Biomedical-Entity-Linking

A Keras implementation of the AAAI 2021 paper "A Lightweight Neural Model for Biomedical Entity Linking"
Apache License 2.0
49 stars 11 forks

about candidate generation #7

Open acadTags opened 2 years ago

acadTags commented 2 years ago

Hi Lihu, this is very good work.

I have some questions when I try to adapt it to another dataset.

The candidate generation does not seem very straightforward, in particular how to produce the files test_candidates.txt and training_aligned_cos_with_mention_candidate.txt. I have looked at generate_candidate.py, but it is not easy for me to apply to my own data.

While we can try to produce these files ourselves based on the descriptions in the paper, it would be more helpful if additional scripts for running candidate generation were available, in case you have them. It would also be great to know more about how to generate candidates, in case I missed something. Thanks.

Best regards, A

tigerchen52 commented 2 years ago

Hi,

To make this easier to follow, I have added a simplified version of the Python file for candidate generation: source/candidate_sample.py. You can apply this script to your own dataset to get the *_candidate.txt files.

The core function is find_topk_candidates(mention, entity_set, emb_matrix, topk), where entity_set is the reference KB containing the surface forms of entities, and emb_matrix holds the pre-trained word embeddings.
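As a rough illustration only (this is not the contents of source/candidate_sample.py), a function with that signature could rank entity names by cosine similarity between averaged word vectors. The sketch below assumes emb_matrix is a dict mapping a lowercase token to a list of floats and that names are tokenized on whitespace; the helpers embed and cosine and the toy vectors are hypothetical.

```python
import math


def embed(text, emb_matrix):
    """Average the vectors of in-vocabulary tokens; return None if no token is known."""
    vectors = [emb_matrix[tok] for tok in text.lower().split() if tok in emb_matrix]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_topk_candidates(mention, entity_set, emb_matrix, topk):
    """Return the topk entity names most similar to the mention.

    Assumption: similarity is cosine over averaged word vectors. The real
    script may score candidates differently.
    """
    mention_vec = embed(mention, emb_matrix)
    if mention_vec is None:
        return []
    scored = []
    for name in entity_set:
        name_vec = embed(name, emb_matrix)
        if name_vec is not None:
            scored.append((cosine(mention_vec, name_vec), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:topk]]


# Toy 3-dimensional embeddings, made up for this example.
emb_matrix = {
    "heart": [1.0, 0.0, 0.0],
    "attack": [0.0, 1.0, 0.0],
    "cardiac": [0.8, 0.2, 0.1],
    "arrest": [0.2, 0.7, 0.1],
    "lung": [0.0, 0.0, 1.0],
    "cancer": [0.0, 0.2, 0.9],
}
entity_set = ["cardiac arrest", "lung cancer", "heart attack"]
candidates = find_topk_candidates("heart attack", entity_set, emb_matrix, topk=2)
print(candidates)
```

With a real KB, each mention's returned list would be written out as one line of the *_candidate.txt file; the exact line format should be taken from the repository's existing files.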

Note that if a mention has an exact match in the KB, the other candidates can be filtered out, although this step is not shown in the script.
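A minimal sketch of such a filter, applied to an already generated candidate list, might look like the following. The function name filter_exact_match is hypothetical, and treating "exact match" as case-insensitive equality after stripping whitespace is an assumption, not something stated in the paper or the script.

```python
def filter_exact_match(mention, candidates):
    """Keep only exact matches of the mention if any exist, else keep all candidates.

    Assumption: matching is case-insensitive and ignores surrounding whitespace.
    """
    normalized = mention.strip().lower()
    exact = [cand for cand in candidates if cand.strip().lower() == normalized]
    return exact if exact else list(candidates)


# An exact match collapses the list to that single candidate.
print(filter_exact_match("Heart Attack", ["cardiac arrest", "heart attack"]))

# Without an exact match, the candidate list is returned unchanged.
print(filter_exact_match("myocardial infarction", ["cardiac arrest", "heart attack"]))
```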

Hope it helps, Lihu

acadTags commented 2 years ago

Thanks! Best regards, A