rahuln / lm-bio-kgc

Using pretrained language models for biomedical knowledge graph completion.

Scripts for inductive KG completion experiments? #5

Open moonliii opened 4 months ago

moonliii commented 4 months ago

Hi, I'm trying to reproduce the results in your paper, but I couldn't find the scripts for the inductive KG completion experiments tagged "NN-ComplEx, frozen LM" or "NN-ComplEx, fine-tuned". Could you point me to them?

rahuln commented 4 months ago

You're right, it looks like there isn't a single script you can run to reproduce those exact results - sorry about that! To reproduce them:

1. First, train a KGE model (in this case, ComplEx) on the desired dataset.
2. Then run `script/compute_entity_encodings.py` to compute and save embeddings of the text for all entities from your desired dataset using a specified pretrained LM (either the base PubMedBERT or the fine-tuned KG-PubMedBERT, depending on which numbers you're trying to reproduce).
3. Finally, run `script/compute_kge_scores.py` to do the evaluation, pointing it to the directory of your saved ComplEx model parameters, using the `--text_emb_file` argument to point to the entity embeddings file generated in the previous step, and setting `--mode=max` to specify that you want to use the maximally-similar embedding ("nearest neighbor") to impute each missing ComplEx entity embedding.

Hope this helps!
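Roughly, the command sequence would look like the sketch below. Only the two script names, `--text_emb_file`, and `--mode=max` come from the steps above; every other flag, path, and filename is an illustrative placeholder, so check each script's argument parser (`--help`) for the actual options:

```bash
# Step 1: train a ComplEx KGE model on your dataset first (training setup not
# shown here; just note the directory where its parameters get saved).

# Step 2: embed the text of all entities with the pretrained LM (base
# PubMedBERT or fine-tuned KG-PubMedBERT). Flags below are placeholders.
python script/compute_entity_encodings.py \
    --model <pubmedbert-or-kg-pubmedbert> \
    --dataset <your-dataset> \
    --output_file entity_embeddings.pt

# Step 3: run evaluation, imputing each missing ComplEx entity embedding with
# the embedding of its nearest neighbor by text similarity (--mode=max).
python script/compute_kge_scores.py \
    <path-to-saved-complex-model-dir> \
    --text_emb_file entity_embeddings.pt \
    --mode=max
```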
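And to make it concrete what `--mode=max` means: each entity missing from the trained ComplEx model gets the ComplEx embedding of its most text-similar training entity. A minimal illustrative sketch (not the repo's actual code, and assuming cosine similarity over the LM embeddings):

```python
import numpy as np

def impute_missing_embedding(missing_text_emb, known_text_embs, known_kge_embs):
    """Nearest-neighbor imputation of a KGE embedding for an unseen entity.

    missing_text_emb: (d_text,)   LM embedding of the unseen entity's text
    known_text_embs:  (n, d_text) LM embeddings of entities seen in KGE training
    known_kge_embs:   (n, d_kge)  trained ComplEx embeddings for those entities
    """
    # Cosine similarity between the unseen entity's text embedding and the
    # text embedding of every entity the ComplEx model was trained on.
    sims = known_text_embs @ missing_text_emb
    sims /= np.linalg.norm(known_text_embs, axis=1) * np.linalg.norm(missing_text_emb)
    # "max" mode: borrow the ComplEx embedding of the single most similar entity.
    return known_kge_embs[np.argmax(sims)]
```

The borrowed embedding is then scored with the standard ComplEx scoring function, exactly as if the unseen entity had been part of KGE training.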