Cannot reproduce WN18RR link prediction results

timhartill commented 4 years ago

Hi, thanks for publishing this great contribution!

I ran the link prediction program on WN18RR with exactly the recommended parameters, except for an eval batch size of 2500 instead of 5000 so that it would fit on my 1080 Ti (it took 5 days but finished OK).

python3 run_bert_link_prediction.py --task_name kg \
  --do_train --do_eval --do_predict \
  --data_dir ./data/WN18RR --bert_model bert-base-cased \
  --max_seq_length 50 --train_batch_size 32 --learning_rate 5e-5 \
  --num_train_epochs 5.0 --output_dir ./output_WN18RR/ \
  --gradient_accumulation_steps 1 --eval_batch_size 2500
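Changing the eval batch size from 5000 to 2500 should, by itself, only change how many candidate triples are scored per forward pass, not the resulting ranks. A minimal sketch of that idea, assuming each candidate's score depends only on that candidate and not on the rest of the batch (the usual case for a classifier in eval mode with dropout off, up to small GPU floating-point differences); `toy_score_fn`, `score_in_chunks` and `rank_of_gold` are hypothetical names, not functions from run_bert_link_prediction.py.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer type: maps a batch of candidates to one score per candidate.
ScoreFn = Callable[[Sequence[int]], List[float]]


def score_in_chunks(score_fn: ScoreFn, candidates: Sequence[int], batch_size: int) -> List[float]:
    """Score every candidate, batch_size at a time (the batch size only bounds memory)."""
    scores: List[float] = []
    for start in range(0, len(candidates), batch_size):
        scores.extend(score_fn(candidates[start:start + batch_size]))
    return scores


def rank_of_gold(scores: Sequence[float], gold_index: int) -> int:
    """1-based rank: 1 + number of candidates scored strictly higher than the gold one."""
    gold_score = scores[gold_index]
    return 1 + sum(1 for s in scores if s > gold_score)


def toy_score_fn(batch: Sequence[int]) -> List[float]:
    # Hypothetical stand-in for the model: each score depends only on its own candidate.
    return [math.sin(c) for c in batch]


candidates = list(range(1000))
gold_index = 417
ranks = {
    bs: rank_of_gold(score_in_chunks(toy_score_fn, candidates, bs), gold_index)
    for bs in (250, 500, 1000)
}
print(ranks)  # the same rank for every batch size
```

If the real scorer satisfies the independence assumption, a smaller eval batch size is an unlikely explanation for a large metric gap.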

My results: MR 127.38, Hits@10 35.97

Results in your paper: MR 97, Hits@10 52.4
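Both metrics are computed from the rank of the correct entity for each test triple. A minimal sketch, assuming 1-based ranks have already been computed (whether in the raw or filtered setting is not stated in this thread); the function names and the sample ranks are made up for illustration. Note that Hits@10 is quoted here as a percentage, while the training logs print it as a fraction.

```python
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """MR: average 1-based rank of the correct entity (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Hits@k: fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Illustrative ranks only, not real WN18RR output.
example_ranks = [1, 3, 12, 250, 7, 40, 2, 9, 1500, 5]
print(f"MR {mean_rank(example_ranks):.2f}  Hits@10 {100 * hits_at_k(example_ranks):.2f}")
```

Because MR is an arithmetic mean, a few test triples ranked in the thousands can move it a lot, so MR can swing between runs much more than Hits@10 does.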

Do you have any thoughts on what might explain the difference? Did you observe similar variation when running this program with different random seeds?

Thanks for any ideas! Tim

yao8839836 commented 4 years ago

@timhartill

Hi, thanks for your interest in our work!

I also observed such variation. I ran it 3 times on a V100 GPU with different random seeds (generated with args.seed = random.randint(1, 200)) and obtained the following 3 results; I reported the best scores. The scores are sensitive to the random seed, and, as mentioned in https://pytorch.org/docs/stable/notes/randomness.html, different machines can produce different results even with the same seed.

1. 08/03/2019 23:26:58 - INFO - main - Hits @10: 0.4282067645181876
   08/03/2019 23:26:58 - INFO - main - Mean rank: 160.35433950223356

2. 08/08/2019 21:00:23 - INFO - main - Hits @10: 0.4728781110402042
   08/08/2019 21:00:23 - INFO - main - Mean rank: 97.40874282067645

3. 08/15/2019 12:37:33 - INFO - main - Hits @10: 0.5236119974473517
   08/15/2019 12:37:33 - INFO - main - Mean rank: 96.88704530950862
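Since the reported numbers are the best of three seeds, it can help to look at the spread as well. A small sketch using only the values from the three logs above; reporting mean and sample standard deviation next to the best run is a common convention, not something the thread or the paper prescribes, and `summarize` is a hypothetical helper.

```python
import statistics

# Values copied from the three logged runs above (Hits@10 as a fraction, MR as logged).
hits10 = [0.4282067645181876, 0.4728781110402042, 0.5236119974473517]
mean_rank = [160.35433950223356, 97.40874282067645, 96.88704530950862]


def summarize(name: str, values: list, higher_is_better: bool) -> dict:
    """Best run plus mean and sample standard deviation across runs."""
    best = max(values) if higher_is_better else min(values)
    return {
        "metric": name,
        "best": best,
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation over runs
    }


for row in (summarize("Hits@10", hits10, True), summarize("MR", mean_rank, False)):
    print(f"{row['metric']}: best {row['best']:.4f}  mean {row['mean']:.4f} ± {row['stdev']:.4f}")
```

With these logs, the third run is the one that matches the paper's MR 97 / Hits@10 52.4, while the three-run mean is roughly MR 118 and Hits@10 47.5%.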

timhartill commented 4 years ago

Mystery solved then!

Thanks for your quick response.
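For anyone trying to repeat a particular run, the drawn seed has to be logged and applied to every random number generator involved. A minimal sketch, assuming a PyTorch-based script like this one; `set_seed` is a hypothetical helper, not a function from the kg-bert repository, and, per the PyTorch randomness notes linked above, it narrows but does not eliminate differences across GPUs and library versions.

```python
import random


def set_seed(seed: int, deterministic_cudnn: bool = True) -> None:
    """Seed Python, NumPy and PyTorch RNGs (NumPy/PyTorch only if installed)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    if deterministic_cudnn:
        # Trade some speed for repeatable cuDNN kernel selection.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# Mirror the thread's protocol (a random seed in 1..200), but print it so the run can be repeated.
seed = random.randint(1, 200)
print(f"seed = {seed}")
set_seed(seed)
```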

yao8839836/kg-bert, issue #9