Hi guys, good work, but I struggle a bit with reproducing your results. It's nothing serious, but it would be better to have a clone-and-use approach. So far I have encountered these little obstacles:
Please, can you provide a fix for the missing similarity_score_mtx.npy file? I could simply remove the commented line, but there is no mention of how to use get_ensembled_data.py.
best Martin
Thanks for reporting these bugs. Sorry for the late response.
What you said is correct; sorry for the bad running experience. I have fixed these issues in this version.
Now, similarity_score_mtx.npy can be obtained by uncommenting the get_similarity() function in get_ensembled_data.py.
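If you want a quick sanity check after regenerating the file, something like this minimal sketch should work (the path here is an assumption; adjust it to wherever get_ensembled_data.py writes the file):

import numpy as np

# Load the regenerated similarity matrix and inspect its shape and dtype.
# The path is an assumption, not taken from the repo.
sim = np.load("similarity_score_mtx.npy")
print(sim.shape, sim.dtype)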
Thanks again for your reporting.
Thanks for the fix! However, I'm still having issues getting the same results as those in your paper. Namely, when I try to reproduce WN18RR, I run the commands as stated in the readme and get
head Hits @1: 0.20357370772176134
head Hits @3: 0.4572431397574984
head Hits @10: 0.6726228462029356
head Mean rank: 57.47383535417996
head Mean reciprocal rank: 0.3644130875627938
and
---------tail, test, lr=0.001, ep=3.0, nt=5, margin=0.6, bs=32 feature=mix metric ----------
tail Hits @1: 0.2906828334396937
tail Hits @3: 0.5370134014039566
tail Hits @10: 0.7565411614550096
tail Mean rank: 54.801850670070195
tail Mean reciprocal rank: 0.44718844813012876
Now, taking the average of these, e.g. for Hits@1, does not give the same value as in the paper: (tail-hits1 + head-hits1) / 2 = 0.24712827058072752, whereas the paper (Table 4) reports 0.459.
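To be explicit, the averaging I mean is just this (values copied from the logs above):

head_hits1 = 0.20357370772176134  # from the head log above
tail_hits1 = 0.2906828334396937   # from the tail log above
print((head_hits1 + tail_hits1) / 2)  # 0.24712827058072752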
Please, can you provide more information (e.g., the hyperparameter setup for the RotatE model) so that I can get the exact results reported in the paper? Or did I use the commands incorrectly somewhere? (For example, should the last one, ensemble/run.py, be executed twice with different modes, first with train and then with --init?) I'd like to use the StAR model, but I need correct results to start from.
best Martin
Your running commands seem correct, and I just used the official hyperparameters of RotatE to train the model on WN18RR.
The trained model whose results are reported in the paper was lost. I will reproduce the results soon and then tell you the numbers.
By the way, what results did you obtain for StAR and RotatE on WN18RR?
Final lines from train.log
Valid MRR at step 79999: 0.478470
Valid MR at step 79999: 3284.908372
Valid HITS@1 at step 79999: 0.432597
Valid HITS@3 at step 79999: 0.493243
Valid HITS@10 at step 79999: 0.571523
Evaluating on Test Dataset...
...
Test MRR at step 79999: 0.476083
Test MR at step 79999: 3369.924059
Test HITS@1 at step 79999: 0.428207
Test HITS@3 at step 79999: 0.494416
Test HITS@10 at step 79999: 0.571315
And the results of StAR?
Sorry, and thanks for the help. Here is the content of WN18RR_roberta-large/link_prediction_metrics.txt:
Hits left @1: 0.20261646458200383
Hits right @1: 0.2782386726228462
###Hits @1: 0.240427568602425
Hits left @3: 0.45213784301212506
Hits right @3: 0.5188257817485641
###Hits @3: 0.4854818123803446
Hits left @10: 0.6668793873643906
Hits right @10: 0.7479259731971921
###Hits @10: 0.7074026802807913
Mean rank left: 57.20835992342055
Mean rank right: 53.99298021697511
###Mean rank: 55.60067007019783
Mean reciprocal rank left: 0.3616734820860267
Mean reciprocal rank right: 0.4341342479524534
###Mean reciprocal rank: 0.39790386501924
This seems quite similar to what's in Table 4. RotatE's results are also quite close to those in Table 4.
Got it. I will try to find out the reason and tell you later.
Hi, any success reproducing the results?
Meanwhile, I have another question regarding the ensemble model. It is trained twice, once for the head prediction task and once for the tail prediction task, right? So if one has a slightly different task, namely predicting the plausibility of a triple (e1, r2, e3), one has to feed the query (e1, r2, e3) to both the head-trained and the tail-trained model and average the outputs, right?
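To make sure I understand, roughly like this minimal sketch (head_model, tail_model and their score() method are just placeholders, not names from your repo):

def triple_score(head_model, tail_model, e1, r2, e3):
    # Score the same triple with the model trained for head prediction
    # and the model trained for tail prediction, then average the two outputs.
    s_head = head_model.score(e1, r2, e3)
    s_tail = tail_model.score(e1, r2, e3)
    return (s_head + s_tail) / 2.0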
thx
Sorry for the very late response.
There were some bugs in the code and commands before. Thanks for reporting them. I have updated this repo. To reproduce the ensemble results, please follow the new version and rerun the last command in 5.1:
CUDA_VISIBLE_DEVICES=3 python ./codes/run.py \
--cuda --init ./models/RotatE_wn18rr_0 \
--test_batch_size 16 \
--star_info_path /home/wangbo/workspace/StAR_KGC-master/StAR/result/WN18RR_roberta-large \
--get_scores --get_model_dataset
By the way, the performance of the ensemble model may not be stable enough. For the command in 5.2, you can just use 'add' for --feature_method and run do_prediction only to get a suboptimal result, which corresponds to StAR (Ensemble) in Table 4 of the paper.
For your second question, I think what you described is one way to solve the triple classification task. Alternatively, you can modify the code to adapt it to that task. You can refer to the code of KG-BERT, which implements triple classification.
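Just for illustration, the KG-BERT idea is roughly the following minimal sketch (a generic checkpoint and placeholder texts, not the actual KG-BERT code; please see their repo for the real implementation):

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Placeholder checkpoint; KG-BERT fine-tunes a BERT-style model with a binary label
# indicating whether a (head, relation, tail) triple holds.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Placeholder textual descriptions of the triple's head entity, relation, and tail entity.
head_text, rel_text, tail_text = "head entity text", "relation text", "tail entity text"

inputs = tokenizer(head_text + " " + rel_text, tail_text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
plausibility = torch.softmax(logits, dim=-1)[0, 1].item()  # probability the triple is valid
print(plausibility)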
Sorry. I fixed a small bug just now. If you followed the previous version, the generated files were saved under the wrong paths and names. You can move the files to the correct directories.