thunlp / OpenKE

An Open-Source Package for Knowledge Embedding (KE)

Are there any official benchmark results of the framework? #36

Closed yuchenlin closed 6 years ago

yuchenlin commented 6 years ago

Hi,

Thanks for this wonderful framework!

I was wondering whether you have reported official results (with hyper-parameters) on WN18 or FB15K using this framework. I am trying to reproduce the TransE results on these two benchmark datasets, but it seems hard to tune the hyper-parameters so that the results are comparable to those reported by other papers.

So I was hoping you might have some benchmark results that verify the framework can reproduce comparable numbers.

ShulinCao commented 6 years ago

We will soon report our benchmark results.

yuchenlin commented 6 years ago

Thanks for the prompt reply! Could you please share a typical good set of hyper-parameters for the FB15K dataset? The settings in the example*.py files seem to be there only to check that the framework runs successfully.

ShulinCao commented 6 years ago

The settings of OpenKE-PyTorch branch are what you want.
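For reference, the training scripts in that branch roughly follow the pattern below. This is only a sketch modeled on the example*.py files; the exact method names and the hyper-parameter values shown (dimension, margin, learning rate, etc.) are placeholders rather than the tuned settings referred to here, so check the scripts in the branch you are actually using.

```python
# Sketch of a TransE training script in the style of OpenKE-PyTorch's
# example*.py files. Method names and values are illustrative and may
# differ from the branch/version you have checked out.
import config
import models

con = config.Config()
con.set_in_path("./benchmarks/FB15K/")   # benchmark files: train2id.txt, entity2id.txt, ...
con.set_work_threads(8)
con.set_train_times(1000)                # training epochs
con.set_nbatches(100)                    # number of batches per epoch
con.set_alpha(0.001)                     # learning rate
con.set_margin(1.0)                      # margin of the ranking loss
con.set_bern(0)                          # 0 = "unif" negative sampling, 1 = "bern"
con.set_dimension(100)                   # embedding dimension
con.set_ent_neg_rate(1)                  # negative entities per positive triple
con.set_rel_neg_rate(0)                  # negative relations per positive triple
con.set_opt_method("SGD")
con.set_export_files("./res/model.vec.pt")
con.init()
con.set_model(models.TransE)
con.run()
```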

yuchenlin commented 6 years ago

Thanks! I will have a try soon and post my results here later!

yuchenlin commented 6 years ago

The results of TransE on the FB15K data

overall results:
left 272.051422 0.461766 0.244469 0.081664
left(filter) 87.577713 0.726448 0.538606 0.251054
right 172.013092 0.538437 0.303804 0.104332
right(filter) 57.314960 0.779689 0.596756 0.268660

average raw MR = (272.051422 + 172.013092)/2 = 222.032
average filter MR = (87.577713 + 57.314960)/2 = 72.446
average raw Hits@10 = (0.461766 + 0.538437)/2 = 0.500
average filter Hits@10 = (0.726448 + 0.779689)/2 = 0.753
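Spelling out the arithmetic (a small sketch; the column meanings are inferred from the averaging itself: the first number in each row is the mean rank, the second is Hits@10, and "left"/"right" correspond to head/tail prediction):

```python
# Recomputing the averages from the left/right rows printed above.
# Column 1 = mean rank (MR), column 2 = Hits@10 (inferred from the thread).
left_raw,  left_filter  = (272.051422, 0.461766), (87.577713, 0.726448)
right_raw, right_filter = (172.013092, 0.538437), (57.314960, 0.779689)

avg = lambda a, b: (a + b) / 2.0
print("raw MR        :", avg(left_raw[0],    right_raw[0]))     # ~222.03
print("filter MR     :", avg(left_filter[0], right_filter[0]))  # ~72.45
print("raw Hits@10   :", avg(left_raw[1],    right_raw[1]))     # ~0.500
print("filter Hits@10:", avg(left_filter[1], right_filter[1]))  # ~0.753
```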

The results are quite amazing! They are much better than the results reported in the original paper by Bordes et al. (2013), which were a raw MR of 243, a filtered MR of 125, a raw Hits@10 of 34.9%, and a filtered Hits@10 of 47.1%, respectively.

Is this expected? If so, what makes the TransE in this framework work so much better than the original one? (I recall reading in some paper that the implementation includes certain optimizations, but I forget where.)
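For context on where TransE implementations typically diverge, here is a generic, self-contained sketch of the standard margin-based TransE objective in plain PyTorch. It is not OpenKE's code; it only shows the knobs (L1 vs. L2 distance, margin, negative sampling, initialization/normalization) that usually account for gaps between reported numbers.

```python
# Generic sketch of the TransE objective (Bordes et al., 2013) in plain
# PyTorch -- NOT OpenKE's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransE(nn.Module):
    def __init__(self, n_ent, n_rel, dim=100, margin=1.0, p_norm=1):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)
        self.rel = nn.Embedding(n_rel, dim)
        self.margin, self.p_norm = margin, p_norm
        nn.init.xavier_uniform_(self.ent.weight)
        nn.init.xavier_uniform_(self.rel.weight)

    def score(self, h, r, t):
        # ||h + r - t||_p : lower means the triple is considered more plausible
        return torch.norm(self.ent(h) + self.rel(r) - self.ent(t),
                          p=self.p_norm, dim=-1)

    def forward(self, pos, neg):
        # pos/neg: (batch, 3) index tensors of (head, relation, tail) triples
        pos_s = self.score(pos[:, 0], pos[:, 1], pos[:, 2])
        neg_s = self.score(neg[:, 0], neg[:, 1], neg[:, 2])
        # margin-based ranking loss: push positive scores below negatives by `margin`
        return F.relu(self.margin + pos_s - neg_s).mean()
```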

THUCSTHanxu13 commented 6 years ago

Bordes et al. (2013) did not release their code, so we cannot tell why the results of our implementation differ so much from those reported in the original paper = =!

yuchenlin commented 6 years ago

@THUCSTHanxu13 I see! Thanks so much! I am now trying to tune the hyper-parameters to reproduce the reported performance of methods like ComplEx. Once I get some promising results, I will post the hyper-parameters here.

rlafraie commented 4 years ago

Hi @yuchenlin,

I know this issue is closed, but I'm currently looking for the optimal parameters, too. Could you therefore please share the parameters, especially for the WN18 dataset? The OpenKE-PyTorch (old) branch only includes them for the FB15K dataset.

Many Thanks in Advance!