shijx12 / TransferNet

Pytorch implementation of EMNLP 2021 paper "TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph "

Cannot reproduce the WebQuestionsSP result: 0.71 in the paper, but 0.608 when I run the code #3

Open Xie-Minghui opened 3 years ago

Xie-Minghui commented 3 years ago

I ran the code using the same random seed (I did not change your code), and the final results are as follows: 1-hop accuracy is 0.732, 2-hop is 0.445, and the total accuracy is 0.608. But the WebQuestionsSP result in your paper is 0.714, which is much higher than 0.608. I wonder how you trained your model, or whether this is a mistaken result in your paper.

shijx12 commented 3 years ago

[screenshots: training logs from our checkpoint]

This is our checkpoint information. It seems that your loss is higher than ours. I do not know whether it is due to some inconsistency in the BERT initialization or the huggingface version. Actually, we have observed unexpected performance drops with different BERT initializations.
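The initialization sensitivity described above is a common source of run-to-run variance. A minimal sketch of one standard mitigation, fixing every relevant random number generator before the model is built (the `set_seed` helper below is illustrative, not part of the TransferNet codebase):

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    # Fix Python, NumPy, and PyTorch RNGs so randomly initialized
    # layers (e.g. a classifier head on top of BERT) and data
    # shuffling are repeatable across runs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op without CUDA


set_seed(666)
a = torch.randn(3)
set_seed(666)
b = torch.randn(3)
print(torch.equal(a, b))  # True: same seed gives identical init
```

Note that this only removes one source of nondeterminism; differences in pretrained weights or library versions can still change results.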

Xie-Minghui commented 3 years ago

I also ran the code on ComplexWebQuestions; the results are val: 0.49, test: 0.45. The result in the paper is 48.7. When you ran your code, what were the results on the validation and test sets? Thank you.

Xie-Minghui commented 3 years ago

> This is our checkpoint information. It seems that your loss is higher than ours. I do not know whether it is due to some inconsistency in the BERT initialization or the huggingface version. Actually, we have observed unexpected performance drops with different BERT initializations.

How do you initialize BERT? I just use the same code as yours.

shijx12 commented 3 years ago

> I also ran the code on ComplexWebQuestions; the results are val: 0.49, test: 0.45. The result in the paper is 48.7. When you ran your code, what were the results on the validation and test sets? Thank you.

We got 48.6 validation accuracy. We report the results on the validation set for CompWebQ in the paper.

shijx12 commented 3 years ago

> How do you initialize BERT? I just use the same code as yours.

We downloaded the BERT weights from Huggingface. However, Huggingface upgrades their API and model weights over time, which may cause unexpected performance issues; we have observed this several times recently. We are uploading our checkpoint and will share the link with you soon.
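One common way to guard against the upgrade drift described above, assuming the drop really comes from library or weight updates, is to pin exact dependency versions in `requirements.txt`. The versions below are illustrative of a late-2020 environment, not the authors' confirmed setup:

```
transformers==3.4.0
torch==1.7.1
```

With a pinned environment, a fresh install reproduces the same tokenization and weight-loading behavior instead of silently picking up newer releases.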

ShulinCao commented 2 years ago

Please refer to https://cloud.tsinghua.edu.cn/f/786b9853c1d840578025/?dl=1 for our log and checkpoint.

Xie-Minghui commented 2 years ago

> Please refer to https://cloud.tsinghua.edu.cn/f/786b9853c1d840578025/?dl=1 for our log and checkpoint.

Hello, could you upload the BERT model files you used (the weights .pt, vocab.txt, config.json, etc.), or at least the BERT version?

shijx12 commented 2 years ago

Our experiments were mostly done between September and November 2020. That machine was later reinstalled, and we only backed up the code and experiment results, without keeping things like ~/.cache. The BERT version should be some 2020 release of Transformers; we will try to see whether we can recover the version we used at the time.

Huiopfsdfsdf commented 1 year ago

> Please refer to https://cloud.tsinghua.edu.cn/f/786b9853c1d840578025/?dl=1 for our log and checkpoint.

Hello! Sorry to bother you, but is there a log for CWQ?