Open Xie-Minghui opened 3 years ago
This is our checkpoint information. It seems that your loss is higher than ours. I do not know whether it is due to some inconsistency in the BERT initialization or the huggingface version. We have actually observed unexpected performance drops with different BERT initializations.
I also ran the code on ComplexWebQuestions; the results are val: 0.49, test: 0.45. The result in the paper is 48.7. When you run your code, what are the results on the validation and test sets? Thank you.
How do you initialize BERT? I just use the same code as yours.
We get 48.6 validation accuracy. For CompWebQ, we report the results on the validation set in the paper.
We downloaded the BERT weights from Huggingface. However, Huggingface upgrades their API and model weights over time, which may cause unexpected performance issues; we have observed this several times recently. We are uploading our checkpoint and will share the link with you soon.
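Since silently updated library versions and model weights are the suspected cause of the drift, one way to make the problem visible is to check the installed versions against the ones the checkpoint was trained with before running. This is a minimal sketch; the version numbers below are illustrative assumptions, not the authors' actual environment.

```python
# Minimal sketch: fail fast when installed library versions differ from the
# ones a checkpoint was trained with. The version numbers below are
# illustrative assumptions, not the authors' actual environment.
import importlib

EXPECTED = {"transformers": "3.4.0", "torch": "1.7.0"}  # assumed versions


def check_versions(expected):
    """Return human-readable mismatch messages (empty list if all match)."""
    mismatches = []
    for pkg, want in expected.items():
        try:
            have = importlib.import_module(pkg).__version__
        except ImportError:
            mismatches.append("%s: not installed (expected %s)" % (pkg, want))
            continue
        if have != want:
            mismatches.append("%s: found %s, expected %s" % (pkg, have, want))
    return mismatches


if __name__ == "__main__":
    for msg in check_versions(EXPECTED):
        print("WARNING:", msg)
```

Pinning exact versions in a requirements file (and committing it with the code) makes this check unnecessary for fresh installs, but the runtime check still catches mismatched environments.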
Please refer to https://cloud.tsinghua.edu.cn/f/786b9853c1d840578025/?dl=1 for our log and checkpoint.
Hello, could you upload the BERT model files you used (the .pt, vocab.txt, config.json, etc.), or at least the BERT version?
Our experiments were mostly run between September and November 2020. That machine was later reinstalled, and we only backed up the code and experimental results; we did not keep ~/.cache or the like. The BERT version should be some 2020 release of Transformers; we will try to see whether we can recover the version we used at the time.
Hello! Sorry to bother you, but is there a CWQ log?
I ran the code using the same random seed (I did not change your code). The final results are as follows: 1-hop accuracy is 0.732, 2-hop is 0.445, and the total accuracy is 0.608. But the WebQuestionsSP result in your paper is 0.714, which is much higher than 0.608. I wonder how you trained your model, or whether the result in your paper is a mistake.
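Note that using the same seed only yields identical runs if every randomness source is seeded and cuDNN is forced into deterministic mode; otherwise two runs with the "same seed" can still diverge. A minimal sketch of the usual setup (the function name is mine, not from the repository):

```python
import random

import numpy as np
import torch


def set_seed(seed):
    """Seed every common randomness source for a reproducible PyTorch run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds CPU (and current GPU) RNGs
    torch.cuda.manual_seed_all(seed)  # no-op when no GPU is present
    # Force deterministic cuDNN kernels (may be slower, but reproducible).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Even with this, results can still differ across library versions or hardware, which is consistent with the initialization issues discussed above.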