nanguoshun / LSR

PyTorch implementation of our ACL 2020 paper "Reasoning with Latent Structure Refinement for Document-Level Relation Extraction"

Ablation Study Result on BERT #1

Closed by nttmac 4 years ago

nttmac commented 4 years ago

Hi, do you have ablation study results on the DocRED dataset with BERT as the encoder? What score can be reached on DocRED when BERT is used as the encoder? Thanks!

nanguoshun commented 4 years ago

Hi @nttmac, thanks for your interest. We didn't conduct an ablation study for BERT+LSR. The F1 with the BERT(base) encoder on the DocRED test set is 59.05, as reported in our paper.

nttmac commented 4 years ago

What result does BERT alone (without LSR) get on the DocRED test set? Is it implemented based on Wang Hong's code? That result seems a bit low (my refactored reimplementation reaches 55+). I'd like to know how much improvement LSR brings on top of BERT. Thanks!

nanguoshun commented 4 years ago

@nttmac Thanks for your interest. BERT+LSR improves over GloVe+LSR by roughly 4 points on both DEV and TEST; see Table 2 for details.

xwjim commented 4 years ago

> What result does BERT alone (without LSR) get on the DocRED test set? Is it implemented based on Wang Hong's code? That result seems a bit low (my refactored reimplementation reaches 55+). I'd like to know how much improvement LSR brings on top of BERT. Thanks!

About how many epochs did it take for your BERT training to reach the best result? I can also get to about 55.2 with BERT, but training takes quite a long time.

nttmac commented 4 years ago

> > What result does BERT alone (without LSR) get on the DocRED test set? Is it implemented based on Wang Hong's code? That result seems a bit low (my refactored reimplementation reaches 55+). I'd like to know how much improvement LSR brings on top of BERT. Thanks!
>
> About how many epochs did it take for your BERT training to reach the best result? I can also get to about 55.2 with BERT, but training takes quite a long time.

I set the batch_size to 30 and it took about 250 epochs to reach the best result, also around 55, so it does take quite a long time.

nanguoshun commented 4 years ago

@xwjim @nttmac For the BERT part I referred to Wang Hong's code, and my results were also a bit higher than those in his paper, around 55. Since training BERT is quite tricky, I directly used the numbers Wang Hong reported in his paper. LSR+GloVe needs about 50 epochs (around 8 hours), and LSR+BERT needs about 80 epochs (around 36 hours). Please use English going forward so that other readers can follow the discussion.
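For quick reference, the training budgets reported in this comment can be summarized in a small snippet. This is purely illustrative: the dictionary keys are invented for readability and do not correspond to any config file or option in this repository.

```python
# Illustrative summary of the training budgets reported above; the keys are
# made up for readability and are not config options of this repository.
reported_training_budget = {
    "LSR+GloVe": {"epochs": 50, "approx_hours": 8},
    "LSR+BERT": {"epochs": 80, "approx_hours": 36},
}
```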

nanguoshun commented 4 years ago

I'm closing this issue as there are no further queries.

VinnyHu commented 3 years ago

> @xwjim @nttmac For the BERT part I referred to Wang Hong's code, and my results were also a bit higher than those in his paper, around 55. Since training BERT is quite tricky, I directly used the numbers Wang Hong reported in his paper. LSR+GloVe needs about 50 epochs (around 8 hours), and LSR+BERT needs about 80 epochs (around 36 hours). Please use English going forward so that other readers can follow the discussion.

When you use BERT as the encoder in LSR, what batch_size do you use? I use an lr of 1e-5 and a batch_size of 8 on two 2080Ti GPUs, and I only get an F1 of 55.7 with a loss of 0.060.

nanguoshun commented 3 years ago

Hi @VinnyHu, the lr is 1e-5 and the batch size is 20. We trained the model on 3 x 24GB GPUs. Empirically, the batch size for BERT-based models should always be larger than 15.
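If a batch of 15 to 20 documents does not fit on smaller cards such as the 2080Ti mentioned above, gradient accumulation is a common way to approximate a larger effective batch size. The sketch below is a generic, self-contained PyTorch illustration with stand-in modules and random data; it is not code from this repository.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of gradient accumulation: 8 examples fit per
# forward/backward pass, and the optimizer steps every 3 passes, giving an
# effective batch size of 24. The linear layer, loss, and random tensors are
# stand-ins, not the LSR model or DocRED data.
model = nn.Linear(768, 97)                      # stand-in classifier head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
loss_fn = nn.CrossEntropyLoss()

per_step_batch = 8
accumulation_steps = 3                          # 8 * 3 = 24 effective batch size

optimizer.zero_grad()
for step in range(30):                          # dummy loop over random batches
    x = torch.randn(per_step_batch, 768)
    y = torch.randint(0, 97, (per_step_batch,))
    loss = loss_fn(model(x), y)
    (loss / accumulation_steps).backward()      # scale so accumulated grads average
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```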