nanguoshun / LSR

PyTorch implementation of our ACL 2020 paper "Reasoning with Latent Structure Refinement for Document-Level Relation Extraction"

Ablation Study Result on BERT #1

Closed by nttmac 4 years ago

nttmac commented 4 years ago

Hi, do you have ablation study results on the DocRED dataset with BERT as the encoder? What score can be reached on DocRED when BERT is used as the encoder? Thanks!

nanguoshun commented 4 years ago

Hi @nttmac, thanks for your interest. We didn't conduct an ablation study for BERT+LSR. The F1 with the BERT(base) encoder on the DocRED test set is 59.05, as reported in our paper.

nttmac commented 4 years ago

What result does BERT alone (without LSR) get on the DocRED test set? Is it implemented based on Wang Hong's code? That result seems a bit low (my refactored reimplementation reaches 55+). I'd like to know how much improvement LSR brings on top of BERT. Thanks!

nanguoshun commented 4 years ago

@nttmac Thanks for your interest. BERT+LSR improves over GloVe+LSR by roughly 4 points on both DEV and TEST; see Table 2 for details.

xwjim commented 4 years ago

> What result does BERT alone (without LSR) get on the DocRED test set? Is it implemented based on Wang Hong's code? That result seems a bit low (my refactored reimplementation reaches 55+). I'd like to know how much improvement LSR brings on top of BERT. Thanks!

About how many epochs did it take for your BERT training to reach the best result? I can also get to about 55.2 with BERT, but training takes quite a long time.

nttmac commented 4 years ago

> > What result does BERT alone (without LSR) get on the DocRED test set? Is it implemented based on Wang Hong's code? That result seems a bit low (my refactored reimplementation reaches 55+). I'd like to know how much improvement LSR brings on top of BERT. Thanks!
>
> About how many epochs did it take for your BERT training to reach the best result? I can also get to about 55.2 with BERT, but training takes quite a long time.

I set the batch_size to 30 and it took about 250 epochs to reach the best result, also around 55, so it does take quite a long time.

nanguoshun commented 4 years ago

@xwjim @nttmac For the BERT part I referred to Wang Hong's code, and my results were also a bit higher than those in his paper, around 55. Since training BERT is quite tricky, I directly used the numbers Wang Hong reported in his paper. LSR+GloVe needs about 50 epochs (around 8 hours), and LSR+BERT needs about 80 epochs (around 36 hours). Please use English going forward so that other readers can follow the discussion.
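For quick reference, the training budgets reported in this comment can be summarized in a small snippet. This is purely illustrative: the dictionary keys are invented for readability and do not correspond to any config file or option in this repository.

```python
# Illustrative summary of the training budgets reported above; the keys are
# made up for readability and are not config options of this repository.
reported_training_budget = {
    "LSR+GloVe": {"epochs": 50, "approx_hours": 8},
    "LSR+BERT": {"epochs": 80, "approx_hours": 36},
}
```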

nanguoshun commented 4 years ago

I'm closing this issue as there are no further queries.

VinnyHu commented 3 years ago

> @xwjim @nttmac For the BERT part I referred to Wang Hong's code, and my results were also a bit higher than those in his paper, around 55. Since training BERT is quite tricky, I directly used the numbers Wang Hong reported in his paper. LSR+GloVe needs about 50 epochs (around 8 hours), and LSR+BERT needs about 80 epochs (around 36 hours). Please use English going forward so that other readers can follow the discussion.

When you use BERT as the encoder in LSR, what batch_size do you use? I use an lr of 1e-5 and a batch_size of 8 on two 2080Ti GPUs, and I only get an F1 of 55.7 with a loss of 0.060.

nanguoshun commented 3 years ago

Hi @VinnyHu, the lr is 1e-5 and the batch size is 20. We trained the model on 3 x 24GB GPUs. Empirically, the batch size for BERT-based models should always be larger than 15.
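If a batch of 15 to 20 documents does not fit on smaller cards such as the 2080Ti mentioned above, gradient accumulation is a common way to approximate a larger effective batch size. The sketch below is a generic, self-contained PyTorch illustration with stand-in modules and random data; it is not code from this repository.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of gradient accumulation: 8 examples fit per
# forward/backward pass, and the optimizer steps every 3 passes, giving an
# effective batch size of 24. The linear layer, loss, and random tensors are
# stand-ins, not the LSR model or DocRED data.
model = nn.Linear(768, 97)                      # stand-in classifier head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
loss_fn = nn.CrossEntropyLoss()

per_step_batch = 8
accumulation_steps = 3                          # 8 * 3 = 24 effective batch size

optimizer.zero_grad()
for step in range(30):                          # dummy loop over random batches
    x = torch.randn(per_step_batch, 768)
    y = torch.randint(0, 97, (per_step_batch,))
    loss = loss_fn(model(x), y)
    (loss / accumulation_steps).backward()      # scale so accumulated grads average
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```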