successar / Eraser-Benchmark-Baseline-Models

Baseline for ERASER benchmark
https://www.eraserbenchmark.com

The results of the BERT-LSTM model are different from the paper #1

Open aiishii opened 4 years ago

aiishii commented 4 years ago

Hello. Let me ask you a question.

I tried to train the BERT-LSTM model from your paper on the Movie Reviews data, but I couldn't reproduce the paper's results.

Training accuracy: train 0.925, validation 0.833, test 0.849.

Prediction:

| Model | Performance | AUPRC | Comprehensiveness | Sufficiency |
|---|---|---|---|---|
| BERT-LSTM + Attention | 0.829 | 0.463 | 0.223 | 0.141 |
| BERT-LSTM + Simple Gradient | 0.829 | 0.469 | 0.222 | 0.141 |

The performance reported in Table 4 of the paper is 0.974, while mine is 0.829, which is a large gap.
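In case it helps with diagnosing the gap, this is roughly how I compute the two faithfulness metrics — a minimal reimplementation of the ERASER definitions in my own code, not this repo's evaluation script:

```python
import numpy as np

def comprehensiveness(p_full: np.ndarray, p_without_rationale: np.ndarray) -> float:
    """ERASER comprehensiveness: p(y|x) - p(y|x \\ r), averaged over examples.

    p_full / p_without_rationale are the predicted-class probabilities on the
    full input and on the input with the rationale tokens removed. Higher is
    better (removing the rationale should hurt the prediction)."""
    return float(np.mean(p_full - p_without_rationale))

def sufficiency(p_full: np.ndarray, p_rationale_only: np.ndarray) -> float:
    """ERASER sufficiency: p(y|x) - p(y|r), averaged over examples.

    p_rationale_only is the predicted-class probability when the model sees
    only the rationale tokens. Lower is better (the rationale alone should
    nearly suffice)."""
    return float(np.mean(p_full - p_rationale_only))

# Toy numbers, not real model outputs:
p_full = np.array([0.92, 0.88, 0.95])
print(comprehensiveness(p_full, np.array([0.40, 0.55, 0.50])))  # ~0.43
print(sufficiency(p_full, np.array([0.85, 0.80, 0.90])))        # ~0.07
```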

The only parameter I changed from those listed in the README is the prediction batch size, from 4 to 2, due to lack of memory. My environment: 65 GB RAM and an NVIDIA Tesla GPU with 32 GB of memory.

Could you tell me if there are any parameter differences or other differences from the paper's experiments?

xmshi-trio commented 3 years ago

Hi, I tried to train bert_encoder_generator on the Movie Reviews data, but I ran into an issue. Training proceeds normally, but the results on the validation data are always the same: fscore_NEG: 0.000, fscore_POS: 0.667. I tried different BERT learning rates (5e-1, 5e-2, 5e-3, 5e-4, 5e-5), but the validation results are identical every time. Could you show me how to set the parameters?
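For what it's worth, those two numbers look exactly like a model that has collapsed to predicting POS for everything: on a balanced binary dev set that gives precision 0.5 and recall 1.0 for POS, so F1 = 2·0.5·1.0/1.5 ≈ 0.667, and F1 = 0 for NEG. A quick check (the 50/50 class split here is my assumption):

```python
from sklearn.metrics import f1_score

# Hypothetical balanced dev set (0 = NEG, 1 = POS) and a degenerate
# model that predicts POS for every example.
y_true = [0] * 50 + [1] * 50
y_pred = [1] * 100

print(f1_score(y_true, y_pred, pos_label=1))  # 0.667 -> matches fscore_POS
print(f1_score(y_true, y_pred, pos_label=0))  # 0.0   -> matches fscore_NEG
```

If that is what is happening, it would also explain why changing the learning rate has no visible effect.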

successar commented 3 years ago

Hi, the BERT encoder generator model is extremely unstable, so it is not surprising that you are getting bad results. Could you try the word_emb_encoder_generator model instead? Also try setting reinforce_loss_weight to 0 here https://github.com/successar/Eraser-Benchmark-Baseline-Models/blob/894bfba09e8966aec9b046ddc595d434504a4f90/Rationale_model/training_config/classifiers/bert_encoder_generator.jsonnet#L99 and see if you still get the same problem.
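If it's easier than editing the jsonnet in place, a sketch of passing that change through AllenNLP's overrides mechanism is below — the `"model" -> "reinforce_loss_weight"` key path and the output directory are my assumptions, so check where the parameter actually sits in the config:

```python
from allennlp.commands.train import train_model_from_file

# Sketch: train bert_encoder_generator with reinforce_loss_weight forced
# to 0 via a JSON override instead of editing the jsonnet directly.
# The "model" -> "reinforce_loss_weight" key path is an assumption; verify
# it against the actual structure of the config file.
train_model_from_file(
    parameter_filename="Rationale_model/training_config/classifiers/bert_encoder_generator.jsonnet",
    serialization_dir="outputs/bert_encoder_generator_no_reinforce",
    overrides='{"model": {"reinforce_loss_weight": 0}}',
)
```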