Closed: LouChao98 closed this issue 3 years ago
Hi, thank you for your great work and the code. I have a concern about the results. In configs/example.conf, all tasks are set to use "test" as the validation set, but in your code and README the model is selected by its performance on the validation set, which is therefore actually the test set (a toy sketch of my reading follows at the end of this post). For example, in the Quick Start section of the README you report:

Best F1 score: 80.63560463237275, achieved at Epoch: 34

Since 34 is the last epoch, I am not sure whether you used the test set to select models or whether this is just a coincidence; in my reruns, the best model is not always the final model.

The correct setting should be one of the following:

1. Use the development set to select the model, and report that model's performance on the test set.
2. Do no model selection at all: train for a fixed number of epochs and report the final model's performance on the test set.

Could you explain how you obtained the results reported in your paper? Or is this just an error in the open-sourced code?
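To make the concern concrete, here is a toy sketch of how I read the current behavior. This is not your actual code: train_one_epoch, evaluate, and the model object are made-up stand-ins, and the scores are random. The point is only that the checkpoint is chosen by its score on the split that example.conf labels as validation, which is the test split:

```python
import copy
import random

random.seed(0)

def train_one_epoch(model):      # stand-in for the real training step
    model["skill"] += random.uniform(0.0, 1.0)

def evaluate(model, split):      # stand-in for computing F1 on a split
    return min(100.0, model["skill"] * 3 + random.uniform(-1.0, 1.0))

model = {"skill": 0.0}
num_epochs = 35

best_f1, best_epoch, best_state = -1.0, -1, None
for epoch in range(num_epochs):
    train_one_epoch(model)
    f1 = evaluate(model, "test")          # the "validation" split is test
    if f1 > best_f1:                      # selection happens on test data
        best_f1, best_epoch = f1, epoch
        best_state = copy.deepcopy(model)

model = best_state  # the reported model is the test-selected checkpoint
print(f"Best F1 score: {best_f1}, achieved at Epoch: {best_epoch}")
```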
Sorry for the untimely response. To make testing convenient, we take the results of the last epoch of training and do not use the development set or the test set to select the model. In fact, in our experiments the best results generally appear at the last epoch.

In the Quick Start section of the README we report:

Best F1 score: 80.63560463237275, achieved at Epoch: 34

This log was prepared for setting 1, as you describe it, but we did not actually use the development set to select the model in our experiments; we adopted setting 2 and take the results of the final model.
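Schematically, setting 2 looks like the following. Again, this is only an illustration with made-up stand-ins, not our actual training code; the point is that no checkpoint is selected on dev or test, and only the final model is evaluated:

```python
import random

random.seed(0)

def train_one_epoch(model):      # stand-in for the real training step
    model["skill"] += random.uniform(0.0, 1.0)

def evaluate(model, split):      # stand-in for computing F1 on a split
    return min(100.0, model["skill"] * 3)

model = {"skill": 0.0}
num_epochs = 35

# Setting 2: train for a fixed number of epochs and keep the last model.
for epoch in range(num_epochs):
    train_one_epoch(model)

print(f"Final-epoch F1 on test: {evaluate(model, 'test'):.2f}")
```

Under this protocol, a "Best F1 ... achieved at Epoch: 34" line coincides with the final epoch whenever the last epoch happens to be the best, which is what we generally observe in our runs.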