tricktreat / locate-and-label

Code for "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition", accepted at ACL 2021.

Concern about results. Did you pick models using the test set? #4

Closed · LouChao98 closed 3 years ago

LouChao98 commented 3 years ago

Hi, thank you for your great work and the code. I have a concern about the results. In configs/example.conf, all tasks are set to use "test" as the validation set, but in your code and README the model is selected according to its performance on the validation set, which is therefore actually the test set. For example, in the Quick Start section of the README, you report

Best F1 score: 80.63560463237275, achieved at Epoch: 34

Since 34 is the last epoch, I am not sure whether you used the test set to select the model or whether this is just a coincidence. In my reruns, the best model is not always the final model.
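
To make the concern concrete, here is a minimal sketch of the selection pattern I mean (all names are illustrative placeholders, not this repo's actual API): if the evaluation loader in the loop is built from the "test" split, then picking the "best" epoch by this F1 is test-set model selection.

```python
import random

# Illustrative placeholders, not this repo's API.
def train_one_epoch(model, loader): ...
def evaluate_f1(model, loader): return random.random()

model, train_loader, eval_loader = object(), [], []
num_epochs = 35  # epochs 0..34

best_f1, best_epoch = float("-inf"), None
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)
    # If eval_loader is built from the "test" split (as configured in
    # configs/example.conf), this comparison selects on the test set.
    f1 = evaluate_f1(model, eval_loader)
    if f1 > best_f1:
        best_f1, best_epoch = f1, epoch  # a checkpoint would be saved here
print(f"Best F1 score: {best_f1}, achieved at Epoch: {best_epoch}")
```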

The correct setting should be one of the following (a sketch of both settings follows the list):

  1. Use only the loss on the validation set to select models.
  2. Fix the maximum number of training epochs and report the final model, where the maximum is tuned without access to the F1 score on the validation set.
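
A rough sketch of both settings, again with placeholder names rather than the repo's actual code:

```python
import random

# Illustrative placeholders, not this repo's API.
def train_one_epoch(model, loader): ...
def evaluate_loss(model, loader): return random.random()

model, train_loader, dev_loader = object(), [], []

# Setting 1: select the checkpoint by loss on the development set only;
# the test set is never consulted during selection.
best_loss, best_state = float("inf"), None
for epoch in range(35):
    train_one_epoch(model, train_loader)
    dev_loss = evaluate_loss(model, dev_loader)
    if dev_loss < best_loss:
        best_loss, best_state = dev_loss, model  # save this checkpoint

# Setting 2: fix the epoch budget in advance (tuned without access to
# validation F1) and report whatever the final model scores.
MAX_EPOCHS = 35
for epoch in range(MAX_EPOCHS):
    train_one_epoch(model, train_loader)
final_model = model  # evaluate this once on the test set
```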

Could you explain how the results reported in your paper were obtained? Or is this just an error in the open-sourced code?

tricktreat commented 3 years ago

Sorry for the late response. To make testing convenient, we take the results of the last training epoch and do not use the development set or the test set to select the model. In fact, in our experiments the best results generally appear at the last epoch.

In the Quick Start section of the README, we report

Best F1 score: 80.63560463237275, achieved at Epoch: 34

This log line was produced for setting 1 as you mentioned, but we did not actually use the development set to select the model in our experiments. We adopted setting 2 and report the results of the final model.