tricktreat / locate-and-label

Code for "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition", accepted at ACL 2021.

Concern about results. Did you pick models using the test set? #4

Closed · LouChao98 closed 3 years ago

LouChao98 commented 3 years ago

Hi, thank you for your great work and the code. I have a concern about the results. In configs/example.conf, all tasks are set to use "test" as the validation set, but in your code and README the model is selected according to its performance on the validation set, which is therefore actually the test set. For example, in the Quick Start section of the README, you report

Best F1 score: 80.63560463237275, achieved at Epoch: 34

Since 34 is the last epoch, I am not sure whether you used the test set to select the model or whether this is just a coincidence. In my reruns, the best model is not always the final model.
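
To make the concern concrete, here is a minimal sketch of the selection pattern I mean (all names are illustrative placeholders, not this repo's actual API): if the evaluation loader in the loop is built from the "test" split, then picking the "best" epoch by this F1 is test-set model selection.

```python
import random

# Illustrative placeholders, not this repo's API.
def train_one_epoch(model, loader): ...
def evaluate_f1(model, loader): return random.random()

model, train_loader, eval_loader = object(), [], []
num_epochs = 35  # epochs 0..34

best_f1, best_epoch = float("-inf"), None
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)
    # If eval_loader is built from the "test" split (as configured in
    # configs/example.conf), this comparison selects on the test set.
    f1 = evaluate_f1(model, eval_loader)
    if f1 > best_f1:
        best_f1, best_epoch = f1, epoch  # a checkpoint would be saved here
print(f"Best F1 score: {best_f1}, achieved at Epoch: {best_epoch}")
```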

The correct setting should be one of the following (a sketch of both settings follows the list):

  1. Use only the loss on the validation set to select models.
  2. Fix the maximum number of training epochs and report the final model, where the maximum is tuned without access to the F1 score on the validation set.
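
A rough sketch of both settings, again with placeholder names rather than the repo's actual code:

```python
import random

# Illustrative placeholders, not this repo's API.
def train_one_epoch(model, loader): ...
def evaluate_loss(model, loader): return random.random()

model, train_loader, dev_loader = object(), [], []

# Setting 1: select the checkpoint by loss on the development set only;
# the test set is never consulted during selection.
best_loss, best_state = float("inf"), None
for epoch in range(35):
    train_one_epoch(model, train_loader)
    dev_loss = evaluate_loss(model, dev_loader)
    if dev_loss < best_loss:
        best_loss, best_state = dev_loss, model  # save this checkpoint

# Setting 2: fix the epoch budget in advance (tuned without access to
# validation F1) and report whatever the final model scores.
MAX_EPOCHS = 35
for epoch in range(MAX_EPOCHS):
    train_one_epoch(model, train_loader)
final_model = model  # evaluate this once on the test set
```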

Could you explain how the results reported in your paper were obtained? Or is this just an error in the open-sourced code?

tricktreat commented 3 years ago

Sorry for the late response. To make testing convenient, we take the results of the last training epoch and do not use the development set or the test set to select the model. In fact, in our experiments the best results generally appear at the last epoch.

In the Quick Start section of the README, we report

Best F1 score: 80.63560463237275, achieved at Epoch: 34

This log line was produced for setting 1 as you mentioned, but we did not actually use the development set to select the model in our experiments. We adopted setting 2 and report the results of the final model.