Open soodeh-nilforoushan opened 2 years ago
The dev_f1
uses the validation dataset for evaluation, while f1
uses the test dataset. Lastly, best_f1
indicates the best f1 score evaluated against the test dataset.
Eventually, the model that is written to disk is simply the last checkpoint. Since you should report the f1 of this model, it's best to use f1
.
When I run the code I got three f1 from different epochs. Which f1 should we report as a final f1 accuracy based on the paper? this is the example of out put: epoch 5: dev_f1=0.8317046688382194, f1=0.818146568437379, best_f1=0.8185719859539602