megagonlabs / ditto

Code for the paper "Deep Entity Matching with Pre-trained Language Models"
Apache License 2.0

Which f1 should we report? #25

Open soodeh-nilforoushan opened 2 years ago

soodeh-nilforoushan commented 2 years ago

When I run the code, I get three F1 values for each epoch. Which one should we report as the final F1 score, as in the paper? Here is an example of the output: epoch 5: dev_f1=0.8317046688382194, f1=0.818146568437379, best_f1=0.8185719859539602

rinkstiekema commented 1 year ago

dev_f1 is the F1 score on the validation set, while f1 is the F1 score on the test set. best_f1 is the best F1 score observed on the test set.

The model that is written to disk is simply the last checkpoint. Since the score you report should belong to the saved model, it's best to report f1 from the final epoch.
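
If it helps, here is a minimal sketch for pulling the final-epoch f1 out of a training log. It assumes every epoch prints a line in the same format as the example in the question. The names parse_epoch_line and final_test_f1 are hypothetical helpers written for this illustration; they are not part of the ditto code base.

```python
import re

# A plain or scientific-notation number, e.g. 0.818146568437379 or 1e-3.
NUM = r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"

# Assumed log format, taken from the example line in the question:
#   epoch 5: dev_f1=<float>, f1=<float>, best_f1=<float>
LINE_RE = re.compile(
    rf"epoch (\d+): dev_f1=({NUM}), f1=({NUM}), best_f1=({NUM})"
)


def parse_epoch_line(line):
    """Return the scores on one log line as a dict, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "epoch": int(m.group(1)),
        "dev_f1": float(m.group(2)),
        "f1": float(m.group(3)),
        "best_f1": float(m.group(4)),
    }


def final_test_f1(log_text):
    """Return the test f1 printed for the highest-numbered epoch in the log."""
    records = [parse_epoch_line(line) for line in log_text.splitlines()]
    records = [r for r in records if r is not None]
    if not records:
        raise ValueError("no epoch lines found in log")
    last = max(records, key=lambda r: r["epoch"])
    return last["f1"]


if __name__ == "__main__":
    log = (
        "epoch 5: dev_f1=0.8317046688382194, "
        "f1=0.818146568437379, best_f1=0.8185719859539602"
    )
    print(final_test_f1(log))
```

Lines that do not match the pattern are skipped, so other training output can be left in the log text that is passed in.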