Closed: CiaoHe closed this issue 3 months ago.
Hi, we only focus on the F1-max score for the annotation tasks. During training, we didn't use an early-stopping strategy; we only saved the model that performed best on the validation set, i.e. the model with the highest F1-max score on the annotation tasks.
The mismatch between `valid_f1_max` and `valid_loss` seems to be inevitable, since we observed the same phenomenon during fine-tuning. Therefore, we only cared about the evaluation metric and ignored this odd behaviour of the loss. As long as `valid_f1_max` keeps increasing, you can keep fine-tuning your model until it converges.
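The checkpointing policy described above (ignore the loss, keep whichever checkpoint scores highest on `valid_f1_max`) can be sketched in a few lines. This is a minimal, hypothetical illustration; `track_best_checkpoint` and the `history` values are made up for the example and are not from the authors' training code.

```python
def track_best_checkpoint(history):
    """history: list of (epoch, valid_loss, valid_f1_max) tuples.

    Returns the epoch with the highest valid_f1_max, which may differ
    from the epoch with the lowest valid_loss -- exactly the mismatch
    discussed in this thread.
    """
    best_epoch, best_f1 = None, float("-inf")
    for epoch, _loss, f1 in history:
        if f1 > best_f1:
            best_epoch, best_f1 = epoch, f1
    return best_epoch, best_f1


# Illustrative numbers mirroring the thread: the loss bottoms out early
# while valid_f1_max keeps rising.
history = [(25, 0.30, 0.18), (50, 0.28, 0.21), (75, 0.33, 0.22), (100, 0.40, 0.23)]
```

In a real loop you would call this after each validation pass and overwrite the saved weights whenever the returned best epoch changes.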
Yeah, I agree. For other annotation tasks like GO-CC, we also found `valid_f1_max` is better than `valid_loss` for tracking the best performance. I think it relates to the distribution of the different classes. Thanks for the detailed illustration.
I noticed overfitting during fine-tuning on the downstream task when the number of epochs is set to 100. For example, when fine-tuning the ESM-2 8M model on the GO/BP annotation task, `valid_f1_max` keeps rising while the validation loss starts to overfit. Testing the checkpoint with the lowest validation loss (around epoch 50) gives test-f1-max = 0.2259, while testing the checkpoint with the highest `valid_f1_max` (around epoch 100) gives test-f1-max = 0.2092.
Therefore, I am curious about the downstream fine-tuning: during training, do you apply early stopping based on this overfitting phenomenon?
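If one did want the early stopping asked about here, a common patience-based rule on the validation metric looks like the following. This is a generic sketch, not the repo's code; `should_stop` and `patience` are hypothetical names.

```python
def should_stop(f1_history, patience=10):
    """Patience-based early stopping on valid F1-max.

    f1_history: per-epoch valid_f1_max values, oldest first.
    Returns True when the last `patience` epochs have not beaten the
    best score seen before that window.
    """
    if len(f1_history) <= patience:
        return False
    best_before_window = max(f1_history[:-patience])
    best_in_window = max(f1_history[-patience:])
    return best_in_window <= best_before_window
```

Note that stopping on `valid_f1_max` rather than `valid_loss` matches the maintainers' advice in this thread: the loss can rise while the metric of interest still improves.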