westlake-repl / SaProt

[ICLR'24 spotlight] Saprot: Protein Language Model with Structural Alphabet
MIT License

Downstream Annotation task 'EC/GO' finetuning overfitting #19

Closed CiaoHe closed 3 months ago

CiaoHe commented 3 months ago

I noticed overfitting during fine-tuning of the downstream tasks when the number of fine-tuning epochs is set to 100. For example, when fine-tuning the ESM-2 8M model on the GO/BP annotation task, the valid-f1-max keeps rising, but the validation loss starts to increase, i.e. it overfits. [image: training curves of valid-f1-max and validation loss]

Testing the checkpoint with the lowest validation loss (around epoch 50): test-f1-max = 0.2259.
Testing the checkpoint with the highest valid-f1-max (around epoch 100): test-f1-max = 0.2092.

Therefore, I am curious about downstream fine-tuning: during training, do you apply early stopping based on this overfitting phenomenon?
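For reference, this is the kind of early stopping I have in mind. It is only a minimal sketch assuming a PyTorch Lightning Trainer and a logged metric named valid_f1_max; the names here are illustrative, not this repo's exact config keys:

```python
# Illustrative sketch only (not SaProt's actual training config): stop fine-tuning
# once the monitored validation metric stops improving, instead of running a fixed
# 100 epochs. Assumes a LightningModule that logs "valid_f1_max" during validation.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="valid_f1_max",  # hypothetical metric key logged during validation
    mode="max",              # higher F1-max is better
    patience=10,             # stop after 10 epochs without improvement
)

trainer = pl.Trainer(max_epochs=100, callbacks=[early_stop])
# trainer.fit(model, datamodule=data_module)  # model / data_module defined elsewhere
```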

CiaoHe commented 3 months ago

Related line: https://github.com/westlake-repl/SaProt/blob/0a8a6fa276eed1ebd7c2b7f71bfd4d88f9b093eb/model/esm/esm_annotation_model.py#L89

LTEnjoy commented 3 months ago

Hi, we only look at the F1-max score for annotation tasks. During training we didn't use an early stopping strategy; we only saved the model that performed best on the validation set, e.g. the model with the highest F1-max score for annotation tasks.
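In other words, the selection criterion is the validation metric rather than the loss. A minimal sketch of that checkpointing setup with PyTorch Lightning (the monitored key is illustrative, not necessarily the exact name logged in this repo):

```python
# Minimal sketch of "save only the model that performs best on the validation set",
# assuming a PyTorch Lightning setup that logs a "valid_f1_max" metric.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    monitor="valid_f1_max",  # select on the metric, not on valid_loss
    mode="max",              # keep the checkpoint with the highest F1-max
    save_top_k=1,            # only the single best checkpoint is kept
    filename="best-{epoch}-{valid_f1_max:.4f}",
)

trainer = pl.Trainer(max_epochs=100, callbacks=[checkpoint])
# trainer.fit(model, datamodule=data_module)
```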

The mismatch between valid_f1_max and valid_loss seems to be inevitable, since we observed the same phenomenon during fine-tuning. Therefore, we only cared about the evaluation metric and ignored this odd behavior of the loss. As long as valid_f1_max keeps increasing, you can keep fine-tuning your model until it converges.
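To illustrate why the two curves can move in opposite directions, here is a toy example (not the metric code in this repo; the real protein-centric Fmax averages precision/recall over proteins, while this just scans thresholds on a single example). Making one wrong prediction more confident drives the BCE loss up, but F1-max is unchanged as long as some threshold still separates positives from negatives:

```python
# Toy illustration (not SaProt's metric implementation): BCE loss penalizes
# over-confident wrong logits, while a threshold-scanned F1-max only cares about
# whether positives and negatives are still separable at some threshold.
import torch
import torch.nn.functional as F

def f1_max(probs: torch.Tensor, labels: torch.Tensor) -> float:
    """Max F1 over a grid of thresholds for a multi-label prediction."""
    best = 0.0
    for t in torch.linspace(0.05, 0.95, 19):
        pred = (probs > t).float()
        tp = (pred * labels).sum()
        precision = tp / pred.sum().clamp(min=1)
        recall = tp / labels.sum().clamp(min=1)
        f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-8)
        best = max(best, f1.item())
    return best

labels = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
early  = torch.tensor([[2.0, -1.0, 1.5, 0.5]])  # one mild false positive (last class)
late   = torch.tensor([[5.0, -3.0, 4.0, 2.5]])  # same mistake, now much more confident

for name, logits in [("early ckpt", early), ("late ckpt", late)]:
    loss = F.binary_cross_entropy_with_logits(logits, labels).item()
    score = f1_max(torch.sigmoid(logits), labels)
    print(f"{name}: loss={loss:.3f}  f1_max={score:.3f}")
# early ckpt: loss ~0.40, f1_max = 1.0
# late  ckpt: loss ~0.66, f1_max = 1.0  -> loss rises while the metric is unchanged
```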

CiaoHe commented 3 months ago

Yeah, I agree. For other annotation tasks like GO-CC, we also found that valid_f1_max is better than valid_loss for tracking the best performance. I think this is related to the class distribution. Thx for the detailed explanation.