mmatena / m251


Figure out why GLUE fine-tuning sometimes diverges. #6

Open mmatena opened 3 years ago

mmatena commented 3 years ago

Right now, it seems to happen on QQP. I'll look at the results soon to get a better idea of when it's happening.
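A minimal sketch for pinpointing the step where a run diverges. It assumes the per-step training losses have already been collected into a Python list. The function name `first_divergence_step` and both thresholds (`spike_factor`, `window`) are hypothetical illustrative choices, not code from this repo.

```python
import math


def first_divergence_step(losses, spike_factor=10.0, window=50):
    """Return the index of the first loss that looks divergent, or None.

    A step is flagged if its loss is NaN/inf, or if it exceeds
    `spike_factor` times the mean of the previous `window` losses.
    Both thresholds are arbitrary and would need tuning per task.
    """
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            return i  # NaN or inf: clearly diverged
        history = losses[max(0, i - window):i]
        if history and loss > spike_factor * (sum(history) / len(history)):
            return i  # sudden spike relative to the recent average
    return None
```

Running this over each QQP run's loss log would show whether divergence happens early (for example during warmup) or late in training.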

I should also just Google the problem and look for general BERT fine-tuning advice.

I might also want to change the epsilon hyperparameter of the Adam optimizer and check that the other hyperparameters look reasonable.
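A minimal sketch of why epsilon matters, using the textbook Adam update from Kingma & Ba for a single scalar parameter. Framework implementations (for example the `epsilon` argument of `tf.keras.optimizers.Adam`) may place epsilon slightly differently relative to the bias correction, so treat this as an illustration rather than the optimizer this project actually uses. The helper name `adam_step` is hypothetical.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter.

    `t` is the 1-based step count. Returns the updated (param, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad         # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    # eps sits in the denominator: when sqrt(v_hat) is small compared with eps,
    # a larger eps shrinks the step instead of letting it approach lr.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Compare step sizes for a tiny gradient under different eps values.
for eps in (1e-8, 1e-6, 1e-4):
    new_param, _, _ = adam_step(0.0, 1e-6, 0.0, 0.0, t=1, eps=eps)
    print(f"eps={eps:g}  step size={-new_param:.3e}")
```

With a very small eps, Adam's step is roughly `lr` in magnitude regardless of how small the gradient is, while a larger eps damps updates for parameters with small gradients. That damping is one reason a larger epsilon is sometimes suggested when fine-tuning is unstable.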