shon-otmazgin / fastcoref

MIT License

Unusual learning curve #45

Open psuwannapich opened 11 months ago

psuwannapich commented 11 months ago

Hello, I am trying to fine-tune your model on another language, but the loss is 0 on both the train and validation sets for the first few epochs.

The loss curve:

[screenshot: Screenshot 2566-10-01 at 05 11 04]

Is this normal, or did I do something wrong?

dimitristaufer commented 1 month ago

Hello, I am having the same issue. I am trying to fine-tune on a German-language dataset, and my loss curve looks almost exactly like yours.

Were you able to find a solution (or an explanation)?

Thank you.

psuwannapich commented 1 month ago

I think this comes from the model's loss function, inherited from e2e-coref:

  1. The model starts by predicting random results, so there is a peak at step 0.
  2. Because the loss function only maximizes the likelihood of positive predictions and ignores true-negative cases, the model can simply predict "there are no mentions in the document," and the loss remains at 0.
  3. With some luck (I guess), the model eventually identifies some mentions, gets them wrong, and the loss starts to increase.
  4. After that, the model can be trained properly: the loss decreases until it converges.
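Step 2 can be sketched in a few lines. This is a hypothetical toy version of the e2e-coref-style marginal log-likelihood (the function name and scores are mine, not fastcoref's actual API), where index 0 of each score list is the dummy "no antecedent" option with its score fixed at 0. It shows that a model that scores every real antecedent very low (i.e. predicts no mentions) gets a loss near 0 for spans whose only gold label is the dummy:

```python
import math

def marginal_nll(antecedent_scores, gold_antecedents):
    """Toy e2e-coref-style loss: negative log of the total probability
    assigned to the gold antecedents, marginalized per span.
    scores[i][0] is the dummy "no antecedent", fixed at score 0."""
    total = 0.0
    for scores, gold in zip(antecedent_scores, gold_antecedents):
        # log of the softmax denominator over all candidates (dummy included)
        log_z = math.log(sum(math.exp(s) for s in scores))
        # log of the probability mass on the gold antecedents
        log_gold = math.log(sum(math.exp(scores[j]) for j in gold))
        total += log_z - log_gold
    return total

# A collapsed model: every real antecedent scored very low,
# so the dummy (index 0, score 0.0) dominates the softmax.
scores = [[0.0, -10.0, -10.0]] * 4
# Non-mention spans have only the dummy as their gold antecedent.
gold = [[0]] * 4
print(marginal_nll(scores, gold))  # a value very close to 0
```

So as long as the gold label for most spans is "no antecedent" (and gold mentions pruned away before this loss never enter it at all), the collapsed prediction is almost free, which matches the flat-zero stretch in the curve.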

If you get a loss curve like mine, it's fine, since the model has already converged (in my curve, it starts to converge after step 7,000). But if the loss remains 0 after training concludes, I suggest changing your hyperparameters (or maybe using a new seed) and retraining several times.

By the way, the best way to fix this would be to change the loss function so that it also covers the true-negative case.
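One common way to do that (my own suggestion, not something fastcoref is confirmed to implement) is to add an auxiliary binary cross-entropy term on the per-span mention scores, so non-mention spans contribute gradient directly instead of only through the dummy antecedent. A minimal sketch, with a made-up function name:

```python
import math

def mention_bce(mention_scores, is_gold_mention):
    """Hypothetical auxiliary loss: binary cross-entropy over candidate
    spans, where is_gold_mention[i] is 1 if span i is a gold mention.
    Penalizes both missed mentions and spurious ones (true negatives)."""
    total = 0.0
    for s, y in zip(mention_scores, is_gold_mention):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid of the raw mention score
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(mention_scores)
```

The combined training loss would then be the coreference loss plus a small weight times this term, so a model that scores everything as "not a mention" is no longer near-zero-loss.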