shon-otmazgin / fastcoref

MIT License
149 stars 26 forks

Unusual learning curve #45

Open psuwannapich opened 1 year ago

psuwannapich commented 1 year ago

Hello, I am trying to fine-tune your model on another language, but the loss on both the train and validation sets is 0 for the first few epochs.

The loss curve


Is this normal, or am I doing something wrong?

dimitristaufer commented 4 months ago

Hello, I am having the same issue. I am trying to fine-tune it on a German-language dataset, and I am getting almost exactly the same curve as you.

Were you able to find a solution? (or explanation)

Thank you.

psuwannapich commented 4 months ago

I think this comes from the loss function the model inherits from e2e-coref:

  1. The model starts by predicting random results, so there is a peak at step 0.
  2. Because the loss function only tries to maximize the likelihood of positive predictions and ignores the true-negative case, the model can simply predict "there is no mention in the document" and the loss remains at 0.
  3. With some luck (I guess), the model eventually identifies some mentions and gets them wrong, so the loss starts to increase.
  4. After that the model can be trained properly: the loss decreases until it converges.
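To illustrate point 2, here is a minimal NumPy sketch of an e2e-coref-style marginal log-likelihood loss (fastcoref's actual training code may differ; the function name and toy scores below are my own). Each span scores its candidate antecedents plus a dummy antecedent ε with a fixed score of 0; the loss marginalizes only over gold antecedents, so a model that predicts "no mention anywhere" gets zero loss on every span whose gold set is just {ε}:

```python
import numpy as np

def marginal_nll(scores, gold_mask):
    """e2e-coref-style marginal log-likelihood loss (illustrative sketch).

    scores:    (num_spans, num_candidates) antecedent scores; column 0 is
               the dummy antecedent epsilon, fixed at score 0.
    gold_mask: boolean array of the same shape; True where that candidate
               is a gold antecedent (column 0 is True for non-mentions).
    """
    # log-softmax over antecedent candidates for each span
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # marginalize (logsumexp) over the gold antecedents only, then negate
    gold_log_probs = np.where(gold_mask, log_probs, -np.inf)
    return -np.logaddexp.reduce(gold_log_probs, axis=1).sum()

# Toy example: 3 spans, candidates = [epsilon, span_0, span_1].
# The model scores every real antecedent very low, i.e. it predicts
# "there is no mention in the document".
scores = np.array([
    [0.0, -1e4, -1e4],
    [0.0, -1e4, -1e4],
    [0.0, -1e4, -1e4],
])

# If no span is a gold mention, the only gold "antecedent" is the dummy,
# and the degenerate prediction gets (numerically) zero loss:
gold_all_dummy = np.zeros_like(scores, dtype=bool)
gold_all_dummy[:, 0] = True
print(marginal_nll(scores, gold_all_dummy))  # ~0.0

# But once a span has a real gold antecedent, the same degenerate
# prediction is finally penalized with a large loss:
gold_real = gold_all_dummy.copy()
gold_real[2, 0] = False
gold_real[2, 1] = True  # span 2's gold antecedent is span 0
print(marginal_nll(scores, gold_real))  # large
```

This matches the curve: as long as the model collapses to the dummy antecedent and gold clusters are sparse, the loss sits near 0, and it only moves once the model starts committing to (wrong) real antecedents.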

If your loss curve looks like mine, that's fine, since the model has already converged (according to my curve, it starts to converge after step 7,000). But if the loss remains 0 after training concludes, I suggest changing your hyperparameters (or maybe using a new seed) and training your model again several times.

By the way, the best way to fix this properly would be to change the loss function so that it also covers the true-negative case.
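One way to do that (a sketch of the idea, not fastcoref's actual code; `mention_bce` and the toy numbers are my own) is to add a binary mention-detection term on the per-span scores, so scoring a gold mention low is penalized directly and the "no mention anywhere" collapse stops being a zero-loss solution:

```python
import numpy as np

def mention_bce(mention_scores, is_gold_mention):
    """Binary cross-entropy over per-span mention scores (illustrative).

    Penalizes the model both for scoring gold mentions low and for
    scoring non-mentions high, covering the true-negative case that
    the marginal antecedent loss ignores.
    """
    p = 1.0 / (1.0 + np.exp(-mention_scores))  # sigmoid
    eps = 1e-12  # numerical safety for log(0)
    return -np.mean(is_gold_mention * np.log(p + eps)
                    + (1 - is_gold_mention) * np.log(1 - p + eps))

# A model that scores every span as "not a mention" now pays a clear
# penalty for each gold mention it misses:
scores = np.array([-10.0, -10.0, -10.0])
gold = np.array([1.0, 0.0, 0.0])  # span 0 is a gold mention
print(mention_bce(scores, gold))  # clearly nonzero
```

In practice this term would be added (possibly with a weight) to the antecedent loss, giving the model a gradient signal from step 0 instead of waiting for it to stumble onto wrong mentions by luck.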