yuvalkirstain / s2e-coref

MIT License
Vanishing gradients? #11

Open Twim17 opened 9 months ago

Twim17 commented 9 months ago

Hi, I was testing this model, and during training I noticed that the training loss drops to zero very quickly and then becomes unstable: it stays at zero most of the time and occasionally jumps to higher values. I investigated the gradients with wandb, and it looks to me like they may be vanishing. So my question is: did you check whether you had vanishing gradients? Did you also see such an unstable loss (at least in the first epochs)?
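For anyone who wants to reproduce this kind of check without wandb, here is a minimal sketch of a per-parameter gradient norm inspection. It is not code from this repo. The helper names `grad_norms` and `vanishing_layers`, the parameter names, and the `1e-7` threshold are all assumptions made for illustration. In a PyTorch training loop the input could presumably be built after `loss.backward()` from `model.named_parameters()`, for example as `(name, p.grad.flatten().tolist())` for each parameter whose `grad` is not `None`.

```python
import math

def grad_norms(named_grads):
    """Map each parameter name to the L2 norm of its gradient values.

    named_grads: iterable of (name, flat list of floats).
    """
    return {
        name: math.sqrt(sum(g * g for g in values))
        for name, values in named_grads
    }

def vanishing_layers(norms, threshold=1e-7):
    """Return the names whose gradient norm is below the threshold.

    The default threshold is an arbitrary assumption, not a standard value.
    """
    return [name for name, norm in norms.items() if norm < threshold]

# Toy values standing in for real gradients (hypothetical parameter names).
example = [
    ("encoder.layer0.weight", [0.0, 0.0, 1e-9]),
    ("head.weight", [0.3, -0.4]),
]

norms = grad_norms(example)
print(vanishing_layers(norms))
```

Logging these norms every few steps would show whether they shrink toward zero at the same time as the loss does, or whether only some layers are affected.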

Twim17 commented 9 months ago

Anybody?