Open yxdr opened 4 years ago
loss = F.nll_loss(output[1:].view(-1, vocab_size), trg[1:].contiguous().view(-1), ignore_index=pad)
The loss computed by the line above is averaged over all time steps, which can make the model difficult to train. So I suggest accumulating (summing) the loss over the time steps instead. In my experiments, this made the model easier to train.
So how should the loss be written?
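One way to accumulate rather than average is to pass `reduction='sum'` to `F.nll_loss` and divide by the batch size only, so the loss scales with sequence length. A minimal sketch, assuming `output` holds log-probabilities of shape `[seq_len, batch_size, vocab_size]` and `trg` has shape `[seq_len, batch_size]` (the shapes and dummy data here are illustrative, not from the original code):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes matching a typical seq2seq setup
seq_len, batch_size, vocab_size, pad = 5, 3, 10, 0
output = F.log_softmax(torch.randn(seq_len, batch_size, vocab_size), dim=-1)
trg = torch.randint(1, vocab_size, (seq_len, batch_size))

# Original: mean NLL over all non-pad tokens
loss_mean = F.nll_loss(output[1:].view(-1, vocab_size),
                       trg[1:].contiguous().view(-1),
                       ignore_index=pad)

# Suggested: sum over tokens, then normalize by batch size only,
# so each sequence contributes its full accumulated loss
loss_sum = F.nll_loss(output[1:].view(-1, vocab_size),
                      trg[1:].contiguous().view(-1),
                      ignore_index=pad,
                      reduction='sum') / batch_size
```

With no pad tokens in the targets, `loss_sum` equals `loss_mean` multiplied by the number of time steps, so gradients are larger and per-step errors are not washed out by long sequences.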