keon / seq2seq

Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch
MIT License

A problem with loss computation. #23

Open · yxdr opened 4 years ago

yxdr commented 4 years ago

loss = F.nll_loss(output[1:].view(-1, vocab_size), trg[1:].contiguous().view(-1), ignore_index=pad)

The loss computed by the line above is averaged over all time steps, which can make the model difficult to train. I suggest accumulating (summing) the loss over the time steps instead; in my experiments this made the model easier to train.

fengxin619 commented 3 years ago

> The loss computed by the line above is averaged over all time steps, which can make the model difficult to train. I suggest accumulating (summing) the loss over the time steps instead.

So, how should the summed loss be written?
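
Neither comment shows the summed version, so here is a minimal sketch of both variants. It assumes the shapes implied by the snippet above: `output` of shape (trg_len, batch_size, vocab_size) holding log-probabilities, `trg` of shape (trg_len, batch_size), and `pad` as the padding index. The function name `seq2seq_loss` and the `accumulate` flag are hypothetical, for illustration only:

```python
import torch.nn.functional as F

# Minimal sketch (not from this repo): `output` is (trg_len, batch_size,
# vocab_size) log-probabilities from the decoder, `trg` is (trg_len,
# batch_size) token indices, `pad` is the padding index. The name
# `seq2seq_loss` and the `accumulate` flag are illustrative only.
def seq2seq_loss(output, trg, pad, accumulate=True):
    vocab_size = output.size(-1)
    # Skip the first time step (the <sos> token), as in the original line;
    # reshape() subsumes the .contiguous().view() combination.
    logits = output[1:].reshape(-1, vocab_size)
    targets = trg[1:].reshape(-1)
    if accumulate:
        # Sum per-token losses over all time steps and batch entries,
        # as suggested above.
        return F.nll_loss(logits, targets, ignore_index=pad,
                          reduction='sum')
    # Original behaviour: average over all non-pad tokens.
    return F.nll_loss(logits, targets, ignore_index=pad)
```

Note that with `reduction='sum'` the gradient magnitude scales with the number of target tokens, so the learning rate may need to be lowered. A common middle ground is to divide the summed loss by the batch size, `trg.size(1)`, rather than by the token count.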