Hi, I noticed that the loss function used in `train` and `evaluate` is cross-entropy. However, in model.py the decoder's output is already passed through a log-softmax. By definition, PyTorch's `cross_entropy` already includes the log-softmax operation, so applying it to log-softmax outputs effectively runs log-softmax twice. I think the loss function in `train` and `evaluate` should instead be `nll_loss`. That way, the overall computation (the decoder's final log-softmax followed by `nll_loss`) is exactly cross-entropy.
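To illustrate the equivalence I mean, here is a small sketch (assuming PyTorch's `torch.nn.functional` API; the tensor shapes are just illustrative, not taken from model.py):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw decoder scores, before any softmax
target = torch.tensor([1, 3, 5, 7])  # gold class index per example

# Path 1: cross_entropy on raw logits
# (internally does log_softmax followed by nll_loss)
ce = F.cross_entropy(logits, target)

# Path 2: explicit log_softmax (as the decoder in model.py does),
# then nll_loss on the resulting log-probabilities
log_probs = F.log_softmax(logits, dim=1)
nll = F.nll_loss(log_probs, target)

# The two paths give the same loss value
matches = torch.allclose(ce, nll)
```

So if the decoder already ends in log-softmax, feeding its output into `cross_entropy` would apply log-softmax a second time, whereas `nll_loss` completes the intended cross-entropy computation.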