jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Surprising PPL on WMT 17 #154

Open luffycodes opened 4 years ago

luffycodes commented 4 years ago

Running the code with n_head set to 1 leads to a PPL of 6.65 (all other parameters are the same as in the README). The resulting log is attached below. I'm surprised by such a low PPL, because n_head left at its default gives a PPL of about 11. Is this behaviour expected? A rough sketch of how the reported gap translates into per-token loss is shown below.
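For context, here is a minimal sketch of how validation PPL relates to the average per-token cross-entropy, assuming PPL is computed as exp of the mean loss over non-padding tokens (the `perplexity` helper and the `pad_idx` argument are illustrative, not taken from this repo's code):

```python
import math
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor, pad_idx: int = 0) -> float:
    """Perplexity from flattened logits (n_tokens, vocab) and targets (n_tokens,),
    ignoring padding positions. Assumes mean per-token cross-entropy."""
    loss = F.cross_entropy(logits, targets, ignore_index=pad_idx, reduction="mean")
    return math.exp(min(loss.item(), 100))  # cap the exponent to avoid overflow

# A gap of 6.65 vs. 11 PPL corresponds to roughly
# log(11) - log(6.65) ≈ 0.50 nats of cross-entropy per token.
```

Under that assumption, the difference between 6.65 and 11 PPL is about half a nat per token, which is a fairly large gap for a single-head vs. multi-head change.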

"[ Epoch 356 ]