jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Surprising PPL on WMT 17 #154

Open luffycodes opened 4 years ago

luffycodes commented 4 years ago

Running the code with n_head set to 1 leads to a PPL of 6.65 (all other parameters are the same as in the README). The resulting log is attached below. I'm surprised by such a low PPL, because n_head left at its default gives a PPL of about 11. Is this behaviour expected? A rough sketch of how the reported gap translates into per-token loss is shown below.
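For context, here is a minimal sketch of how validation PPL relates to the average per-token cross-entropy, assuming PPL is computed as exp of the mean loss over non-padding tokens (the `perplexity` helper and the `pad_idx` argument are illustrative, not taken from this repo's code):

```python
import math
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor, pad_idx: int = 0) -> float:
    """Perplexity from flattened logits (n_tokens, vocab) and targets (n_tokens,),
    ignoring padding positions. Assumes mean per-token cross-entropy."""
    loss = F.cross_entropy(logits, targets, ignore_index=pad_idx, reduction="mean")
    return math.exp(min(loss.item(), 100))  # cap the exponent to avoid overflow

# A gap of 6.65 vs. 11 PPL corresponds to roughly
# log(11) - log(6.65) ≈ 0.50 nats of cross-entropy per token.
```

Under that assumption, the difference between 6.65 and 11 PPL is about half a nat per token, which is a fairly large gap for a single-head vs. multi-head change.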

"[ Epoch 356 ]