jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".
MIT License
8.78k stars 1.97k forks

What I get from the default is very different from what you showed. Is it because of the code update? #132

Closed SmallSmallQiu closed 4 years ago

SmallSmallQiu commented 4 years ago

[screenshots of training output]

SmallSmallQiu commented 4 years ago

I didn't get a good model

jadore801120 commented 4 years ago

I am sorry, I did not mean that the default epoch setting is optimal. The model has not yet converged after 10 epochs. You could try 300 epochs.
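As a minimal sketch of this advice (a toy model, not the repo's actual train.py), one can run with a generous epoch cap and stop once validation loss plateaus instead of fixing a small epoch count; all names and hyperparameters below are illustrative:

```python
# Toy convergence check: train under a high epoch cap, stop when the
# validation loss stops improving (early stopping with patience).
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(64, 4)
y = X @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]])  # known linear target
X_train, y_train, X_val, y_val = X[:48], y[:48], X[48:], y[48:]

model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

best_val, patience, bad = float("inf"), 10, 0
for epoch in range(300):  # generous cap, as suggested above
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    opt.step()
    with torch.no_grad():
        val = loss_fn(model(X_val), y_val).item()
    if val < best_val - 1e-6:
        best_val, bad = val, 0
    else:
        bad += 1
        if bad >= patience:  # no improvement for a while: treat as converged
            break
```

The same idea applies to the Transformer here: 10 epochs is simply too early to judge the model, so train until the validation curve flattens.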

SmallSmallQiu commented 4 years ago

> I am sorry, I did not mean that the default epoch setting is optimal. The model has not yet converged after 10 epochs. You could try 300 epochs.

[screenshot] I have tried increasing the epoch value to 100, but the system reported a CUDA error. The epoch parameter seems to affect the CUDA memory usage; my GPU is a 1080.

jadore801120 commented 4 years ago

In this case, you have to reduce your batch size.
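If a smaller batch hurts training, a common workaround (illustrative sketch only, not part of this repo's train.py) is gradient accumulation: split each large batch into micro-batches, accumulate gradients, and apply one optimizer step, which cuts peak GPU memory while keeping the effective batch size:

```python
# Gradient accumulation sketch: four micro-batches of 8 stand in for
# one batch of 32, so peak activation memory is roughly quartered.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

big_batch = torch.randn(32, 8)
targets = torch.randn(32, 1)

accum_steps = 4                      # 32 examples -> 4 micro-batches of 8
opt.zero_grad()
for i in range(accum_steps):
    xb = big_batch[i * 8:(i + 1) * 8]
    yb = targets[i * 8:(i + 1) * 8]
    # scale the loss so the accumulated gradient matches a single
    # full-batch backward pass
    loss = loss_fn(model(xb), yb) / accum_steps
    loss.backward()
opt.step()                           # one parameter update per big batch
```

On a 1080 (8 GB), either a smaller batch size or accumulation like this should avoid the CUDA out-of-memory error.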

SmallSmallQiu commented 4 years ago

> In this case, you have to reduce your batch size.

The model's performance has improved greatly. Thank you for your help!

jadore801120 commented 4 years ago

No problem, feel free to open another issue if there is anything wrong. Best, Yu-Hsiang