Closed nicolabertoldi closed 5 years ago
Hi, sorry, I have not run experiments on such a big dataset. However,
@teslacool
I hope that the bad performance I saw is related to the slow convergence; I am continuing the training and crossing my fingers.
I am using your software to create a large-sized system.
My setting includes:
for a total of about 410M parameters.
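As a sanity check on totals like the ~410M and ~200M figures, here is a rough back-of-the-envelope parameter count for a Transformer encoder-decoder. All dimensions below (vocabulary size, model dimension, FFN dimension, layer count) are hypothetical, not the actual configuration, and the formula ignores biases, layer norms, and the output projection:

```python
def transformer_params(vocab, d_model, d_ff, layers):
    """Rough parameter count for a Transformer encoder-decoder.

    Ignores biases, layer norms, and the output projection; all
    arguments are illustrative, not the poster's real settings.
    """
    embed = 2 * vocab * d_model                       # source + target embeddings
    # per encoder layer: self-attention (4 * d^2) + FFN (2 * d * d_ff)
    enc = layers * (4 * d_model ** 2 + 2 * d_model * d_ff)
    # per decoder layer: self- and cross-attention (8 * d^2) + FFN
    dec = layers * (8 * d_model ** 2 + 2 * d_model * d_ff)
    return embed + enc + dec

# Example with made-up "big Transformer"-like dimensions:
print(transformer_params(40_000, 1024, 4096, 6))  # on the order of 250M
```

Plugging in plausible dimensions this way helps check whether a reported total is dominated by the embeddings or by the layer stack.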
The system was trained on a huge corpus with more than 1G words in each language.
Unfortunately and disappointingly, the performance of this system is slightly worse than that of the corresponding system without the LM, which has about 200M parameters.
I saw that you ran experiments showing a consistent improvement of 1 BLEU point on a smaller task (training on only 4.5M sentence pairs, i.e. fewer than 100M words).
Did you run experiments on larger datasets?
What is your feeling about the use of an LM on such a big dataset (more than 1G words)?
Do you think some setting of my system is wrong?
Any comment or tip for improvement is welcome.