After reading your paper, which I found genuinely interesting, I took a close look at your code; I must admit it is very well organized, so thank you for your work.
Before starting my experiments with it, I would like your suggestions on how to tune the system and training parameters:
Is there a configuration of the lm and lmnmt architectures that works better than the default?
For both lm and lmnmt training, what is the best setting of the training parameters (learning rate, warmup updates, etc.)?
Is there any aspect of training I should pay particular attention to in order to avoid poor performance?
For the lmnmt arch, I adopt the default fairseq config, which I find to be a strong baseline, especially for small-scale translation tasks such as IWSLT. For the lm arch, I did not do much tuning; I simply use an architecture similar to the corresponding lmnmt arch (see the sketch below).
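For concreteness, here is a rough sketch of what I mean by the default fairseq recipe. The data path and arch name are the stock values from the fairseq translation examples, not necessarily exactly what this repo uses:

```python
import subprocess

# Sketch of the stock fairseq IWSLT translation recipe (values taken from the
# fairseq examples/translation README). The data path and arch name below are
# illustrative; substitute this repo's own data-bin and arch registration.
nmt_cmd = [
    "fairseq-train", "data-bin/iwslt14.tokenized.de-en",
    "--arch", "transformer_iwslt_de_en",
    "--share-decoder-input-output-embed",
    "--criterion", "label_smoothed_cross_entropy",
    "--label-smoothing", "0.1",
    "--max-tokens", "4096",
]
subprocess.run(nmt_cmd, check=True)
```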
For lmnmt training, I adopt the same fairseq config. For lm training, I am sorry to say I have not found a good reference setup, so I reuse the same config as for lmnmt training.
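Roughly, these are the optimizer and schedule flags I reuse for both. Again, they are the stock fairseq IWSLT values, and the LM task/arch names below are standard fairseq options used only for illustration, so please treat the whole thing as a starting point rather than the tuned numbers from the paper:

```python
import subprocess

# Optimizer / schedule flags shared between the translation model and the LM
# (stock fairseq IWSLT values; not claimed to be optimal for the LM).
train_flags = [
    "--optimizer", "adam",
    "--adam-betas", "(0.9, 0.98)",
    "--clip-norm", "0.0",
    "--lr", "5e-4",
    "--lr-scheduler", "inverse_sqrt",
    "--warmup-updates", "4000",
    "--dropout", "0.3",
    "--weight-decay", "0.0001",
]

# Hypothetical LM run reusing the same flags. "--task language_modeling" and
# "--arch transformer_lm" are standard fairseq options; the actual arch name
# and data path in this repo may differ.
lm_cmd = [
    "fairseq-train", "data-bin/lm-data",
    "--task", "language_modeling",
    "--arch", "transformer_lm",
    "--tokens-per-sample", "512",
    "--max-tokens", "2048",
] + train_flags
subprocess.run(lm_cmd, check=True)
```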