Closed nicolabertoldi closed 5 years ago
Hi, sorry, I have not run experiments on such a big dataset. However,
@teslacool
I hope that the bad performance I saw is related to the slow convergence; I am continuing the training and crossing my fingers.
I am using your software to create a large-sized system.
My setting includes:
for a total of about 410M parameters.
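As a sanity check on totals like the ~410M and ~200M figures, here is a rough back-of-the-envelope parameter count for a Transformer encoder-decoder. All dimensions below (vocabulary size, model dimension, FFN dimension, layer count) are hypothetical, not the actual configuration, and the formula ignores biases, layer norms, and the output projection:

```python
def transformer_params(vocab, d_model, d_ff, layers):
    """Rough parameter count for a Transformer encoder-decoder.

    Ignores biases, layer norms, and the output projection; all
    arguments are illustrative, not the poster's real settings.
    """
    embed = 2 * vocab * d_model                       # source + target embeddings
    # per encoder layer: self-attention (4 * d^2) + FFN (2 * d * d_ff)
    enc = layers * (4 * d_model ** 2 + 2 * d_model * d_ff)
    # per decoder layer: self- and cross-attention (8 * d^2) + FFN
    dec = layers * (8 * d_model ** 2 + 2 * d_model * d_ff)
    return embed + enc + dec

# Example with made-up "big Transformer"-like dimensions:
print(transformer_params(40_000, 1024, 4096, 6))  # on the order of 250M
```

Plugging in plausible dimensions this way helps check whether a reported total is dominated by the embeddings or by the layer stack.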
The system was trained on a huge corpus with more than 1G words in each language.
Unfortunately and disappointingly, the performance of this system is slightly worse than that of the corresponding system without the LM, which has about 200M parameters.
I saw that you ran experiments showing a consistent improvement of 1 BLEU point on a smaller task (training on only 4.5M sentence pairs, i.e. fewer than 100M words).
Did you run experiments on larger datasets?
What is your feeling about the use of an LM on such a big dataset (more than 1G words)?
Do you think some setting of my system is wrong?
Any comment or tip for improvement is welcome.