microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Question about pretraining for a text-generation task: it seems that pretraining does not work for a small model? #161

Closed guotong1988 closed 4 years ago

guotong1988 commented 4 years ago

What is your question?

My task is to generate keywords from sentences.

I pretrain a text-generation model: I mask some of the tokens in each sentence and predict all of the sentence's tokens.

Pretraining batch_size = 8 and step = 1000000

I haven't observed any improvement from pretraining: the BLEU score is 10.5 without pretraining and 9.5 with pretraining.
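For concreteness, a minimal sketch of how I build the pretraining pairs (the mask id and mask probability here are placeholders, not my exact values): the encoder sees the partially masked sentence and the decoder is trained to reproduce the whole sentence.

```python
import random

MASK_ID = 4  # placeholder id for the [MASK] token in my vocabulary

def make_pretraining_pair(token_ids, mask_prob=0.15):
    """Randomly mask input tokens; the decoder target is the whole sentence."""
    enc_in = [MASK_ID if random.random() < mask_prob else t for t in token_ids]
    target = list(token_ids)  # predict every token, masked or not
    return enc_in, target

# e.g. the token ids of one training sentence
enc_in, target = make_pretraining_pair([17, 42, 8, 23, 99])
```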

Code

I took the Python code from

https://github.com/google-research/pegasus/blob/master/pegasus/models/transformer.py#L38

hidden_size = 512
num_encoder_layers = 3
num_decoder_layers = 3
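For scale, a roughly comparable encoder-decoder in plain PyTorch (this is not the PEGASUS code linked above; nhead and dim_feedforward below are assumed values, only the three sizes above come from my setup):

```python
import torch.nn as nn

# Small encoder-decoder of comparable size, just to show the scale of the model;
# nhead=8 and dim_feedforward=2048 are assumptions, not taken from my config.
model = nn.Transformer(
    d_model=512,            # hidden_size = 512
    num_encoder_layers=3,
    num_decoder_layers=3,
    nhead=8,
    dim_feedforward=2048,
    batch_first=True,
)
```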

Discussion

The task is to generate keywords from sentences, and the keywords may not appear in the sentences. So inputting masked sentences to predict the whole sentences does not benefit the keyword-generation task; that objective has little relation to keyword generation. Am I right? Is that the reason pretraining does not improve the BLEU score?

Thank you very much.

tan-xu commented 4 years ago

You may consider inputting masked sentences and predicting only the masked tokens, instead of the whole sentence.
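Something along these lines, as in the MASS objective (a rough sketch; the mask id and span length are placeholders): mask a contiguous span in the encoder input and train the decoder to predict only that span.

```python
import random

MASK_ID = 4  # placeholder [MASK] id

def masked_span_pair(token_ids, span_frac=0.5):
    """Mask a contiguous span; the decoder target is only the masked span."""
    n = len(token_ids)
    span_len = max(1, int(n * span_frac))
    start = random.randint(0, n - span_len)
    enc_in = list(token_ids)
    enc_in[start:start + span_len] = [MASK_ID] * span_len
    target = list(token_ids[start:start + span_len])
    return enc_in, target
```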

guotong1988 commented 4 years ago

You mean: Generate masked tokens?

Thank you!

guotong1988 commented 4 years ago

1. I pad zeros into the input tokens when packing multiple sentences. The positions of the output tokens should be exactly the same as those of the input tokens, which means I should keep the padding zeros in the output tokens (see the sketch after this list).

2. The pretraining time should be longer.
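A small sketch of point 1 (PAD_ID is a placeholder id): the masked input and the target are padded to the same length, so every output position lines up one-to-one with an input position.

```python
PAD_ID = 0  # placeholder padding id

def pad_aligned(enc_in, target, max_len):
    """Pad input and target to the same length so positions stay aligned."""
    enc_in = enc_in + [PAD_ID] * (max_len - len(enc_in))
    target = target + [PAD_ID] * (max_len - len(target))
    return enc_in, target
```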