microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Fail to Reproduce the Result of UnsupMT-EnDe #141

Closed LibertFan closed 4 years ago

LibertFan commented 4 years ago

I fine-tuned the mass_ende_1024.pth checkpoint you provide for unsupervised MASS with the following command:

python -m torch.distributed.launch --nproc_per_node=$NGPU train2.py \
  --dump_path ${MAIN_DIR}/log/ \
  --save_periodic 10 \
  --exp_name unsupMT_ende_finetune  \
  --exp_id ${ARCH}_v${VERSION}    \
  --data_path  ${DATA_PATH}  \
  --lgs 'en-de'                                        \
  --bt_steps 'en-de-en,de-en-de'                       \
  --encoder_only false                                 \
  --emb_dim 1024                                       \
  --n_layers 6                                         \
  --n_heads 8                                          \
  --dropout 0.1                                        \
  --attention_dropout 0.1                              \
  --gelu_activation true                               \
  --tokens_per_batch 2000                            \
  --batch_size 32                                        \
  --bptt 256                                           \
  --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 \
  --epoch_size 200000                                  \
  --max_epoch 50                                       \
  --eval_bleu true                                     \
  --reload_model "$MODEL,$MODEL" \
  --word_mass 0.5                                      \
  --min_len 5                                         \
  --lambda_span "8"                                    \
  --word_mask_keep_rand '0.8,0.1,0.1' 
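The `--bt_steps 'en-de-en,de-en-de'` flag drives online back-translation: the model translates a monolingual batch into the other language (without gradients), then trains on reconstructing the original sentence from that synthetic translation. A minimal sketch of that loop, with a stand-in model in place of the real decoder (all names here are illustrative, not the MASS API):

```python
# Toy illustration of one online back-translation step, as selected by
# --bt_steps 'en-de-en,de-en-de'. The "translate" function is a stand-in
# that reverses the token list to simulate translation in either direction.
def translate(tokens, direction):
    # stand-in for model decoding; a real system would run beam search here
    return list(reversed(tokens))

def bt_step(mono_batch, src, tgt):
    """One back-translation step src -> tgt -> src.

    Returns (synthetic_input, reconstruction_target) pairs that a real
    trainer would feed to the cross-entropy loss for the tgt -> src
    direction.
    """
    pairs = []
    for sent in mono_batch:
        synthetic = translate(sent, f"{src}-{tgt}")  # no gradient in this pass
        pairs.append((synthetic, sent))              # supervise tgt -> src
    return pairs

en_batch = [["the", "cat", "sat"], ["hello", "world"]]
pairs = bt_step(en_batch, "en", "de")
```

The real trainer alternates this step over both directions each iteration, so translation quality in one direction improves the synthetic training data for the other.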

The result is:

INFO - 05/11/20 05:14:41 - 14:07:52 - valid_de-en_mt_ppl -> 26.467271
INFO - 05/11/20 05:14:41 - 14:07:52 - valid_de-en_mt_acc -> 57.486569
INFO - 05/11/20 05:14:41 - 14:07:52 - valid_de-en_mt_bleu -> 24.670000
INFO - 05/11/20 05:14:41 - 14:07:52 - valid_en-de_mt_ppl -> 26.211361
INFO - 05/11/20 05:14:41 - 14:07:52 - valid_en-de_mt_acc -> 57.564177
INFO - 05/11/20 05:14:41 - 14:07:52 - valid_en-de_mt_bleu -> 21.000000
INFO - 05/11/20 05:14:41 - 14:07:52 - test_de-en_mt_ppl -> 12.171424
INFO - 05/11/20 05:14:41 - 14:07:52 - test_de-en_mt_acc -> 65.458238
INFO - 05/11/20 05:14:41 - 14:07:52 - test_de-en_mt_bleu -> 32.700000
INFO - 05/11/20 05:14:41 - 14:07:52 - test_en-de_mt_ppl -> 12.218844
INFO - 05/11/20 05:14:41 - 14:07:52 - test_en-de_mt_acc -> 64.317774
INFO - 05/11/20 05:14:41 - 14:07:52 - test_en-de_mt_bleu -> 26.470000

test_de-en_mt_bleu is about 32.7, which is well below the 35.2 reported in your paper.

Could you provide any suggestions to improve my finetuning?

StillKeepTry commented 4 years ago

You can try more data. Here is one of the datasets we used, with 50M monolingual sentences: data. Besides, beam search and length penalty are also important for the final result.
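On the length-penalty point: the common GNMT-style penalty divides each hypothesis's log-probability by `((5 + len) / 6) ** alpha`, so beam search does not unfairly favor short outputs. A minimal re-ranking sketch (the `alpha` value and function names are illustrative, not taken from the MASS code):

```python
def length_penalty(length, alpha=0.6):
    # GNMT-style length penalty: ((5 + length) / 6) ** alpha
    return ((5.0 + length) / 6.0) ** alpha

def rescore(hypotheses, alpha=0.6):
    """Re-rank beam hypotheses by length-normalized log-probability.

    hypotheses: list of (tokens, logprob) pairs; higher normalized
    score ranks first.
    """
    return sorted(
        hypotheses,
        key=lambda h: h[1] / length_penalty(len(h[0]), alpha),
        reverse=True,
    )

beams = [(["kurz"], -1.0), (["ein", "laengerer", "satz", "hier"], -2.5)]
best = rescore(beams)[0]
```

Sweeping `alpha` (and the beam size) on the validation set is usually how these are chosen; small changes can shift BLEU by a point or more on this kind of task.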

LibertFan commented 4 years ago

Could you provide some details about the hyperparameters (beam search, length penalty, etc.)? @StillKeepTry That would be very helpful!

zwhe99 commented 3 years ago

I used all data from News Crawl 2007-2017 to reproduce unsupervised MT En-De. I get BLEU scores of en-de: 33.3, de-en: 26.5, which are still lower than the reported results.