Has anybody pre-trained successfully on en-de translation with MASS ?

ZhenYangIACAS commented 5 years ago

I have tried a lot to reproduce the results on En-de, but I failed.

StillKeepTry commented 5 years ago

I have fixed some params in the uploaded model, Do you have a try?

ZhenYangIACAS commented 5 years ago

I did not mean to reload the uploaded model. I want to reproduce the results from scratch (from pre-training to finetune). But I failed on pre-training. Anyway, which params have you fixed?

yuekai146 commented 4 years ago

I also failed in pretraining from scratch. Here is my training script.

    export NGPU=8
    python3 -m torch.distributed.launch --nproc_per_node=$NGPU train.py \
            --exp_name unsupMT_deen \
            --data_path /data/corpus/news_crawl/de-en/ \
            --lgs 'de-en' \
            --mass_steps 'de,en' \
            --encoder_only false \
            --emb_dim 512 \
            --n_layers 6 \
            --n_heads 8 \
            --dropout 0.1 \
            --attention_dropout 0.1 \
            --gelu_activation true \
            --tokens_per_batch 3000 \
            --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 \
            --epoch_size 200000 \
            --max_epoch 100 \
            --eval_bleu true \
            --word_mass 0.5 \
            --min_len 5

I use 170,000,000 German sentences and 100,000,000 English sentences, due to memory issues, I use transformer-base instead of transformer-big. Here is my fine tune script.

    export NGPU=8
    ckpt_path="../base/dumped/unsupMT_deen/9g0eku48dy"
    python3 -m torch.distributed.launch --nproc_per_node=$NGPU train.py \
            --exp_name unsupMT_deen_ft \
            --data_path /data/corpus/news_crawl/de-en/ \
            --lgs 'de-en' \
            --bt_steps 'de-en-de,en-de-en' \
            --encoder_only false \
            --emb_dim 512 \
            --n_layers 6 \
            --n_heads 8 \
            --dropout 0.1 \
            --attention_dropout 0.1 \
            --gelu_activation true \
            --tokens_per_batch 2000 \
            --batch_size 32 \
            --bptt 256 \
            --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 \
            --epoch_size 200000 \
            --max_epoch 30 \
            --save_periodic 1 \
            --eval_bleu true \
            --reload_model "$ckpt_path/checkpoint.pth,$ckpt_path/checkpoint.pth"

I could only get 22.86 BLEU points in translating German to English on newstest2016, which is far from what is reported in the paper.

Could you give me some advices on pretraining from scratch and how to fully reproduce your results?