Open 520jefferson opened 5 years ago
After generating data with get-data-nmt.sh (--src de --tgt en), I fine-tune with:
python3 train.py \
  --exp_name unsupMT_ende \
  --data_path ./data/processed/de-en \
  --lgs en-de \
  --bt_steps en-de-en,de-en-de \
  --encoder_only false \
  --emb_dim 1024 \
  --n_layers 6 \
  --n_heads 8 \
  --dropout 0.1 \
  --attention_dropout 0.1 \
  --gelu_activation true \
  --tokens_per_batch 2000 \
  --batch_size 32 \
  --bptt 256 \
  --optimizer adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001 \
  --epoch_size 200000 \
  --max_epoch 30 \
  --eval_bleu true \
  --reload_model ./model_ende/mass_ende_1024.pth,./model_ende/mass_ende_1024.pth
But I ran into this:
Why do we need "please ensure SRC < TGT"? I used the updated model with --src de --tgt en, and then I could fine-tune en-de. But I still cannot translate directly with the masss_ft_ende model; maybe I should save all the params myself.
File "translate.py", line 160, in
I used de-en to generate the data and en-de to train, starting from the newly updated en-de model.
But translate.py still cannot translate with the en-de model. Maybe I should keep all the parameters in the checkpoint.
@520jefferson Do you mean you cannot translate in any direction, or just en->de?
I have uploaded an en-de model with fixed params under this link. Can you give it a try? Alternatively, you can load the pre-trained or fine-tuned weights at training time and then evaluate (if the number of training steps is 1, the result is almost identical to the fine-tuned weights).
@StillKeepTry Is this the third update of the model? I will try downloading the new model.
@520jefferson The previous model can be used directly for fine-tuning. The model uploaded above keeps the same weights as the previous model and just adds some params to support translation.
@StillKeepTry Same problem as https://github.com/microsoft/MASS/issues/49. I tried the new model and codes, but I get the same error.
I want to fine-tune the en-de model, so I used the command below to generate data:
./get-data-nmt.sh --src en --tgt de --reload_codes model_ende/codes_ende --reload_vocab model_ende/vocab_ende
But I hit an error at this line: if [ "$SRC" > "$TGT" ]; then echo "please ensure SRC < TGT"; exit; fi
Then I used --src de --tgt en, and it ran successfully. If I then fine-tune on mass_ende_1024.pth, will the direction affect my result? Why does the script require $SRC < $TGT?
On the other hand, the codes and vocab are still generated in /data/process/de-en even though I already set --reload_codes and --reload_vocab. So weird!
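For readers puzzled by the same check, here is a minimal sketch of what an alphabetical-order test on a language pair does. It is not taken from get-data-nmt.sh: the helper name `ordered_pair` is hypothetical, and the idea that the check exists only to give each pair a single canonical directory name (such as de-en) is an assumption, though it is consistent with the thread, where data generated with --src de --tgt en was still usable for en-de fine-tuning. Inside single brackets an unescaped `>` would be parsed as an output redirection, so the sketch uses `[[ ]]`, where `>` is a lexicographic string comparison.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of MASS): print a language pair in
# alphabetical order, so that "en de" and "de en" map to the same name.
ordered_pair() {
  local a="$1" b="$2"
  # Within [[ ]], ">" compares the two strings lexicographically.
  if [[ "$a" > "$b" ]]; then
    echo "$b-$a"
  else
    echo "$a-$b"
  fi
}

ordered_pair en de   # prints de-en
ordered_pair de en   # prints de-en
```

Under this reading, the order of --src and --tgt only determines the name of the processed data directory, while the translation direction is chosen later through flags such as --lgs and --bt_steps in the training command.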