zomux / lanmt-ebm

Better base latent model #17

Open zomux opened 4 years ago

zomux commented 4 years ago

Todo List

Checklist

jaseleephd commented 4 years ago

Previous WMT'14 En->De results without refinement (mean of the prior):
- 22.5 BLEU ("strong" in Shu et al.)
- 23.15 BLEU ("Gauss-base" in Lee et al., with latent_dim=256)
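
For reference, "mean of the prior" here means decoding once from z = mu of p(z|x) instead of sampling or running refinement. A minimal PyTorch sketch of that decoding path, with toy modules that are only illustrative and not the lanmt-ebm code:

```python
import torch
import torch.nn as nn

class ToyLatentNMT(nn.Module):
    def __init__(self, vocab=1000, hidden=512, latent=256):
        super().__init__()
        self.src_emb = nn.Embedding(vocab, hidden)
        self.prior = nn.Linear(hidden, 2 * latent)    # predicts [mu, logvar] of p(z|x)
        self.len_pred = nn.Linear(hidden, 64)         # toy target-length classifier
        self.decoder = nn.Linear(latent, vocab)       # toy non-autoregressive decoder

    def translate_prior_mean(self, src):
        h = self.src_emb(src).mean(dim=1, keepdim=True)   # pooled source states
        mu, _logvar = self.prior(h).chunk(2, dim=-1)
        length = self.len_pred(h).argmax(dim=-1)          # predicted target length
        logits = self.decoder(mu)                         # decode from the prior mean, no sampling
        return logits.argmax(dim=-1), length

model = ToyLatentNMT()
tokens, length = model.translate_prior_mean(torch.randint(0, 1000, (1, 7)))
```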

zomux commented 4 years ago

Train with hidden size = 512

run_8nodes abcirun.sh python lanmt/run2.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 8192 --opt_distill --opt_annealbudget --opt_longertrain --opt_hiddensz 512 --opt_embedsz 512 --train

[valid] len_loss=1.90 len_acc=0.28 loss=30.99 word_acc=0.95 KL_budget=0.76 kl=22.77 tok_kl=0.79 nll=6.33 * (epoch 113, step 93158)

BLEU = 21.2024005116522
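
For orientation, a hedged sketch of how the logged quantities might fit together (nll + length loss + a KL term clamped by the annealed budget); the actual composition in run2.py may differ:

```python
import torch

def training_loss(nll, kl, len_loss, kl_budget, num_tokens):
    # Illustrative only: treats "KL_budget" as a free-bits style floor and
    # "tok_kl" as KL averaged over target tokens, as the log above suggests.
    tok_kl = kl / num_tokens
    penalized_kl = torch.clamp(tok_kl - kl_budget, min=0.0) * num_tokens
    return nll + penalized_kl + len_loss, tok_kl

# Loosely matching the validation log above (kl=22.77, tok_kl=0.79 -> ~29 tokens):
loss, tok_kl = training_loss(torch.tensor(6.33), torch.tensor(22.77),
                             torch.tensor(1.90), 0.76, 29)
```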

zomux commented 4 years ago

Training with fastanneal

run_2nodes abcirun.sh python lanmt/run2.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 8192 --opt_distill --opt_annealbudget --opt_longertrain --opt_hiddensz 512 --opt_embedsz 512 --opt_fastanneal --train --test --evaluate

jaseleephd commented 4 years ago

Let's keep using the distilled dataset from fairseq: the "strong" model got 25.3 BLEU on it with 1 refinement, so it should be pretty good.

jaseleephd commented 4 years ago

Jason's Gauss VAE models : https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/research/transformer_vae_flow_prior.py#L655-L682
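
The linked file is TensorFlow; for comparison with this repo, here is a minimal PyTorch sketch of the same kind of diagonal-Gaussian latent layer with reparameterization (names and shapes are illustrative, not the actual lanmt classes):

```python
import torch
import torch.nn as nn

class GaussLatent(nn.Module):
    def __init__(self, hidden=512, latent=256):
        super().__init__()
        self.q_proj = nn.Linear(hidden, 2 * latent)   # posterior q(z | x, y)
        self.p_proj = nn.Linear(hidden, 2 * latent)   # prior p(z | x)

    def forward(self, enc_xy, enc_x):
        q_mu, q_logvar = self.q_proj(enc_xy).chunk(2, dim=-1)
        p_mu, p_logvar = self.p_proj(enc_x).chunk(2, dim=-1)
        z = q_mu + torch.randn_like(q_mu) * (0.5 * q_logvar).exp()   # reparameterize
        # KL( q || p ) for two diagonal Gaussians, summed over latent dims
        kl = 0.5 * (p_logvar - q_logvar
                    + (q_logvar.exp() + (q_mu - p_mu) ** 2) / p_logvar.exp()
                    - 1.0).sum(-1)
        return z, kl
```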

zomux commented 4 years ago

Checklist

zomux commented 4 years ago

Distilled dataset, ignoring sentences longer than 64 tokens
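
A small sketch of that preprocessing step, assuming plain-text parallel files and whitespace tokenization (paths and the helper name are placeholders):

```python
def filter_by_length(src_path, tgt_path, out_src, out_tgt, max_len=64):
    """Drop sentence pairs where either side exceeds max_len tokens."""
    with open(src_path) as fs, open(tgt_path) as ft, \
         open(out_src, "w") as osrc, open(out_tgt, "w") as otgt:
        for s, t in zip(fs, ft):
            if len(s.split()) <= max_len and len(t.split()) <= max_len:
                osrc.write(s)
                otgt.write(t)
```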

zomux commented 4 years ago

Latent dim = 512

run_2nodes python lanmt/run2.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 8192 --opt_distill --opt_annealbudget --opt_longertrain --opt_hiddensz 512 --opt_embedsz 512 --opt_fastanneal --opt_latentdim 512 --train --test --evaluate

zomux commented 4 years ago

More prior, q, and decoder layers (prior 4, q 4, decoder 6)

run_2nodes python lanmt/run2.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 8192 --opt_distill --opt_annealbudget --opt_longertrain --opt_hiddensz 512 --opt_embedsz 512 --opt_fastanneal --opt_latentdim 512 --opt_priorl 4 --opt_decoderl 6 --train --test --evaluate

jaseleephd commented 4 years ago

Also noticed the default num_heads is 4: https://github.com/zomux/lanmt-ebm/blob/master/run_ebm.py#L82

@zomux what value are you using for WMT experiments?

zomux commented 4 years ago

num_heads=8

run_2nodes python lanmt/run2.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 8192 --opt_distill --opt_annealbudget --opt_longertrain --opt_hiddensz 512 --opt_embedsz 512 --opt_fastanneal --opt_latentdim 512 --opt_priorl 4 --opt_decoderl 6 --opt_heads 8 --train --test --evaluate
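
As a sanity check on these flags: hidden size 512 with 8 heads gives 64-dim heads. A hedged sketch of what the deeper stacks plus 8 heads amount to structurally; the module names are stand-ins, not the actual lanmt code:

```python
import torch.nn as nn

def make_stack(n_layers, hidden=512, heads=8):
    # hidden must be divisible by heads: 512 / 8 = 64-dim heads here.
    layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=n_layers)

prior_net   = make_stack(4)   # --opt_priorl 4
q_net       = make_stack(4)   # posterior depth (assumed to follow the prior setting)
decoder_net = make_stack(6)   # --opt_decoderl 6
```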

zomux commented 4 years ago

Training for 500k steps

abcirun.sh python lanmt/run2.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 8192 --opt_distill --opt_annealbudget --opt_x5longertrain --opt_hiddensz 512 --opt_embedsz 512 --opt_fastanneal --opt_latentdim 512 --opt_priorl 4 --opt_decoderl 6 --train --test --evaluate

After 200k steps

zomux commented 4 years ago

num_heads=8, layers = 6/6/6

run_2nodes python lanmt/run2.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 8192 --opt_distill --opt_annealbudget --opt_longertrain --opt_hiddensz 512 --opt_embedsz 512 --opt_fastanneal --opt_latentdim 512 --opt_priorl 6 --opt_decoderl 6 --opt_heads 8 --train --test --evaluate

zomux commented 4 years ago

num_heads=8, layers = 6/6/6, 500k steps

./run_2nodes_long.sh abcirun.sh python lanmt/run2.py --root $HOME/data/wmt14_ende_fair --opt_dtok wmt14_fair_ende --opt_batchtokens 8192 --opt_distill --opt_annealbudget --opt_x5longertrain --opt_hiddensz 512 --opt_embedsz 512 --opt_fastanneal --opt_latentdim 512 --opt_priorl 6 --opt_decoderl 6 --opt_heads 8 --train --test --evaluate

At 300k steps: