Convolution-based energy model (delta inference approximation)

zomux commented 4 years ago

Implement E(x, z) with convolution instead of self-attention

zomux commented 4 years ago

lib_score_matching3.py

self._encoder = ConvolutionalEncoder(None, self._hidden_size, 3)
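For reference, a dependency-free sketch of what a convolution-based E(x, z) can look like. This is not the code in lib_score_matching3.py or ConvolutionalEncoder; the names conv1d_relu, energy and readout are hypothetical. It also assumes z has one latent vector per source position and that the energy is a summed linear readout, and neither assumption is confirmed in this thread.

```python
def conv1d_relu(seq, kernels, biases):
    """Same-padded 1D convolution followed by ReLU.

    seq:     list of T feature vectors, each with C_in floats
    kernels: weights indexed as [C_out][K][C_in], K odd
    biases:  list of C_out floats
    """
    k = len(kernels[0])
    c_in = len(seq[0])
    pad = k // 2
    zero = [0.0] * c_in
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        window = padded[t:t + k]
        row = []
        for w, b in zip(kernels, biases):
            s = b + sum(w[i][c] * window[i][c]
                        for i in range(k) for c in range(c_in))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def energy(x_states, z, kernels, biases, readout):
    """Scalar E(x, z): convolve over [x_t ; z_t], then sum a linear readout."""
    assert len(x_states) == len(z), "assumes one latent vector per position"
    joint = [list(xt) + list(zt) for xt, zt in zip(x_states, z)]
    hidden = conv1d_relu(joint, kernels, biases)
    return sum(sum(r * h for r, h in zip(readout, ht)) for ht in hidden)


# Toy sizes, chosen only for illustration.
T, C_X, C_Z, C_OUT, K = 4, 2, 1, 2, 3
x_states = [[0.1 * t, 1.0] for t in range(T)]
z = [[0.5] for _ in range(T)]
kernels = [[[0.0] * (C_X + C_Z) for _ in range(K)] for _ in range(C_OUT)]
biases = [1.0] * C_OUT
readout = [1.0] * C_OUT

e = energy(x_states, z, kernels, biases, readout)
print(e)
```

Unlike self-attention, each position here only sees a window of K neighbours, so the receptive field is local unless layers are stacked.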

zomux commented 4 years ago

https://github.com/zomux/lanmt-ebm/blob/fdf0c0614c3f517c1a1fc70807a3ea6adaebcfdb/lib_simple_encoders.py#L104-L134

zomux commented 4 years ago

Training command

mpirun -np 8 -H localhost -oversubscribe -bind-to none -map-by slot -x LD_LIBRARY_PATH -x PATH python lanmt/run.py  --root $HOME/data/wmt14_ende_fair --opt_fixbug1 --opt_dtok wmt14_fair_ende --opt_batchtokens 4096 --opt_distill --opt_annealbudget --opt_klbudget 10.0 --opt_beginanneal 20000 --opt_fastanneal --opt_x3longertrain --opt_zeroprior --opt_scorenet --train

zomux commented 4 years ago

First training epoch

[valid] loss=803.41 * (epoch 1, step 1)
[valid] loss=798.89 * (epoch 1, step 442)
[valid] loss=778.56 * (epoch 1, step 883)
[valid] loss=636.32 * (epoch 1, step 1324)
[valid] loss=378.91 * (epoch 1, step 1765)
[valid] loss=273.67 * (epoch 1, step 2206)
[valid] loss=213.99 * (epoch 1, step 2647)
[valid] loss=184.32 * (epoch 1, step 3088)
[valid] loss=165.59 * (epoch 1, step 3529)
[valid] loss=156.33 * (epoch 1, step 3970)
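A small sketch for pulling the validation losses out of a log like this one, e.g. to plot or compare runs. The helper name parse_valid_log is hypothetical, and the regex only assumes the "[valid] loss=... * (epoch E, step S)" shape seen above. It uses a search rather than a full-line match, so extra characters after the closing parenthesis are ignored.

```python
import re

# Three lines copied from the log above.
LOG = """\
[valid] loss=803.41 * (epoch 1, step 1)
[valid] loss=798.89 * (epoch 1, step 442)
[valid] loss=156.33 * (epoch 1, step 3970)
"""

PATTERN = re.compile(
    r"\[valid\] loss=([\d.]+) \* \(epoch (\d+), step (\d+)\)"
)


def parse_valid_log(text):
    """Return a list of (epoch, step, loss) tuples in log order."""
    return [
        (int(epoch), int(step), float(loss))
        for loss, epoch, step in PATTERN.findall(text)
    ]


records = parse_valid_log(LOG)
drop = records[0][2] - records[-1][2]
print(records)
print("loss drop over the excerpt:", round(drop, 2))
```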

zomux commented 4 years ago

Testing log, getting away from refined target

[OPTS] Model tag: annealbudget_beginanneal-20000_distill_dtok-wmt14_fair_ende_fastanneal_fixbug1_klbudget-10.0_scorenet_x3longertrain_zeroprior
Running on 1 GPUs
0.30023086 -146570.53 131.94466
0.33306664 -146586.55 110.7831
0.39454266 -146597.83 93.16517
0.46083358 -146605.83 78.54995
0.5231356 -146611.53 66.62458
0.5788895 -146615.64 57.204826
0.6284725 -146618.66 48.380745
0.6712859 -146620.83 41.08072
0.7081252 -146622.39 35.03953
0.7397679 -146623.53 29.7933
0.7668065 -146624.34 25.418055
0.78993136 -146624.94 21.673178
0.80966747 -146625.38 18.504524
0.82651055 -146625.69 15.819081
0.84088683 -146625.92 13.539381
0.85316104 -146626.11 11.601258
0.86364406 -146626.22 10.029672
0.87266016 -146626.3 8.612568
0.88036776 -146626.39 6.984665
0.8864674 -146626.42 6.0106077
0.89168507 -146626.47 5.1776733
0.896151 -146626.48 4.464289
0.8999751 -146626.52 3.852672
0.9032514 -146626.52 3.3277662
0.90605944 -146626.53 2.8768384
0.90846735 -146626.53 2.4890373
0.91053325 -146626.55 2.155253
0.91230637 -146626.55 1.867643
0.9138291 -146626.55 1.6196085
0.9151373 -146626.56 1.4054807
0.9162616 -146626.55 1.2205523
0.91722834 -146626.55 1.0607029
0.91806 -146626.56 0.92239416
0.9187757 -146626.56 0.8026079
0.91939193 -146626.55 0.69882125
0.9199226 -146626.56 0.6088599
0.9203799 -146626.55 0.53073484
0.9207741 -146626.55 0.46296415
0.921114 -146626.55 0.40413654
0.9214072 -146626.56 0.35284844
0.9216603 -146626.56 0.30829704
0.92187876 -146626.56 0.26955092
0.9220675 -146626.55 0.23569745
0.92223054 -146626.55 0.20618062
0.92237157 -146626.55 0.18050474
0.9224934 -146626.55 0.15812565
0.9225988 -146626.55 0.13850011
0.92269003 -146626.55 0.12143548
0.9227689 -146626.55 0.106529
0.92283726 -146626.55 0.0934125
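A sketch for checking the trends in this three-column log. The thread does not say what each column is, so the code only refers to them by index; parse_rows and strictly_monotone are hypothetical helper names. The rows are copied from the log above (first three and last two).

```python
EXCERPT = """\
Running on 1 GPUs
0.30023086 -146570.53 131.94466
0.33306664 -146586.55 110.7831
0.39454266 -146597.83 93.16517
0.9227689 -146626.55 0.106529
0.92283726 -146626.55 0.0934125
"""


def parse_rows(text):
    """Keep only lines made of exactly three floats; skip everything else."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            rows.append(tuple(float(p) for p in parts))
        except ValueError:
            continue
    return rows


def strictly_monotone(values, increasing):
    """True if every consecutive pair moves in the requested direction."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(b > a for a, b in pairs)
    return all(b < a for a, b in pairs)


rows = parse_rows(EXCERPT)
col0 = [r[0] for r in rows]
col2 = [r[2] for r in rows]

print("column 0 increasing:", strictly_monotone(col0, increasing=True))
print("column 2 decreasing:", strictly_monotone(col2, increasing=False))
print("last step ratio of column 2:", col2[-1] / col2[-2])
```

In the full log the first column rises and flattens near 0.92, the second settles around -146626.55, and the third shrinks steadily toward zero.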

zomux commented 4 years ago

BLEU = 22.14009583693211

zomux commented 4 years ago

Naive EBM training won't solve the structured prediction problem
