ybisk opened 6 years ago
Elman (sigmoid) Parameter Sweep
hidden = 1024
dropout = 0.0
columns = log file, train ppl, valid ppl
elman_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log 50.130 133.690
elman_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log 57.430 140.930
elman_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log 57.510 141.220
elman_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log 59.860 143.500
elman_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log 63.590 144.990
elman_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log 2.480 145.830
elman_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log 55.160 148.420
elman_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log 43.470 150.940
elman_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log 66.780 155.810
elman_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log 63.050 156.590
elman_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log 59.040 158.820
elman_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log 64.950 158.860
elman_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log 1.730 161.330
elman_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log 91.840 161.700
elman_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log 34.010 161.930
elman_word_l35_h1024_e512_lr0.001_drop0.0_adam.log 1.960 165.560
elman_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log 94.360 166.070
elman_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log 40.790 166.680
elman_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log 96.020 168.460
elman_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log 96.310 168.630
elman_word_l35_h1024_e256_lr0.001_drop0.0_adam.log 2.510 170.890
elman_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log 48.130 174.140
elman_word_l35_h1024_e128_lr0.001_drop0.0_adam.log 3.470 179.980
elman_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log 55.870 183.220
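For reference, here is a minimal pure-Python sketch of one step of a sigmoid Elman LM whose output layer is tied to the embedding matrix, as in the `tieE` runs. The names (`elman_step`, `matvec`, etc.) are hypothetical and not from our code. Tying reuses the [vocab x embed] matrix to score the hidden state, so it only works when the hidden size equals the embedding size. That is why every `tieE` run above uses e1024 with h1024.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def elman_step(word_id: int, h_prev: Vector, emb: Matrix, w_ih: Matrix,
               w_hh: Matrix, b_h: Vector, out_bias: Vector):
    """One step of a sigmoid Elman LM whose output layer is tied to `emb`.

    h_t = sigmoid(W_ih x_t + W_hh h_{t-1} + b_h)
    p(next word) = softmax(E h_t + b_out), with E the embedding matrix.
    """
    if len(h_prev) != len(emb[0]):
        raise ValueError("tied embeddings require hidden_dim == embed_dim")
    x = emb[word_id]
    pre = [a + b + c for a, b, c in zip(matvec(w_ih, x), matvec(w_hh, h_prev), b_h)]
    h = [1.0 / (1.0 + math.exp(-z)) for z in pre]
    # Tied output layer: score h against every word's embedding.
    logits = [s + b for s, b in zip(matvec(emb, h), out_bias)]
    return h, softmax(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab, dim = 5, 4

    def init(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    emb, w_ih, w_hh = init(vocab, dim), init(dim, dim), init(dim, dim)
    h, probs = elman_step(2, [0.0] * dim, emb, w_ih, w_hh, [0.0] * dim, [0.0] * vocab)
    print(len(h), round(sum(probs), 6))
```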
Just because one can never have enough optimization methods, I added "RAMSProp": the "fixed" version of Adam with beta_1=0. This makes it similar to RMSprop, and that setting has been shown to work better for language modeling. Together with LR decay (which is not used for Adam by default), it is more stable and doesn't overfit the way Adam does.
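Here is a minimal sketch of that update. It assumes the "fixed" Adam is AMSGrad (Adam that keeps the running max of the second-moment estimate) and places the bias correction the way PyTorch's `amsgrad=True` does. With beta_1=0 the first moment is just the current gradient, so each step becomes an RMSprop-style g / sqrt(v_max). `ramsprop_minimize` is a hypothetical helper, not the optimizer in our repo.

```python
import math
from typing import Callable, List


def ramsprop_minimize(
    grad_fn: Callable[[List[float]], List[float]],
    x0: List[float],
    lr: float = 0.001,
    beta2: float = 0.999,
    eps: float = 1e-8,
    steps: int = 1000,
) -> List[float]:
    """AMSGrad with beta1 = 0: an RMSprop-like step that never lets v_max shrink."""
    x = list(x0)
    v = [0.0] * len(x)      # exponential average of squared gradients
    v_max = [0.0] * len(x)  # AMSGrad: elementwise running max of v
    for t in range(1, steps + 1):
        g = grad_fn(x)
        bias_correction2 = 1.0 - beta2 ** t
        for i, gi in enumerate(g):
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            v_max[i] = max(v_max[i], v[i])
            # beta1 = 0, so the momentum term is just the raw gradient gi.
            denom = math.sqrt(v_max[i] / bias_correction2) + eps
            x[i] -= lr * gi / denom
    return x


if __name__ == "__main__":
    def quad_grad(x: List[float]) -> List[float]:
        # Gradient of (x0 - 3)^2 + (x1 + 1)^2.
        return [2 * (x[0] - 3), 2 * (x[1] + 1)]

    print(ramsprop_minimize(quad_grad, [0.0, 0.0], lr=0.1, steps=500))
```

In PyTorch, roughly the same configuration should be expressible as `torch.optim.Adam(params, lr=0.001, betas=(0.0, 0.999), amsgrad=True)`, with an LR decay schedule layered on top.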
Another fun fact: some published models have a parameter budget of about 10M. That corresponds to a 1-layer LSTM with tied embeddings and hidden size 650, or an Elman (or similar) RNN with hidden size 850. This seems to be roughly the minimum model capacity needed to overfit (and to get competitive performance with regularization).
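Here is a quick check of the ~10M figure. It assumes the 10k-word PTB vocabulary, embedding size equal to hidden size (required for tying), PyTorch-style weights with two bias vectors per recurrent layer, and an output bias that is kept even when the weights are tied. The helper names are hypothetical.

```python
def lstm_lm_params(vocab: int, hidden: int, tied: bool = True) -> int:
    """1-layer LSTM LM with embed_dim == hidden (PyTorch nn.LSTM layout)."""
    embed = vocab * hidden
    # 4 gates, each with input and recurrent weights plus bias_ih and bias_hh.
    lstm = 4 * (hidden * hidden + hidden * hidden + 2 * hidden)
    decoder = (0 if tied else vocab * hidden) + vocab  # output bias always kept
    return embed + lstm + decoder


def elman_lm_params(vocab: int, hidden: int, tied: bool = True) -> int:
    """1-layer Elman RNN LM with embed_dim == hidden (PyTorch nn.RNN layout)."""
    embed = vocab * hidden
    rnn = hidden * hidden + hidden * hidden + 2 * hidden
    decoder = (0 if tied else vocab * hidden) + vocab
    return embed + rnn + decoder


if __name__ == "__main__":
    V = 10_000  # assumed PTB vocabulary size
    print(lstm_lm_params(V, 650))   # 9895200
    print(elman_lm_params(V, 850))  # 9956700
```

Both come out just under 10M, and the embedding matrix accounts for most of each budget (6.5M and 8.5M respectively).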
Sorry, so should we run with that instead of 1024? Here's a buttload of numbers:
jordan_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log 77.250 150.190
jordan_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log 71.750 152.530
jordan_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log 65.340 152.560
jordan_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log 69.440 153.230
jordan_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log 75.510 153.270
jordan_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log 78.620 153.520
jordan_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log 72.740 153.990
jordan_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log 69.100 154.300
jordan_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log 78.400 154.650
jordan_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log 77.210 156.930
jordan_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log 79.030 156.930
jordan_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log 83.850 158.420
jordan_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log 84.210 159.380
jordan_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log 84.630 160.190
jordan_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log 846.360 835.070
jordan_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log 876.200 865.250
jordan_word_l35_h1024_e512_lr0.001_drop0.0_adam.log 881.430 866.270
jordan_word_l35_h1024_e256_lr0.001_drop0.0_adam.log 893.580 879.730
jordan_word_l35_h1024_e128_lr0.001_drop0.0_adam.log 896.610 882.810
jordan_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log 6108.980 6077.580
jordan_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log 6625.260 6593.210
jordan_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log 6631.050 6601.740
jordan_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log 6782.500 6747.610
jordan_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log 6781.270 6754.370
elman_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log 49.700 116.760
elman_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log 58.140 123.030
elman_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log 60.610 123.070
elman_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log 56.790 123.680
elman_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log 62.400 125.840
elman_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log 61.960 128.110
elman_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log 65.910 134.800
elman_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log 66.970 136.010
elman_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log 60.850 136.350
elman_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log 66.900 136.720
elman_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log 94.880 144.160
elman_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log 96.840 146.340
elman_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log 96.580 146.840
elman_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log 153.680 184.130
elman_word_l35_h1024_e256_lr0.001_drop0.0_adam.log 826.450 802.270
elman_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log 826.680 804.160
elman_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log 826.510 804.180
elman_word_l35_h1024_e128_lr0.001_drop0.0_adam.log 830.860 812.240
elman_word_l35_h1024_e512_lr0.001_drop0.0_adam.log 833.050 813.240
elman_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log 1993.090 1963.810
elman_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log 1964.530 1973.200
elman_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log 2023.590 2026.800
elman_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log 2066.940 2052.280
elman_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log 2068.670 2053.380
rnn-1_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log 262.140 296.410
rnn-1_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log 278.280 309.290
rnn-1_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log 321.020 342.120
rnn-1_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log 336.080 353.360
rnn-1_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log 351.160 371.310
rnn-1_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log 367.230 380.750
rnn-1_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log 376.830 394.500
rnn-1_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log 384.170 397.680
rnn-1_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log 383.760 400.280
rnn-1_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log 407.690 422.080
rnn-1_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log 537.050 531.540
rnn-1_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log 643.040 632.830
rnn-1_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log 686.860 678.760
rnn-1_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log 687.050 678.980
rnn-1_word_l35_h1024_e512_lr0.001_drop0.0_adam.log 4537.210 4503.220
rnn-1_word_l35_h1024_e256_lr0.001_drop0.0_adam.log 4538.510 4504.380
rnn-1_word_l35_h1024_e128_lr0.001_drop0.0_adam.log 4538.560 4504.560
rnn-1_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log 4538.730 4504.660
rnn-1_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log 4538.750 4504.670
rnn-1_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log 9222.730 9215.350
rnn-1_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log 9225.590 9217.940
rnn-1_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log 9225.760 9218.380
rnn-1_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log 9226.020 9218.500
rnn-1_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log 9226.050 9218.530
rnn-2_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log 107.220 186.360
rnn-2_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log 111.650 186.970
rnn-2_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log 108.160 187.510
rnn-2_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log 114.970 187.680
rnn-2_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log 105.210 187.730
rnn-2_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log 108.930 187.840
rnn-2_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log 111.310 188.080
rnn-2_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log 100.990 188.640
rnn-2_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log 113.470 189.410
rnn-2_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log 93.210 191.760
rnn-2_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log 109.500 192.180
rnn-2_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log 96.040 194.690
rnn-2_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log 88.460 195.520
rnn-2_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log 86.500 200.300
rnn-2_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log 854.610 843.080
rnn-2_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log 876.520 864.850
rnn-2_word_l35_h1024_e256_lr0.001_drop0.0_adam.log 882.110 866.610
rnn-2_word_l35_h1024_e512_lr0.001_drop0.0_adam.log 887.910 872.900
rnn-2_word_l35_h1024_e128_lr0.001_drop0.0_adam.log 893.210 879.030
rnn-2_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log 6344.320 6307.890
rnn-2_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log 6482.970 6443.870
rnn-2_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log 6781.020 6749.900
rnn-2_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log 6831.720 6795.480
rnn-2_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log 6856.410 6823.910
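With this many runs it helps to script the comparison. Below is a minimal sketch. It assumes each result line ends with train ppl followed by valid ppl, as in the lists above. Lines with an extra trailing column, such as a parameter count, would need a different rule. `parse_runs` and `best_by_family` are hypothetical helpers. The model family is taken as the run name up to the first `_`, `.` or space.

```python
import re
from typing import Dict, List, NamedTuple


class Run(NamedTuple):
    name: str
    train_ppl: float
    valid_ppl: float


def parse_runs(text: str) -> List[Run]:
    """Parse '<run name> <train ppl> <valid ppl>' lines, sorted by valid ppl."""
    runs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            train, valid = float(parts[-2]), float(parts[-1])
        except ValueError:
            continue  # header or prose line
        runs.append(Run(" ".join(parts[:-2]), train, valid))
    return sorted(runs, key=lambda r: r.valid_ppl)


def best_by_family(runs: List[Run]) -> Dict[str, Run]:
    """Best (lowest valid ppl) run per model family; expects sorted input."""
    best: Dict[str, Run] = {}
    for r in runs:
        family = re.split(r"[_.\s]", r.name, maxsplit=1)[0]
        best.setdefault(family, r)  # first seen is best because input is sorted
    return best
```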
I think I need to shrink the hidden dim for the HMM runs. Maybe I should have started with those, but --batch-size 5 --hidden-dim 512 seems to be running OK, where previously I used --batch-size 20 --hidden-dim 1024.
These are mostly tied embeddings, but I had one long run in there beforehand.
HMM
Settings Train Valid Parameters
h512 e128 lr20.0 sgd 184.580 253.320 40226576
h128 e128 lr20.0 sgd tieE 220.070 280.120 3403536
h64 e64 lr20.0 sgd tieE 226.710 285.820 916240
h256 e256 lr20.0 sgd tieE 212.060 292.060 19412752
h128 e128 lr5.0 sgd tieE 244.610 302.290 3403536
h64 e64 lr5.0 sgd tieE 252.730 305.990 916240
h32 e32 lr20.0 sgd tieE 269.470 306.030 363792
h256 e256 lr5.0 sgd tieE 229.820 312.500 19412752
h32 e32 lr5.0 sgd tieE 271.350 316.910 363792
h512 e512 lr20.0 sgd 181.520 385.340 144729872
h32 e32 lr0.01 adam tieE 1230.920 1221.500 363792
h64 e64 lr0.01 adam tieE 1237.710 1228.120 916240
h128 e128 lr0.01 adam tieE 1240.910 1231.490 3403536
h256 e256 lr0.01 adam tieE 1242.370 1233.020 19412752
h32 e32 lr0.001 adam tieE 4430.130 4396.860 363792
h64 e64 lr0.001 adam tieE 4495.960 4461.420 916240
h128 e128 lr0.001 adam tieE 4517.340 4483.320 3403536
h256 e256 lr0.001 adam tieE 4530.020 4496.120 19412752
Also, obviously, initialization matters, so I should probably be running several seeds of each configuration.
LSTM hyperparameter tuning so far:
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop06 51.860 80.610
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop07 65.560 80.860
lstm.ramsprop.drop065.dim650.lr0.001.trshdecay10.wdecay1e5 63.460 81.300
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop055 45.070 82.100
lstm.ramsprop.drop06.dim650.lr0.001.trshdecay10.wdecay1e5 59.680 82.280
lstm.ramsprop.drop07.dim650.lr0.001.trshdecay10.wdecay1e5 70.720 82.710
lstm.sgd.drop0.dim650.lr20.trshdecay4.drop05 41.540 82.860
lstm.ramsprop.drop055.dim650.lr0.001.trshdecay10.wdecay1e5 55.380 82.890
lstm.ramsprop.drop5.dim650.lr0.001.trshdecay10.wdecay2e5 49.810 83.020
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop05 44.390 83.760
lstm.ramsprop.drop05.dim650.lr0.001.trshdecay10.wdecay1e5 48.810 83.800
lstm.ramsprop.drop6.dim650.lr0.001.trshdecay10.wdecay2e5 65.260 84.390
lstm.ramsprop.drop7.dim650.lr0.001.trshdecay10.wdecay2e5 77.870 87.010
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop04 40.210 87.650
lstm.ramsprop.drop04.dim650.lr0.001.trshdecay10.wdecay1e5 42.690 88.770
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop03 39.620 92.280
lstm.sgd.drop0.dim650.lr40.trshdecay4.drop05 56.680 92.740
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop02 36.440 97.980
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop02 69.020 101.490
lstm.ramsprop.drop02.dim650.lr0.001.trshdecay10.wdecay0.0001 69.130 101.710
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop03 74.870 101.780
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop01 63.840 102.320
lstm.ramsprop.drop02.dim650.lr0.002.trshdecay10.wdecay0.0001 71.120 102.530
lstm.ramsprop.drop02.dim650.lr0.0005.trshdecay10.wdecay0.0001 74.090 104.320
lstm.ramsprop.drop02.dim650.lr0.001.trshdecay10.wdecay1e5 36.840 104.590
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop04 86.900 105.650
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.clip5 54.850 107.180
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay5e5 37.610 107.230
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001 58.870 107.350
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop05 94.910 108.110
lstm.ramsprop.drop0.dim650.lr0.005.trshdecay10.wdecay0.0001 58.060 108.250
lstm.ramsprop.drop0.dim650.lr0.005.trshdecay5.wdecay0.0001 64.370 109.570
lstm.ramsprop.drop0.dim650.lr0.01.trshdecay10.wdecay0.0001 65.930 110.360
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.batch64 56.940 111.580
lstm.sgd.drop0.dim650.lr10.trshdecay4 32.690 113.290
lstm.sgd.drop0.dim650.lr20.trshdecay4 26.180 113.380
lstm.sgd.drop0.dim650.lr5.trshdecay4 43.290 117.210
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay2e5 33.050 117.960
lstm.sgd.drop0.dim650.lr5.trshdecay2 38.950 122.210
lstm.sgd.drop0.dim650.lr40.trshdecay4 21.380 125.470
lstm.sgd.drop0.dim650.lr1.fixeddecay1.2 64.930 127.510
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay1e5 28.170 129.990
lstm.sgd.drop0.dim650.lr1.fixeddecay1.4 85.260 130.200
lstm.sgd.drop0.dim650.lr1.fixeddecay1.6 93.580 133.190
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay1e6 29.150 146.940
lstm.sgd.drop0.dim650.lr0.5.fixeddecay1.2 113.610 149.090
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop065 183.900 151.320
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10 28.840 153.140
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay5e4 152.550 177.570
lstm.ramsprop.drop02.dim650.lr0.001.trshdecay10.wdecay0.0005 175.530 184.580
lstm.sgd.drop0.dim650.lr20.trshdecay4.drop06 189.330 203.920
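The run names above encode two learning-rate schedules. Below is a sketch of how they might work, based on an interpretation inferred only from the names and the `--lr-decay-rate` values in the commands further down. It assumes `trshdecayN` divides the LR by N whenever validation perplexity fails to beat the best so far, and `fixeddecayF` divides it by F after every epoch. Both functions are hypothetical and are not the training script's actual code.

```python
from typing import List


def threshold_decay(valid_ppls: List[float], lr: float, rate: float) -> List[float]:
    """LR used in each epoch: divide by `rate` after any epoch whose
    validation perplexity does not improve on the best seen so far."""
    lrs, best = [], float("inf")
    for ppl in valid_ppls:
        lrs.append(lr)
        if ppl < best:
            best = ppl
        else:
            lr /= rate
    return lrs


def fixed_decay(num_epochs: int, lr: float, factor: float) -> List[float]:
    """LR used in each epoch: divide by `factor` after every epoch."""
    return [lr / factor ** epoch for epoch in range(num_epochs)]
```

The threshold schedule only decays when training stalls, so its LR stays high as long as validation keeps improving. The fixed schedule decays on a timetable regardless of progress.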
HMM experiments for Yonatan:
Models without feeding, hidden dim 900:
--type hmm --feeding none
--type hmm-g --feeding none
--type hmm+1 --feeding none # delayed transition softmax [Yonatan's concatenation model]
--type hmm+2 --feeding none # delayed emission softmax
Models with feeding, hidden dim 200:
--type hmm --feeding word
--type hmm+1 --feeding word # delayed transition softmax
Optimization settings (will pick which ones to try with dropout in the next iteration):
SGD, lr [5, 10, 20], dropout 0
python ptb_main.py --type hmm --optim sgd --lr [10] --lr-decay-rate 4.0 --clip 0.25 --dropout 0.0 --tie-embeddings --hidden-dim [200] --embed-dim [200] --initrange 0.1 --patience 5
RAMSProp, lr [0.001, 0.002], weight-decay [0, 1e-5, 1e-4], dropout 0
python ptb_main.py --type hmm --optim ramsprop --lr [0.001] --clip 5.0 --dropout 0.0 --tie-embeddings --hidden-dim [200] --embed-dim [200] --initrange 0.8 --batch-size 32 --lr-decay-rate 10.0 --weight-decay [1e-5] --patience 5
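The bracketed values above are the sweep grids. Below is a sketch that expands them into concrete command lines, using only the flags that already appear in the two commands. It assumes `ptb_main.py` does not care about flag order. `expand_grid` is a hypothetical helper, not part of the repo.

```python
import itertools
import shlex
from typing import Dict, List, Sequence


def expand_grid(base: str, grid: Dict[str, Sequence]) -> List[str]:
    """Return one command line per combination of the flag values in `grid`."""
    flags = list(grid)
    commands = []
    for values in itertools.product(*(grid[f] for f in flags)):
        extra = " ".join(f"{flag} {shlex.quote(str(v))}" for flag, v in zip(flags, values))
        commands.append(f"{base} {extra}")
    return commands


SGD_BASE = (
    "python ptb_main.py --type hmm --optim sgd --lr-decay-rate 4.0 --clip 0.25 "
    "--dropout 0.0 --tie-embeddings --hidden-dim 200 --embed-dim 200 "
    "--initrange 0.1 --patience 5"
)
RAMSPROP_BASE = (
    "python ptb_main.py --type hmm --optim ramsprop --clip 5.0 --dropout 0.0 "
    "--tie-embeddings --hidden-dim 200 --embed-dim 200 --initrange 0.8 "
    "--batch-size 32 --lr-decay-rate 10.0 --patience 5"
)

sgd_runs = expand_grid(SGD_BASE, {"--lr": [5, 10, 20]})
ramsprop_runs = expand_grid(
    RAMSPROP_BASE, {"--lr": [0.001, 0.002], "--weight-decay": [0, 1e-5, 1e-4]}
)

if __name__ == "__main__":
    for cmd in sgd_runs + ramsprop_runs:
        print(cmd)
```

Each string can then be launched with, e.g., `subprocess.run(shlex.split(cmd), check=True)`.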
RNN results
elman.ramsprop.drop0.4.dim850.lr0.002.trshdecay10.wdecay1e5 50.240 87.270
elman.ramsprop.drop0.55.dim850.lr0.002.trshdecay10.wdecay1e5 51.510 89.900
elman.ramsprop.drop0.45.dim850.lr0.002.trshdecay10.wdecay1e5 61.900 90.910
elman.ramsprop.drop0.4.dim850.lr0.001.trshdecay10.wdecay1e5 57.470 91.270
elman.ramsprop.drop0.5.dim850.lr0.002.trshdecay10.wdecay1e5 51.760 91.870
elman.ramsprop.drop0.45.dim850.lr0.001.trshdecay10.wdecay1e5 66.130 93.590
elman.ramsprop.drop0.6.dim850.lr0.002.trshdecay10.wdecay1e5 79.390 93.700
elman.ramsprop.drop0.2.dim850.lr0.001.trshdecay10.wdecay1e5 43.000 94.360
elman.ramsprop.drop0.5.dim850.lr0.001.trshdecay10.wdecay1e5 72.690 94.890
elman.ramsprop.drop0.2.dim850.lr0.002.trshdecay10.wdecay1e5 50.600 95.370
elman.ramsprop.drop0.55.dim850.lr0.001.trshdecay10.wdecay1e5 80.950 97.670
elman.ramsprop.drop0.65.dim850.lr0.001.trshdecay10.wdecay1e5 91.650 99.370
elman.ramsprop.drop0.65.dim850.lr0.002.trshdecay10.wdecay1e5 93.950 101.200
elman.ramsprop.drop0.6.dim850.lr0.001.trshdecay10.wdecay1e5 90.720 101.400
elman.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 30.370 107.750
elman.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e5 38.000 115.910
elman.sgd.drop0.dim650.lr5.trshdecay4 51.730 117.180
elman.sgd.drop0.dim650.lr10.trshdecay4 56.400 121.700
elman.sgd.drop0.dim650.lr20.trshdecay4 62.840 125.920
elman.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e4 127.750 149.610
elman.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e4 138.810 156.300
elman.ramsprop.drop0.dim850.lr0.001.trshdecay10 47.980 172.630
elman.ramsprop.drop0.dim850.lr0.002.trshdecay10 41.680 175.450
rnn-3.ramsprop.drop0.2.dim900.lr0.002.trshdecay10.wdecay1e5 77.130 107.450
rnn-3.ramsprop.drop0.25.dim900.lr0.002.trshdecay10.wdecay1e5 88.720 110.150
rnn-3.ramsprop.drop0.3.dim900.lr0.002.trshdecay10.wdecay1e5 95.750 110.940
rnn-3.ramsprop.drop0.35.dim900.lr0.002.trshdecay10.wdecay1e5 107.670 116.170
rnn-3.ramsprop.drop0.4.dim900.lr0.002.trshdecay10.wdecay1e5 118.020 121.320
rnn-3.ramsprop.drop0.5.dim900.lr0.002.trshdecay10.wdecay1e5 146.520 136.750
rnn-3.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e5 44.290 150.560
rnn-3.ramsprop.drop0.6.dim900.lr0.002.trshdecay10.wdecay1e5 178.910 157.940
rnn-3.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e6 55.160 182.200
rnn-3.ramsprop.drop0.dim850.lr0.001.trshdecay10 56.930 199.040
rnn-3.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e4 180.850 200.780
rnn-2.ramsprop.drop0.5.dim850.lr0.002.trshdecay10.wdecay1e5 113.950 162.410
rnn-2.ramsprop.drop0.4.dim850.lr0.002.trshdecay10.wdecay1e5 106.110 162.420
rnn-2.ramsprop.drop0.3.dim850.lr0.002.trshdecay10.wdecay1e5 99.400 163.340
rnn-2.ramsprop.drop0.6.dim850.lr0.002.trshdecay10.wdecay1e5 129.020 163.960
rnn-2.ramsprop.drop0.2.dim850.lr0.002.trshdecay10.wdecay1e5 88.580 165.420
rnn-2.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 72.740 171.590
rnn-2.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e4 134.840 176.310
rnn-2.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e4 136.830 177.530
rnn-2.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e5 66.690 177.860
rnn-2.sgd.drop0.dim650.lr5.trshdecay4 109.690 185.590
rnn-2.sgd.drop0.dim650.lr10.trshdecay4 100.200 187.130
rnn-2.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6 68.320 188.670
rnn-2.sgd.drop0.dim650.lr20.trshdecay4 90.160 193.880
rnn-2.ramsprop.drop0.dim850.lr0.002.trshdecay10 73.230 217.000
rnn-2.ramsprop.drop0.dim850.lr0.001.trshdecay10 67.540 246.120
rnn-1.ramsprop.drop02.dim850.lr0.002.trshdecay10.wdecay1e7 201.350 207.950
rnn-1.ramsprop.drop03.dim850.lr0.002.trshdecay10.wdecay1e7 217.670 212.290
rnn-1.ramsprop.drop02.dim850.lr0.002.trshdecay10.wdecay1e8 203.400 213.140
rnn-1.ramsprop.drop01.dim850.lr0.002.trshdecay10.wdecay1e7 205.270 222.730
rnn-1.ramsprop.drop04.dim850.lr0.002.trshdecay10.wdecay1e7 244.100 224.720
rnn-1.ramsprop.drop0.1.dim850.lr0.002.trshdecay10.wdecay1e6 224.240 231.270
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e7 201.010 239.510
rnn-1.ramsprop.drop0.2.dim850.lr0.002.trshdecay10.wdecay1e6 253.770 243.470
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6 188.770 244.360
rnn-1.ramsprop.drop0.4.dim850.lr0.002.trshdecay10.wdecay1e6 284.770 252.610
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10 196.990 257.260
rnn-1.ramsprop.drop0.6.dim850.lr0.002.trshdecay10.wdecay1e6 319.770 270.500
rnn-1.ramsprop.drop0.dim850.lr0.001.trshdecay10 226.140 276.510
rnn-1.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e5 273.520 286.250
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 283.090 296.470
rnn-1.sgd.drop0.dim650.lr10.trshdecay4 270.320 298.840
rnn-1.sgd.drop0.dim650.lr20.trshdecay4 299.870 325.980
rnn-1.sgd.drop0.dim650.lr5.trshdecay4 342.880 357.980
rnn-1.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e4 454.190 450.990
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e8 517.340 452.720
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e4 455.860 452.910
rrnn-r.ramsprop.drop0.6.dim800.lr0.002.trshdecay10.wdecay1e5 56.370 88.910
rrnnr.sgd.drop0.dim800.lr20.trshdecay4.drop06 63.010 94.580
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop06 59.960 95.310
rrnnr.sgd.drop0.dim800.lr20.trshdecay4.drop05 47.610 96.040
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop05 52.360 96.210
rrnnr.sgd.drop0.65.dim800.lr20.trshdecay4 74.140 97.210
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop04 42.960 97.990
rrnn-r.ramsprop.drop0.4.dim800.lr0.002.trshdecay10.wdecay1e5 34.230 100.310
rrnnr.sgd.drop0.7.dim800.lr20.trshdecay4 87.470 101.660
rrnnr.sgd.drop0.7.dim800.lr10.trshdecay4 86.100 101.770
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop02 34.600 105.130
rrnn-r.ramsprop.drop0.2.dim800.lr0.002.trshdecay10.wdecay1e5 30.120 113.670
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop0 32.440 116.260
rrnnr.sgd.drop0.dim800.lr20.trshdecay4.drop0 25.340 122.900
rrnn-r.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e5 19.470 142.670
rrnn-r.ramsprop.drop0.dim800.lr0.001.trshdecay10.wdecay1e5 28.810 143.550
rrnn-r.ramsprop.drop0.dim800.lr0.001.trshdecay10.wdecay1e6 33.380 202.980
rrnn-r.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e6 25.400 207.210
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e5 27.530 123.440
rrnn.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e5 101.680 154.060
rrnn.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e6 56.420 157.720
rrnn.ramsprop.drop0.dim800.lr0.002.trshdecay10 58.650 169.210
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e6 34.490 176.220
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10 44.530 189.520
hmm+1 none h900 e900 lr0.001 drop0.0 ramsprop wd0.0001 pat5 tieE 767.200 753.080
hmm+1 none h900 e900 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE 208.090 287.000
hmm+1 none h900 e900 lr0.001 drop0.0 ramsprop wd1e-05 pat5 tieE 691.980 682.840
hmm+1 none h900 e900 lr0.002 drop0.0 ramsprop wd0.0001 pat5 tieE 767.190 753.070
hmm+1 none h900 e900 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 229.960 302.830
hmm+1 none h900 e900 lr0.002 drop0.0 ramsprop wd1e-05 pat5 tieE 691.770 682.950
hmm+1 none h900 e900 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 686.280 679.040
hmm+1 none h900 e900 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 685.630 679.800
hmm+1 none h900 e900 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 686.420 678.950
hmm-g none h900 e900 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE 195.910 243.510
hmm-g none h900 e900 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 204.070 258.960
hmm-g none h900 e900 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 686.420 678.950
hmm-g none h900 e900 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 686.280 679.040
hmm-g none h900 e900 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 685.630 679.790
hmm-g none h900 e900 lr0.001 drop0.0 ramsprop wd1e-05 pat5 tieE 691.740 682.640
hmm-g none h900 e900 lr0.002 drop0.0 ramsprop wd1e-05 pat5 tieE 691.710 683.000
hmm-g none h900 e900 lr0.001 drop0.0 ramsprop wd0.0001 pat5 tieE 767.210 752.880
hmm-g none h900 e900 lr0.002 drop0.0 ramsprop wd0.0001 pat5 tieE 767.270 752.900
hmm none h900 e900 lr0.001 drop0.0 ramsprop wd0.0001 pat5 tieE 767.200 753.090
hmm none h900 e900 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE 246.080 304.090
hmm none h900 e900 lr0.001 drop0.0 ramsprop wd1e-05 pat5 tieE 691.980 682.840
hmm none h900 e900 lr0.002 drop0.0 ramsprop wd0.0001 pat5 tieE 767.180 753.060
hmm none h900 e900 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 251.740 302.320
hmm none h900 e900 lr0.002 drop0.0 ramsprop wd1e-05 pat5 tieE 691.770 682.670
hmm none h900 e900 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 686.280 679.030
hmm none h900 e900 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 685.630 679.800
hmm none h900 e900 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 686.420 678.950
hmm word h200 e200 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 210.630 288.150
hmm word h200 e200 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 216.020 290.370
hmm word h200 e200 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 217.320 296.750
hmm word h200 e200 lr0.002 drop0.0 ramsprop wd1e-05 pat5 tieE 536.630 540.850
hmm word h200 e200 lr0.001 drop0.0 ramsprop wd1e-05 pat5 tieE 545.310 549.090
hmm word h200 e200 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE 628.570 613.540
hmm word h200 e200 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 711.100 665.220
hmm word h200 e200 lr0.002 drop0.0 ramsprop wd0.0001 pat5 tieE 729.040 714.950
hmm word h200 e200 lr0.001 drop0.0 ramsprop wd0.0001 pat5 tieE 728.950 714.960
hmm+1 word h200 e200 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 327.890 351.530
hmm+1 word h200 e200 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 353.800 369.610
hmm+1 word h200 e200 lr0.002 drop0.0 ramsprop wd1e-05 pat5 tieE 410.490 422.730
hmm+1 word h200 e200 lr0.001 drop0.0 ramsprop wd1e-05 pat5 tieE 455.770 464.790
hmm+1 word h200 e200 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 543.270 503.110
hmm+1 word h200 e200 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE 1486.530 978.320
hmm+1 word h200 e200 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 2912.140 1244.470
hmm+1 word h200 e200 lr0.001 drop0.0 ramsprop wd0.0001 pat5 tieE 5578.750 2748.370
[no feeding]
hmm-new.ramsprop.drop0.dim900.lr0.002.trshdecay10 233.220 284.590
hmm-new-c.ramsprop.drop0.dim900.lr0.002.trshdecay10 245.420 288.620
hmm-new.ramsprop.drop0.1.dim900.lr0.002.trshdecay10 238.860 291.480
hmm-new-rnn-emit.ramsprop.drop0.dim900.lr0.002.trshdecay10 202.570 299.580
hmm-new-elman-hmm-emit.ramsprop.drop0.dim900.lr0.002.trshdecay10 325.140 343.040
hmm-new.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e6 564.020 570.660
hmm-new.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e5 691.710 682.550
Still waiting on a couple of jobs.
hmm-new none h850 e850 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 204.110 292.960
hmm-new-rnn-emit none h850 e850 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 319.740 367.000
hmm-new-tensor-feed word h200 e200 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 610.220 599.140
hmm-new-tensor-feed word h200 e200 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 218.170 287.910
hmm-new-tensor-feed word h200 e200 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 229.850 298.310
hmm-new-tensor-feed word h200 e200 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 226.080 299.280
hmm-new-gate-feed word h800 e800 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 259.580 318.960
hmm-new-add-feed word h800 e800 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 304.400 343.220
Minor note -- HMM initialization is typically really important (I'm just thinking of Baum-Welch runs I've done), so I'm wondering if it makes sense to do a couple of runs of our best settings.
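Repeating the best settings with several random seeds means each configuration produces several numbers. Below is a small helper for reporting them: mean, sample standard deviation, and best validation perplexity per configuration. `summarize_seeds` and its dict-of-lists input are hypothetical, since nothing in the thread specifies how repeats are stored.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_seeds(results: Dict[str, List[float]]) -> List[Tuple[str, float, float, float]]:
    """(config, mean, stdev, best) of validation ppl per config, sorted by mean.

    stdev is the sample standard deviation, reported as 0.0 for a single run.
    """
    rows = []
    for config, ppls in results.items():
        spread = statistics.stdev(ppls) if len(ppls) > 1 else 0.0
        rows.append((config, statistics.mean(ppls), spread, min(ppls)))
    return sorted(rows, key=lambda row: row[1])
```

Ranking by the mean rather than the single best seed keeps a lucky initialization from deciding which setting looks best.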
Just an update of the above, plus three really shitty attempts at Elman with a non-softmax normalization:
hmm-new-tensor-feed word h200 e200 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 218.030 287.880
hmm-new none h850 e850 lr0.002 drop0.0 ramsprop wd0.0 pat5 delayemit 194.030 289.120
hmm-new none h850 e850 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 204.110 292.960
hmm-new none h850 e850 lr0.002 drop0.0 ramsprop wd0.0 delaytrans-emit 205.130 293.720
hmm-new-tensor-feed word h200 e200 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 229.850 298.310
hmm-new-tensor-feed word h200 e200 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 226.080 299.280
hmm-new-gate-feed word h800 e800 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 253.390 316.310
hmm-new-add-feed word h800 e800 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 304.400 343.220
hmm-new none h850 e850 lr0.002 drop0.0 ramsprop wd0.0 pat5 delaytrans 291.690 353.880
hmm-new none h850 e850 lr0.002 drop0.0 ramsprop wd0.0 delaynone 352.540 395.400
hmm-new-tensor-feed word h200 e200 lr0.002 drop0.0 ramsprop wd0.0 pat5 tieE 610.220 599.140
hmm-new none h850 e850 lr0.002 drop0.0 ramsprop wd1e-05 pat5 delaynone 473.360 473.060
elman-normExp word h850 e850 lr0.002 drop0.0 ramsprop wd1e-05 pat5 tieE delaynone 283.310 298.700
elman-norm word h850 e850 lr0.002 drop0.0 ramsprop wd1e-05 pat5 tieE delaynone 317.380 325.810
elman-norm word h850 e850 lr0.002 drop0.2 ramsprop wd1e-05 pat5 tieE delaynone 357.540 328.290
Newish Elman results:
elman-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 38.710 161.680
elman-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6 68.780 188.460
elman-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10 63.850 210.350
elman-delayed.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6 212.460 297.610
elman-delayed.ramsprop.drop0.dim850.lr0.002.trshdecay10 226.880 311.570
elman-softmax-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6 289.050 321.570
elman-delayed.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 314.830 340.880
elman-softmax-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10 329.230 356.820
elman-softmax-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 360.330 367.570
elman-delayed-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 471.440 491.570
elman-delayed-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10 495.220 522.780
elman-delayed-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6 569.100 573.180
elman-mult word h850 e850 lr0.002 drop0.6 ramsprop wd1e-05 pat5 tieE 74.820 100.710
elman-mult word h850 e850 lr0.002 drop0.4 ramsprop wd1e-05 pat5 tieE 54.130 104.690
elman-decayed word h850 e850 lr0.002 drop0.3 ramsprop wd1e-06 pat5 tieE 38.000 113.330
elman-mult word h850 e850 lr0.002 drop0.2 ramsprop wd1e-05 pat5 tieE 40.090 113.680
elman-mult word h850 e850 lr0.002 drop0.5 ramsprop wd1e-05 pat5 tieE 91.350 118.760
elman-decayed word h850 e850 lr0.002 drop0.2 ramsprop wd1e-06 pat5 tieE 39.750 122.520
elman-decayed word h850 e850 lr0.002 drop0.3 ramsprop wd1e-07 pat5 tieE 51.970 126.830
elman-decayed word h850 e850 lr0.002 drop0.1 ramsprop wd1e-06 pat5 tieE 38.760 135.410
elman-decayed word h850 e850 lr0.002 drop0.2 ramsprop wd1e-07 pat5 tieE 46.930 137.920
elman-decayed word h850 e850 lr0.002 drop0.1 ramsprop wd1e-07 pat5 tieE 46.110 150.940
elman-softmax-hmm-emit word h850 e850 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 271.850 312.630
elman-softmax-hmm-emit word h850 e850 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 271.770 312.960
elman-softmax-hmm-emit word h850 e850 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 332.420 360.680
My machines seem to have crashed, so this is what I've got at the moment. I need to go to campus to restart my desktop. Will spawn LSTM runs ASAP.
elman-softmax-hmm-emit word h250 e250 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 248.040 290.880 0.125
elman-softmax-hmm-emit word h250 e250 lr20.0 drop0.0 sgd wd0.0 pat5 255.290 308.170 0.320
elman-softmax-hmm-emit word h250 e250 lr10.0 drop0.0 sgd wd0.0 pat5 294.030 333.530 0.153
elman-softmax-hmm-emit word h250 e250 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 295.200 333.350 0.153
elman-softmax-hmm-emit word h250 e250 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 316.240 352.220 0.153
elman-softmax-hmm-emit word h250 e250 lr5.0 drop0.0 sgd wd0.0 pat5 296.880 336.210 0.153
elman-softmax-hmm-emit word h850 e850 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 271.850 312.630 0.153
elman-softmax-hmm-emit word h850 e850 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 271.770 312.960 0.153
elman-softmax-hmm-emit word h850 e850 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 332.420 360.680 0.153
elman-mult word h850 e850 lr0.002 drop0.6 ramsprop wd1e-05 pat5 tieE 74.820 100.710 0.428
elman-mult word h850 e850 lr0.002 drop0.4 ramsprop wd1e-05 pat5 tieE 54.130 104.690 0.427
elman-mult word h850 e850 lr0.002 drop0.2 ramsprop wd1e-05 pat5 tieE 40.090 113.680 0.428
elman-mult word h850 e850 lr0.002 drop0.5 ramsprop wd1e-05 pat5 tieE 91.350 118.760 0.417
elman-delayed word h850 e850 lr0.002 drop0.3 ramsprop wd1e-07 pat5 tieE 201.130 277.540 0.346
elman-delayed word h850 e850 lr0.002 drop0.2 ramsprop wd1e-06 pat5 tieE 213.840 287.570 0.337
elman-delayed word h850 e850 lr0.002 drop0.3 ramsprop wd1e-06 pat5 tieE 220.950 288.450 0.341
elman-delayed word h850 e850 lr0.002 drop0.1 ramsprop wd1e-06 pat5 tieE 229.780 297.190 0.339
elman-delayed word h850 e850 lr0.002 drop0.1 ramsprop wd1e-07 pat5 tieE 229.310 303.060 0.344
elman-delayed word h850 e850 lr0.002 drop0.2 ramsprop wd1e-07 pat5 tieE 344.420 397.500 0.275
hmm-new-sigmoid.delay-emit.ramsprop.drop0.dim900.lr0.002.trshdecay10 65.530 142.310
hmm-new-sigmoid.ramsprop.drop0.dim900.lr0.002.trshdecay10 179.060 240.910
elman-softmax-single-mult-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10 377.220 425.840
elman-single.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6 485.260 495.370
elman-single.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 499.960 515.920
elman-softmax-single-mult-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 553.270 555.720
rrnn-1.ramsprop.drop0.5.dim800.lr0.002.trshdecay10.wdecay1e5 74.790 97.690
rrnn-1.ramsprop.drop0.6.dim800.lr0.002.trshdecay10.wdecay1e5 94.390 104.760
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e5 27.530 123.440
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e6 34.490 176.220
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10 44.530 189.520
elman-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6 275.410 313.840
elman-norm.ramsprop.drop0.5.dim850.lr0.002.trshdecay10 167.800 225.360
elman-norm.ramsprop.drop0.2.dim850.lr0.002.trshdecay10 169.630 229.180
elman-norm.ramsprop.drop0.dim850.lr0.002.trshdecay10 187.590 237.260
elman-norm.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5 268.720 290.620
hmm-new-sigmoid.ramsprop.drop0.dim250.lr0.002.trshdecay10 205.290 268.120
hmm-new.ramsprop.drop0.dim250.lr0.002.trshdecay10 287.740 339.170
hmm-new-sigmoid.ramsprop.drop0.dim250.lr0.002.trshdecay10.wdecay1e5 644.780 636.270
hmm-new.ramsprop.drop0.dim250.lr0.002.trshdecay10.wdecay1e5 647.280 642.660
bigram.ramsprop.drop0.4.dim900.lr0.002.trshdecay10.wdecay1e5 128.590 177.270
bigram.ramsprop.drop0.6.dim900.lr0.002.trshdecay10.wdecay1e5 148.610 177.990
bigram.ramsprop.drop0.2.dim900.lr0.002.trshdecay10.wdecay1e5 112.080 178.600
bigram.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e5 95.540 183.490
bigram.ramsprop.drop0.6.dim900.lr0.002.trshdecay10.wdecay1e6 182.680 196.970
bigram.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e4 216.920 235.660
bigram.ramsprop.drop0.dim900.lr0.002.trshdecay10 87.510 243.000
These crashed, so take them with a grain of salt, but performance is very similar with a hidden dim of 100:
elman-softmax-hmm-emit word h250 e250 lr20.0 drop0.0 sgd wd0.0 pat5 255.290 308.170 0.320
elman-softmax-hmm-emit word h850 e850 lr20.0 drop0.0 sgd wd0.0 pat5 tieE 271.850 312.630 0.153
elman-softmax-hmm-emit word h850 e850 lr10.0 drop0.0 sgd wd0.0 pat5 tieE 271.770 312.960 0.153
elman-softmax-hmm-emit word h100 e100 lr20.0 drop0.0 sgd wd0.0 pat5 275.240 320.580 0.304
elman-softmax-hmm-emit word h100 e100 lr5.0 drop0.0 sgd wd0.0 pat5 291.910 329.760 0.298
elman-softmax-hmm-emit word h250 e250 lr10.0 drop0.0 sgd wd0.0 pat5 294.030 333.530 0.153
elman-softmax-hmm-emit word h250 e250 lr5.0 drop0.0 sgd wd0.0 pat5 296.880 336.210 0.153
elman-softmax-hmm-emit word h850 e850 lr5.0 drop0.0 sgd wd0.0 pat5 tieE 332.420 360.680 0.153
Models
TODO Max dim always 1024
LSTM (@janmbuys)
-- Tuning
-- No Dropout
-- SGD (two strategies)
-- Dims
-- LRs
Implement the shit above
LogSpace HMM -- made a tweak; it now seems to get perplexities very close to the prob-space version.
GridSearch -- Try and overfit
Total parameter calculation - implemented
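The LogSpace HMM item presumably refers to computing HMM likelihoods in log space rather than probability space. Below is a textbook sketch of the difference for a plain discrete HMM forward algorithm, not our neural HMM. Both versions agree on a short sequence. On a long one the probability-space version underflows to 0, while the log-space version still gives a finite log-likelihood and hence a per-token perplexity exp(-log p / T). The function names are hypothetical.

```python
import math
from typing import List, Sequence


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def logsumexp(xs: Sequence[float]) -> float:
    top = max(xs)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(x - top) for x in xs))


def forward_prob(pi: List[float], A: List[List[float]], B: List[List[float]],
                 obs: List[int]) -> float:
    """Probability-space forward algorithm: returns p(obs)."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def forward_log(pi: List[float], A: List[List[float]], B: List[List[float]],
                obs: List[int]) -> float:
    """Log-space forward algorithm: returns log p(obs) without underflow."""
    n = len(pi)
    la = [safe_log(pi[j]) + safe_log(B[j][obs[0]]) for j in range(n)]
    for o in obs[1:]:
        la = [logsumexp([la[i] + safe_log(A[i][j]) for i in range(n)]) + safe_log(B[j][o])
              for j in range(n)]
    return logsumexp(la)


if __name__ == "__main__":
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    obs = [0, 1, 2, 1, 0] * 400  # 2000 tokens
    print(forward_prob(pi, A, B, obs))                      # underflows
    print(math.exp(-forward_log(pi, A, B, obs) / len(obs)))  # per-token perplexity
```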