ybisk / HMM-RNN


Full Work List #4

Open ybisk opened 6 years ago

ybisk commented 6 years ago

Models

  1. LSTM
  2. RAN -- Implemented simplified version (RRNN)
  3. Elman (sigmoid)
  4. Elman (softmax)
  5. Elman (early softmax) -- decomposed hidden+input
  6. HMM (delayed emission) -- Implemented, but the tensor expansions use a lot of GPU memory, so only small hidden state sizes (up to 150) fit.
  7. HMM (delayed transition)
  8. HMM (still w/ word cond)
  9. HMM (vanilla)
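To make the difference between items 3 and 4 concrete, here is a minimal sketch of one Elman update with either nonlinearity. It assumes the textbook form h_t = f(W h_{t-1} + U x_t + b) and is not this repo's exact parameterization. `elman_step`, `W`, `U`, `b` and the toy weights are all hypothetical. The sigmoid variant squashes each hidden unit independently. The softmax variant normalizes the whole vector into a distribution over hidden units, which is presumably what makes it comparable to an HMM's state distribution.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product m @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: Vector) -> Vector:
    """Elementwise logistic function: each unit lies in (0, 1) independently."""
    return [1.0 / (1.0 + math.exp(-x)) for x in z]


def softmax(z: Vector) -> Vector:
    """Normalizes the whole vector into a distribution over hidden units."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def elman_step(h_prev: Vector, x_emb: Vector, W: Matrix, U: Matrix,
               b: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """One Elman update: h_t = f(W h_{t-1} + U x_t + b)."""
    pre = [r + s + c for r, s, c in zip(matvec(W, h_prev), matvec(U, x_emb), b)]
    return f(pre)


# Toy 2-unit example with made-up weights.
W = [[0.1, 0.2], [0.3, 0.4]]
U = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
h0 = [0.5, 0.5]
x1 = [1.0, -1.0]
h_sig = elman_step(h0, x1, W, U, b, sigmoid)   # two independent values in (0, 1)
h_soft = elman_step(h0, x1, W, U, b, softmax)  # non-negative and sums to 1
```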

TODO (max dim is always 1024)

  1. LSTM (@janmbuys) -- tuning -- no dropout -- SGD (two strategies) -- dims -- LRs

  2. Implement the shit above

  3. LogSpace HMM -- made a tweak; it now seems to get perplexities very close to the prob-space version.

  4. GridSearch -- Try and overfit

    • [ ] Elman (3) -- @ybisk attempting to tune
    • [ ] Elman (4)
    • [ ] Elman (5)
    • [ ] HMM (6)
    • [ ] HMM (7)
    • [ ] HMM (8)
    • [ ] HMM (9)
  5. Total parameter calculation -- implemented
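Item 3 compares the log-space HMM against the probability-space one. The sketch below shows why the two should agree. The forward recursion is the same in both spaces, with products turned into sums and sums turned into log-sum-exp. It is a toy 2-state, 3-symbol HMM with made-up parameters, and the function names are hypothetical, not the repo's implementation. On long sequences the unscaled prob-space version underflows, which is the usual reason to move to log space.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Stable log(sum(exp(x))): factor out the max before exponentiating."""
    m = max(xs)
    if m == -math.inf:  # every term is log(0)
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_prob(init: List[float], trans: Matrix, emit: Matrix,
                 obs: Sequence[int]) -> float:
    """Forward algorithm in probability space; returns p(obs)."""
    n = len(init)
    alpha = [init[j] * emit[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)


def forward_log(log_init: List[float], log_trans: Matrix, log_emit: Matrix,
                obs: Sequence[int]) -> float:
    """Same recursion in log space; returns log p(obs)."""
    n = len(log_init)
    alpha = [log_init[j] + log_emit[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
                 + log_emit[j][o] for j in range(n)]
    return logsumexp(alpha)


def to_log(m: Matrix) -> Matrix:
    return [[math.log(v) for v in row] for row in m]


# Toy HMM with made-up parameters.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 1]

log_p = forward_log([math.log(v) for v in init], to_log(trans), to_log(emit), obs)
ppl = math.exp(-log_p / len(obs))  # per-token perplexity
```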

ybisk commented 6 years ago

Elman (sigmoid) Parameter Sweep (columns: train ppl, valid ppl)

hidden = 1024
dropout = 0.0
elman_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log         50.130   133.690
elman_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log              57.430   140.930
elman_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log               57.510   141.220
elman_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log               59.860   143.500
elman_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log               63.590   144.990
elman_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log       2.480   145.830
elman_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log        55.160   148.420
elman_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log     43.470   150.940
elman_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log              66.780   155.810
elman_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log              63.050   156.590
elman_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log              59.040   158.820
elman_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log             64.950   158.860
elman_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log            1.730   161.330
elman_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log              91.840   161.700
elman_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log          34.010   161.930
elman_word_l35_h1024_e512_lr0.001_drop0.0_adam.log             1.960   165.560
elman_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log               94.360   166.070
elman_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log           40.790   166.680
elman_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log               96.020   168.460
elman_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log               96.310   168.630
elman_word_l35_h1024_e256_lr0.001_drop0.0_adam.log             2.510   170.890
elman_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log           48.130   174.140
elman_word_l35_h1024_e128_lr0.001_drop0.0_adam.log             3.470   179.980
elman_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log           55.870   183.220
janmbuys commented 6 years ago

Just because one can never have enough optimization methods, I added "RAMSProp": the "fixed" version of Adam with beta_1=0, which makes it similar to RMSprop, a setting that has been shown to work better for language modeling. Together with LR decay (which is not used with Adam by default), this is more stable and doesn't overfit the way Adam does.
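A minimal sketch of the update rule described above. It assumes "the fixed version of Adam" means AMSGrad, and it omits bias correction for brevity. The class name and defaults are hypothetical and not the repo's code. With beta_1=0 the momentum term disappears, so each step is the current gradient scaled by the running maximum of the squared-gradient average, i.e. RMSprop whose denominator never shrinks.

```python
import math
from typing import List


class RAMSPropSketch:
    """AMSGrad with beta_1 = 0: RMSprop-style scaling whose second-moment
    estimate may never decrease. Bias correction is omitted for brevity."""

    def __init__(self, n_params: int, lr: float = 0.001, beta2: float = 0.999,
                 eps: float = 1e-8, weight_decay: float = 0.0):
        self.lr, self.beta2, self.eps, self.wd = lr, beta2, eps, weight_decay
        self.v = [0.0] * n_params      # EMA of squared gradients
        self.v_max = [0.0] * n_params  # running elementwise max of v (the AMSGrad fix)

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            g = g + self.wd * p  # L2 weight decay folded into the gradient
            self.v[k] = self.beta2 * self.v[k] + (1.0 - self.beta2) * g * g
            self.v_max[k] = max(self.v_max[k], self.v[k])
            # beta_1 = 0, so the "first moment" is just the current gradient.
            updated.append(p - self.lr * g / (math.sqrt(self.v_max[k]) + self.eps))
        return updated


# Usage: minimize f(x) = (x - 3)^2, whose gradient is 2 (x - 3).
opt = RAMSPropSketch(n_params=1, lr=0.1)
x = [0.0]
for _ in range(200):
    x = opt.step(x, [2.0 * (x[0] - 3.0)])
```

A close PyTorch analogue is `torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.0, 0.999), amsgrad=True, weight_decay=1e-5)`, which additionally applies Adam's bias correction.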

Another fun fact: some published models have a parameter budget of 10M, which corresponds to a 1-layer LSTM with tied embeddings and hidden size 650, or an Elman network etc. with hidden size 850. This seems to be roughly the minimum model capacity needed to overfit (and to get competitive performance with regularization).
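The 10M figure can be checked with a quick count. The count assumes the standard 10k-word PTB vocabulary, embedding size equal to hidden size (required for tying), and PyTorch-style recurrent layers with two bias vectors. The function names are hypothetical.

```python
def lstm_lm_params(vocab: int, hidden: int, tied: bool = True) -> int:
    """1-layer LSTM LM with embedding size == hidden size (needed for tying)."""
    embed = vocab * hidden
    # 4 gates x (input weights + recurrent weights + two bias vectors, as in
    # PyTorch's nn.LSTM, which keeps separate b_ih and b_hh)
    rnn = 4 * (hidden * hidden + hidden * hidden + 2 * hidden)
    decoder = (0 if tied else hidden * vocab) + vocab  # output weights + bias
    return embed + rnn + decoder


def elman_lm_params(vocab: int, hidden: int, tied: bool = True) -> int:
    """1-layer Elman LM: one weight block instead of the LSTM's four."""
    embed = vocab * hidden
    rnn = hidden * hidden + hidden * hidden + 2 * hidden
    decoder = (0 if tied else hidden * vocab) + vocab
    return embed + rnn + decoder


PTB_VOCAB = 10_000
print(lstm_lm_params(PTB_VOCAB, 650))   # 9895200
print(elman_lm_params(PTB_VOCAB, 850))  # 9956700
```

Most of the budget is the vocab × hidden embedding. Untying adds another vocab × hidden matrix on the output side, so an untied model at the same budget needs a much smaller hidden size.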

ybisk commented 6 years ago

Sorry, so should we run with those sizes instead of 1024? Here's a buttload of numbers (columns: train ppl, valid ppl):

jordan_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log        77.250   150.190
jordan_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log             71.750   152.530
jordan_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log       65.340   152.560
jordan_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log            69.440   153.230
jordan_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log              75.510   153.270
jordan_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log              78.620   153.520
jordan_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log             72.740   153.990
jordan_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log             69.100   154.300
jordan_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log              78.400   154.650
jordan_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log             77.210   156.930
jordan_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log             79.030   156.930
jordan_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log              83.850   158.420
jordan_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log              84.210   159.380
jordan_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log              84.630   160.190
jordan_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log    846.360   835.070
jordan_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log         876.200   865.250
jordan_word_l35_h1024_e512_lr0.001_drop0.0_adam.log          881.430   866.270
jordan_word_l35_h1024_e256_lr0.001_drop0.0_adam.log          893.580   879.730
jordan_word_l35_h1024_e128_lr0.001_drop0.0_adam.log          896.610   882.810
jordan_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log         6108.980   6077.580
jordan_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log         6625.260   6593.210
jordan_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log         6631.050   6601.740
jordan_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log   6782.500   6747.610
jordan_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log        6781.270   6754.370

elman_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log         49.700   116.760
elman_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log               58.140   123.030
elman_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log              60.610   123.070
elman_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log               56.790   123.680
elman_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log               62.400   125.840
elman_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log        61.960   128.110
elman_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log             65.910   134.800
elman_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log              66.970   136.010
elman_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log              60.850   136.350
elman_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log              66.900   136.720
elman_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log               94.880   144.160
elman_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log               96.840   146.340
elman_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log               96.580   146.840
elman_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log             153.680   184.130
elman_word_l35_h1024_e256_lr0.001_drop0.0_adam.log           826.450   802.270
elman_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log     826.680   804.160
elman_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log          826.510   804.180
elman_word_l35_h1024_e128_lr0.001_drop0.0_adam.log           830.860   812.240
elman_word_l35_h1024_e512_lr0.001_drop0.0_adam.log           833.050   813.240
elman_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log          1993.090   1963.810
elman_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log          1964.530   1973.200
elman_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log          2023.590   2026.800
elman_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log         2066.940   2052.280
elman_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log    2068.670   2053.380

rnn-1_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log             262.140   296.410
rnn-1_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log       278.280   309.290
rnn-1_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log             321.020   342.120
rnn-1_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log             336.080   353.360
rnn-1_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log            351.160   371.310
rnn-1_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log        367.230   380.750
rnn-1_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log             376.830   394.500
rnn-1_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log              384.170   397.680
rnn-1_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log              383.760   400.280
rnn-1_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log              407.690   422.080
rnn-1_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log             537.050   531.540
rnn-1_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log              643.040   632.830
rnn-1_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log              686.860   678.760
rnn-1_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log              687.050   678.980
rnn-1_word_l35_h1024_e512_lr0.001_drop0.0_adam.log           4537.210   4503.220
rnn-1_word_l35_h1024_e256_lr0.001_drop0.0_adam.log           4538.510   4504.380
rnn-1_word_l35_h1024_e128_lr0.001_drop0.0_adam.log           4538.560   4504.560
rnn-1_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log          4538.730   4504.660
rnn-1_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log     4538.750   4504.670
rnn-1_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log          9222.730   9215.350
rnn-1_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log          9225.590   9217.940
rnn-1_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log          9225.760   9218.380
rnn-1_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log         9226.020   9218.500
rnn-1_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log    9226.050   9218.530

rnn-2_word_l35_h1024_e1024_lr1.0_drop0.0_sgd.log             107.220   186.360
rnn-2_word_l35_h1024_e512_lr1.0_drop0.0_sgd.log              111.650   186.970
rnn-2_word_l35_h1024_e1024_lr5.0_drop0.0_sgd_tieE.log        108.160   187.510
rnn-2_word_l35_h1024_e128_lr1.0_drop0.0_sgd.log              114.970   187.680
rnn-2_word_l35_h1024_e512_lr5.0_drop0.0_sgd.log              105.210   187.730
rnn-2_word_l35_h1024_e256_lr5.0_drop0.0_sgd.log              108.930   187.840
rnn-2_word_l35_h1024_e256_lr1.0_drop0.0_sgd.log              111.310   188.080
rnn-2_word_l35_h1024_e1024_lr5.0_drop0.0_sgd.log             100.990   188.640
rnn-2_word_l35_h1024_e128_lr5.0_drop0.0_sgd.log              113.470   189.410
rnn-2_word_l35_h1024_e1024_lr20.0_drop0.0_sgd.log             93.210   191.760
rnn-2_word_l35_h1024_e128_lr20.0_drop0.0_sgd.log             109.500   192.180
rnn-2_word_l35_h1024_e256_lr20.0_drop0.0_sgd.log              96.040   194.690
rnn-2_word_l35_h1024_e1024_lr20.0_drop0.0_sgd_tieE.log        88.460   195.520
rnn-2_word_l35_h1024_e512_lr20.0_drop0.0_sgd.log              86.500   200.300
rnn-2_word_l35_h1024_e1024_lr0.001_drop0.0_adam_tieE.log     854.610   843.080
rnn-2_word_l35_h1024_e1024_lr0.001_drop0.0_adam.log          876.520   864.850
rnn-2_word_l35_h1024_e256_lr0.001_drop0.0_adam.log           882.110   866.610
rnn-2_word_l35_h1024_e512_lr0.001_drop0.0_adam.log           887.910   872.900
rnn-2_word_l35_h1024_e128_lr0.001_drop0.0_adam.log           893.210   879.030
rnn-2_word_l35_h1024_e128_lr0.0001_drop0.0_adam.log          6344.320   6307.890
rnn-2_word_l35_h1024_e256_lr0.0001_drop0.0_adam.log          6482.970   6443.870
rnn-2_word_l35_h1024_e512_lr0.0001_drop0.0_adam.log          6781.020   6749.900
rnn-2_word_l35_h1024_e1024_lr0.0001_drop0.0_adam_tieE.log    6831.720   6795.480
rnn-2_word_l35_h1024_e1024_lr0.0001_drop0.0_adam.log         6856.410   6823.910
ybisk commented 6 years ago

I think I need to shrink the hidden dim for the HMM runs. Maybe I should have started with those, but --batch-size 5 --hidden-dim 512 seems to be running OK, where previously I used --batch-size 20 --hidden-dim 1024.
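A rough way to see why shrinking both settings helps. This is a back-of-the-envelope sketch under a loud assumption: the dominant activation is taken to be an expanded float32 tensor of shape (seq_len, batch, hidden, hidden) kept around for backprop. seq_len = 35 is read off the `l35` in the log names. The helper name is hypothetical. Under that assumption, the reduction factor is (20/5) × (1024/512)² = 16 regardless of sequence length.

```python
def expanded_tensor_bytes(seq_len: int, batch: int, hidden: int,
                          bytes_per_elem: int = 4) -> int:
    """Bytes for one float32 tensor of shape (seq_len, batch, hidden, hidden)."""
    return seq_len * batch * hidden * hidden * bytes_per_elem


GIB = 2 ** 30
before = expanded_tensor_bytes(seq_len=35, batch=20, hidden=1024)
after = expanded_tensor_bytes(seq_len=35, batch=5, hidden=512)
print(before / GIB, after / GIB, before // after)  # 2.734375 0.1708984375 16
```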

ybisk commented 6 years ago

These are mostly tied-embedding runs, but I had one long run in there from before.
HMM

                                                     Train       Valid        Parameters
h512     e128     lr20.0   sgd                       184.580     253.320      40226576
h128     e128     lr20.0   sgd      tieE             220.070     280.120       3403536
h64      e64      lr20.0   sgd      tieE             226.710     285.820        916240
h256     e256     lr20.0   sgd      tieE             212.060     292.060      19412752
h128     e128     lr5.0    sgd      tieE             244.610     302.290       3403536
h64      e64      lr5.0    sgd      tieE             252.730     305.990        916240
h32      e32      lr20.0   sgd      tieE             269.470     306.030        363792
h256     e256     lr5.0    sgd      tieE             229.820     312.500      19412752
h32      e32      lr5.0    sgd      tieE             271.350     316.910        363792
h512     e512     lr20.0   sgd                       181.520     385.340     144729872
h32      e32      lr0.01   adam     tieE            1230.920    1221.500        363792
h64      e64      lr0.01   adam     tieE            1237.710    1228.120        916240
h128     e128     lr0.01   adam     tieE            1240.910    1231.490       3403536
h256     e256     lr0.01   adam     tieE            1242.370    1233.020      19412752
h32      e32      lr0.001  adam     tieE            4430.130    4396.860        363792
h64      e64      lr0.001  adam     tieE            4495.960    4461.420        916240
h128     e128     lr0.001  adam     tieE            4517.340    4483.320       3403536
h256     e256     lr0.001  adam     tieE            4530.020    4496.120      19412752
ybisk commented 6 years ago

Also, obviously, initialization matters, so I should probably be running multiple seeds of each configuration...

janmbuys commented 6 years ago

LSTM hyperparameter tuning so far:


lstm.sgd.drop0.dim650.lr10.trshdecay4.drop06                  51.860    80.610
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop07                  65.560    80.860
lstm.ramsprop.drop065.dim650.lr0.001.trshdecay10.wdecay1e5    63.460    81.300
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop055                 45.070    82.100
lstm.ramsprop.drop06.dim650.lr0.001.trshdecay10.wdecay1e5     59.680    82.280
lstm.ramsprop.drop07.dim650.lr0.001.trshdecay10.wdecay1e5     70.720    82.710
lstm.sgd.drop0.dim650.lr20.trshdecay4.drop05                  41.540    82.860
lstm.ramsprop.drop055.dim650.lr0.001.trshdecay10.wdecay1e5    55.380    82.890
lstm.ramsprop.drop5.dim650.lr0.001.trshdecay10.wdecay2e5      49.810    83.020
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop05                  44.390    83.760
lstm.ramsprop.drop05.dim650.lr0.001.trshdecay10.wdecay1e5     48.810    83.800
lstm.ramsprop.drop6.dim650.lr0.001.trshdecay10.wdecay2e5      65.260    84.390
lstm.ramsprop.drop7.dim650.lr0.001.trshdecay10.wdecay2e5      77.870    87.010
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop04                  40.210    87.650
lstm.ramsprop.drop04.dim650.lr0.001.trshdecay10.wdecay1e5     42.690    88.770
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop03                  39.620    92.280
lstm.sgd.drop0.dim650.lr40.trshdecay4.drop05                  56.680    92.740
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop02                  36.440    97.980
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop02  69.020   101.490
lstm.ramsprop.drop02.dim650.lr0.001.trshdecay10.wdecay0.0001  69.130   101.710
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop03  74.870   101.780
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop01  63.840   102.320
lstm.ramsprop.drop02.dim650.lr0.002.trshdecay10.wdecay0.0001  71.120   102.530
lstm.ramsprop.drop02.dim650.lr0.0005.trshdecay10.wdecay0.0001  74.090   104.320
lstm.ramsprop.drop02.dim650.lr0.001.trshdecay10.wdecay1e5     36.840   104.590
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop04  86.900   105.650
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.clip5  54.850   107.180
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay5e5      37.610   107.230
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001   58.870   107.350
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.drop05  94.910   108.110
lstm.ramsprop.drop0.dim650.lr0.005.trshdecay10.wdecay0.0001   58.060   108.250
lstm.ramsprop.drop0.dim650.lr0.005.trshdecay5.wdecay0.0001    64.370   109.570
lstm.ramsprop.drop0.dim650.lr0.01.trshdecay10.wdecay0.0001    65.930   110.360
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay0.0001.batch64  56.940   111.580
lstm.sgd.drop0.dim650.lr10.trshdecay4                         32.690   113.290
lstm.sgd.drop0.dim650.lr20.trshdecay4                         26.180   113.380
lstm.sgd.drop0.dim650.lr5.trshdecay4                          43.290   117.210
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay2e5      33.050   117.960
lstm.sgd.drop0.dim650.lr5.trshdecay2                          38.950   122.210
lstm.sgd.drop0.dim650.lr40.trshdecay4                         21.380   125.470
lstm.sgd.drop0.dim650.lr1.fixeddecay1.2                       64.930   127.510
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay1e5      28.170   129.990
lstm.sgd.drop0.dim650.lr1.fixeddecay1.4                       85.260   130.200
lstm.sgd.drop0.dim650.lr1.fixeddecay1.6                       93.580   133.190
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay1e6      29.150   146.940
lstm.sgd.drop0.dim650.lr0.5.fixeddecay1.2                    113.610   149.090
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop065                183.900   151.320
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10                28.840   153.140
lstm.ramsprop.drop0.dim650.lr0.001.trshdecay10.wdecay5e4     152.550   177.570
lstm.ramsprop.drop02.dim650.lr0.001.trshdecay10.wdecay0.0005 175.530   184.580
lstm.sgd.drop0.dim650.lr20.trshdecay4.drop06                 189.330   203.920
janmbuys commented 6 years ago

HMM experiments for Yonatan:

Models without feeding, hidden dim 900:

--type hmm   --feeding none
--type hmm-g --feeding none
--type hmm+1 --feeding none  # delayed transition softmax [Yonatan's concatenation model]
--type hmm+2 --feeding none  # delayed emission softmax

Models with feeding, hidden dim 200:

--type hmm   --feeding word
--type hmm+1 --feeding word  # delayed transition softmax

Optimization settings (I will pick which ones to try with dropout in the next iteration):

SGD, lr [5, 10, 20], dropout 0

python ptb_main.py --type hmm --optim sgd --lr [10] --lr-decay-rate 4.0 --clip 0.25 --dropout 0.0 --tie-embeddings --hidden-dim [200] --embed-dim [200] --initrange 0.1 --patience 5

RAMSProp, lr [0.001, 0.002], weight-decay [0, 1e-5, 1e-4], dropout 0

python ptb_main.py --type hmm --optim ramsprop --lr [0.001] --clip 5.0 --dropout 0.0 --tie-embeddings --hidden-dim [200] --embed-dim [200] --initrange 0.8 --batch-size 32 --lr-decay-rate 10.0 --weight-decay [1e-5] --patience 5
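The run names encode `trshdecay4`/`trshdecay10`, matching `--lr-decay-rate`, alongside `--patience 5`. The sketch below replays one *guessed* reading of those flags: whenever an epoch fails to beat the best validation perplexity, divide the LR by the decay rate, and stop after `patience` such epochs in a row. This is an assumption about `ptb_main.py`, not its actual logic. The helper name and the perplexity sequence are made up for illustration.

```python
from typing import List, Tuple


def replay_threshold_decay(valid_ppls: List[float], lr: float, decay_rate: float,
                           patience: int) -> Tuple[List[float], int]:
    """Return the LR in effect after each epoch and the number of epochs run.
    Assumed rule: no improvement -> lr /= decay_rate; stop after `patience`
    consecutive non-improving epochs."""
    best = float("inf")
    bad_epochs = 0
    lrs: List[float] = []
    for epoch, ppl in enumerate(valid_ppls):
        if ppl < best:
            best, bad_epochs = ppl, 0
        else:
            lr /= decay_rate
            bad_epochs += 1
        lrs.append(lr)
        if bad_epochs >= patience:
            return lrs, epoch + 1
    return lrs, len(valid_ppls)


# Made-up validation perplexities, with --lr 10 --lr-decay-rate 4.0 --patience 5.
ppls = [150, 130, 125, 126, 124, 125, 125, 125, 125, 125, 120, 119]
lrs, epochs_run = replay_threshold_decay(ppls, lr=10.0, decay_rate=4.0, patience=5)
```

In PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=1 / decay_rate, patience=0)` gives a similar per-epoch decay, though it uses a relative improvement threshold and does not stop training by itself.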

janmbuys commented 6 years ago

RNN results


elman.ramsprop.drop0.4.dim850.lr0.002.trshdecay10.wdecay1e5   50.240    87.270
elman.ramsprop.drop0.55.dim850.lr0.002.trshdecay10.wdecay1e5  51.510    89.900
elman.ramsprop.drop0.45.dim850.lr0.002.trshdecay10.wdecay1e5  61.900    90.910
elman.ramsprop.drop0.4.dim850.lr0.001.trshdecay10.wdecay1e5   57.470    91.270
elman.ramsprop.drop0.5.dim850.lr0.002.trshdecay10.wdecay1e5   51.760    91.870
elman.ramsprop.drop0.45.dim850.lr0.001.trshdecay10.wdecay1e5  66.130    93.590
elman.ramsprop.drop0.6.dim850.lr0.002.trshdecay10.wdecay1e5   79.390    93.700
elman.ramsprop.drop0.2.dim850.lr0.001.trshdecay10.wdecay1e5   43.000    94.360
elman.ramsprop.drop0.5.dim850.lr0.001.trshdecay10.wdecay1e5   72.690    94.890
elman.ramsprop.drop0.2.dim850.lr0.002.trshdecay10.wdecay1e5   50.600    95.370
elman.ramsprop.drop0.55.dim850.lr0.001.trshdecay10.wdecay1e5  80.950    97.670
elman.ramsprop.drop0.65.dim850.lr0.001.trshdecay10.wdecay1e5  91.650    99.370
elman.ramsprop.drop0.65.dim850.lr0.002.trshdecay10.wdecay1e5  93.950   101.200
elman.ramsprop.drop0.6.dim850.lr0.001.trshdecay10.wdecay1e5   90.720   101.400
elman.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5     30.370   107.750
elman.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e5     38.000   115.910
elman.sgd.drop0.dim650.lr5.trshdecay4                         51.730   117.180
elman.sgd.drop0.dim650.lr10.trshdecay4                        56.400   121.700
elman.sgd.drop0.dim650.lr20.trshdecay4                        62.840   125.920
elman.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e4    127.750   149.610
elman.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e4    138.810   156.300
elman.ramsprop.drop0.dim850.lr0.001.trshdecay10               47.980   172.630
elman.ramsprop.drop0.dim850.lr0.002.trshdecay10               41.680   175.450

rnn-3.ramsprop.drop0.2.dim900.lr0.002.trshdecay10.wdecay1e5    77.130     107.450
rnn-3.ramsprop.drop0.25.dim900.lr0.002.trshdecay10.wdecay1e5    88.720     110.150
rnn-3.ramsprop.drop0.3.dim900.lr0.002.trshdecay10.wdecay1e5    95.750     110.940
rnn-3.ramsprop.drop0.35.dim900.lr0.002.trshdecay10.wdecay1e5   107.670     116.170
rnn-3.ramsprop.drop0.4.dim900.lr0.002.trshdecay10.wdecay1e5   118.020     121.320
rnn-3.ramsprop.drop0.5.dim900.lr0.002.trshdecay10.wdecay1e5   146.520     136.750
rnn-3.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e5    44.290     150.560
rnn-3.ramsprop.drop0.6.dim900.lr0.002.trshdecay10.wdecay1e5   178.910     157.940
rnn-3.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e6    55.160     182.200
rnn-3.ramsprop.drop0.dim850.lr0.001.trshdecay10       56.930     199.040
rnn-3.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e4   180.850     200.780

rnn-2.ramsprop.drop0.5.dim850.lr0.002.trshdecay10.wdecay1e5  113.950   162.410
rnn-2.ramsprop.drop0.4.dim850.lr0.002.trshdecay10.wdecay1e5  106.110   162.420
rnn-2.ramsprop.drop0.3.dim850.lr0.002.trshdecay10.wdecay1e5   99.400   163.340
rnn-2.ramsprop.drop0.6.dim850.lr0.002.trshdecay10.wdecay1e5  129.020   163.960
rnn-2.ramsprop.drop0.2.dim850.lr0.002.trshdecay10.wdecay1e5   88.580   165.420
rnn-2.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5     72.740   171.590
rnn-2.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e4    134.840   176.310
rnn-2.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e4    136.830   177.530
rnn-2.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e5     66.690   177.860
rnn-2.sgd.drop0.dim650.lr5.trshdecay4                        109.690   185.590
rnn-2.sgd.drop0.dim650.lr10.trshdecay4                       100.200   187.130
rnn-2.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6     68.320   188.670
rnn-2.sgd.drop0.dim650.lr20.trshdecay4                        90.160   193.880
rnn-2.ramsprop.drop0.dim850.lr0.002.trshdecay10               73.230   217.000
rnn-2.ramsprop.drop0.dim850.lr0.001.trshdecay10               67.540   246.120

rnn-1.ramsprop.drop02.dim850.lr0.002.trshdecay10.wdecay1e7   201.350   207.950
rnn-1.ramsprop.drop03.dim850.lr0.002.trshdecay10.wdecay1e7   217.670   212.290
rnn-1.ramsprop.drop02.dim850.lr0.002.trshdecay10.wdecay1e8   203.400   213.140
rnn-1.ramsprop.drop01.dim850.lr0.002.trshdecay10.wdecay1e7   205.270   222.730
rnn-1.ramsprop.drop04.dim850.lr0.002.trshdecay10.wdecay1e7   244.100   224.720
rnn-1.ramsprop.drop0.1.dim850.lr0.002.trshdecay10.wdecay1e6  224.240   231.270
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e7    201.010   239.510
rnn-1.ramsprop.drop0.2.dim850.lr0.002.trshdecay10.wdecay1e6  253.770   243.470
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6    188.770   244.360
rnn-1.ramsprop.drop0.4.dim850.lr0.002.trshdecay10.wdecay1e6  284.770   252.610
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10              196.990   257.260
rnn-1.ramsprop.drop0.6.dim850.lr0.002.trshdecay10.wdecay1e6  319.770   270.500
rnn-1.ramsprop.drop0.dim850.lr0.001.trshdecay10              226.140   276.510
rnn-1.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e5    273.520   286.250
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5    283.090   296.470
rnn-1.sgd.drop0.dim650.lr10.trshdecay4                       270.320   298.840
rnn-1.sgd.drop0.dim650.lr20.trshdecay4                       299.870   325.980
rnn-1.sgd.drop0.dim650.lr5.trshdecay4                        342.880   357.980
rnn-1.ramsprop.drop0.dim850.lr0.001.trshdecay10.wdecay1e4    454.190   450.990
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e8    517.340   452.720
rnn-1.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e4    455.860   452.910

rrnn-r.ramsprop.drop0.6.dim800.lr0.002.trshdecay10.wdecay1e5  56.370    88.910
rrnnr.sgd.drop0.dim800.lr20.trshdecay4.drop06                 63.010    94.580
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop06                 59.960    95.310
rrnnr.sgd.drop0.dim800.lr20.trshdecay4.drop05                 47.610    96.040
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop05                 52.360    96.210
rrnnr.sgd.drop0.65.dim800.lr20.trshdecay4                     74.140    97.210
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop04                 42.960    97.990
rrnn-r.ramsprop.drop0.4.dim800.lr0.002.trshdecay10.wdecay1e5  34.230   100.310
rrnnr.sgd.drop0.7.dim800.lr20.trshdecay4                      87.470   101.660
rrnnr.sgd.drop0.7.dim800.lr10.trshdecay4                      86.100   101.770
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop02                 34.600   105.130
rrnn-r.ramsprop.drop0.2.dim800.lr0.002.trshdecay10.wdecay1e5  30.120   113.670
rrnnr.sgd.drop0.dim800.lr10.trshdecay4.drop0                  32.440   116.260
rrnnr.sgd.drop0.dim800.lr20.trshdecay4.drop0                  25.340   122.900
rrnn-r.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e5    19.470   142.670
rrnn-r.ramsprop.drop0.dim800.lr0.001.trshdecay10.wdecay1e5    28.810   143.550
rrnn-r.ramsprop.drop0.dim800.lr0.001.trshdecay10.wdecay1e6    33.380   202.980
rrnn-r.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e6    25.400   207.210

rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e5    27.530     123.440
rrnn.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e5   101.680     154.060
rrnn.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e6    56.420     157.720
rrnn.ramsprop.drop0.dim800.lr0.002.trshdecay10        58.650     169.210
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e6    34.490     176.220
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10      44.530     189.520
ybisk commented 6 years ago

type     feed     hidden   embed    lr       dropout  optim    wdecay   patience tie          Train       Valid
hmm+1    none     h900     e900     lr0.001  drop0.0  ramsprop wd0.0001 pat5     tieE       767.200     753.080
hmm+1    none     h900     e900     lr0.001  drop0.0  ramsprop wd0.0    pat5     tieE       208.090     287.000
hmm+1    none     h900     e900     lr0.001  drop0.0  ramsprop wd1e-05  pat5     tieE       691.980     682.840
hmm+1    none     h900     e900     lr0.002  drop0.0  ramsprop wd0.0001 pat5     tieE       767.190     753.070
hmm+1    none     h900     e900     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       229.960     302.830
hmm+1    none     h900     e900     lr0.002  drop0.0  ramsprop wd1e-05  pat5     tieE       691.770     682.950
hmm+1    none     h900     e900     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       686.280     679.040
hmm+1    none     h900     e900     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       685.630     679.800
hmm+1    none     h900     e900     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       686.420     678.950

hmm-g    none     h900     e900     lr0.001  drop0.0  ramsprop wd0.0    pat5     tieE       195.910     243.510
hmm-g    none     h900     e900     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       204.070     258.960
hmm-g    none     h900     e900     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       686.420     678.950
hmm-g    none     h900     e900     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       686.280     679.040
hmm-g    none     h900     e900     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       685.630     679.790
hmm-g    none     h900     e900     lr0.001  drop0.0  ramsprop wd1e-05  pat5     tieE       691.740     682.640
hmm-g    none     h900     e900     lr0.002  drop0.0  ramsprop wd1e-05  pat5     tieE       691.710     683.000
hmm-g    none     h900     e900     lr0.001  drop0.0  ramsprop wd0.0001 pat5     tieE       767.210     752.880
hmm-g    none     h900     e900     lr0.002  drop0.0  ramsprop wd0.0001 pat5     tieE       767.270     752.900

hmm      none     h900     e900     lr0.001  drop0.0  ramsprop wd0.0001 pat5     tieE       767.200     753.090
hmm      none     h900     e900     lr0.001  drop0.0  ramsprop wd0.0    pat5     tieE       246.080     304.090
hmm      none     h900     e900     lr0.001  drop0.0  ramsprop wd1e-05  pat5     tieE       691.980     682.840
hmm      none     h900     e900     lr0.002  drop0.0  ramsprop wd0.0001 pat5     tieE       767.180     753.060
hmm      none     h900     e900     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       251.740     302.320
hmm      none     h900     e900     lr0.002  drop0.0  ramsprop wd1e-05  pat5     tieE       691.770     682.670
hmm      none     h900     e900     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       686.280     679.030
hmm      none     h900     e900     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       685.630     679.800
hmm      none     h900     e900     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       686.420     678.950
ybisk commented 6 years ago
hmm      word     h200     e200     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       210.630     288.150
hmm      word     h200     e200     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       216.020     290.370
hmm      word     h200     e200     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       217.320     296.750
hmm      word     h200     e200     lr0.002  drop0.0  ramsprop wd1e-05  pat5     tieE       536.630     540.850
hmm      word     h200     e200     lr0.001  drop0.0  ramsprop wd1e-05  pat5     tieE       545.310     549.090
hmm      word     h200     e200     lr0.001  drop0.0  ramsprop wd0.0    pat5     tieE       628.570     613.540
hmm      word     h200     e200     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       711.100     665.220
hmm      word     h200     e200     lr0.002  drop0.0  ramsprop wd0.0001 pat5     tieE       729.040     714.950
hmm      word     h200     e200     lr0.001  drop0.0  ramsprop wd0.0001 pat5     tieE       728.950     714.960
ybisk commented 6 years ago
hmm+1    word     h200     e200     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       327.890     351.530
hmm+1    word     h200     e200     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       353.800     369.610
hmm+1    word     h200     e200     lr0.002  drop0.0  ramsprop wd1e-05  pat5     tieE       410.490     422.730
hmm+1    word     h200     e200     lr0.001  drop0.0  ramsprop wd1e-05  pat5     tieE       455.770     464.790
hmm+1    word     h200     e200     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       543.270     503.110
hmm+1    word     h200     e200     lr0.001  drop0.0  ramsprop wd0.0    pat5     tieE      1486.530     978.320
hmm+1    word     h200     e200     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE      2912.140    1244.470
hmm+1    word     h200     e200     lr0.001  drop0.0  ramsprop wd0.0001 pat5     tieE      5578.750    2748.370
janmbuys commented 6 years ago

[no feeding]
hmm-new.ramsprop.drop0.dim900.lr0.002.trshdecay10                 233.220   284.590
hmm-new-c.ramsprop.drop0.dim900.lr0.002.trshdecay10               245.420   288.620
hmm-new.ramsprop.drop0.1.dim900.lr0.002.trshdecay10               238.860   291.480
hmm-new-rnn-emit.ramsprop.drop0.dim900.lr0.002.trshdecay10        202.570   299.580
hmm-new-elman-hmm-emit.ramsprop.drop0.dim900.lr0.002.trshdecay10  325.140   343.040
hmm-new.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e6       564.020   570.660
hmm-new.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e5       691.710   682.550

ybisk commented 6 years ago

Still waiting on a couple of jobs.

hmm-new             none     h850     e850     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       204.110     292.960
hmm-new-rnn-emit    none     h850     e850     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       319.740     367.000
hmm-new-tensor-feed word     h200     e200     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       610.220     599.140
hmm-new-tensor-feed word     h200     e200     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       218.170     287.910
hmm-new-tensor-feed word     h200     e200     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       229.850     298.310
hmm-new-tensor-feed word     h200     e200     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       226.080     299.280
hmm-new-gate-feed   word     h800     e800     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       259.580     318.960
hmm-new-add-feed    word     h800     e800     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       304.400     343.220
ybisk commented 6 years ago

Minor note -- HMM initialization is typically really important (I'm thinking of Baum-Welch runs I've done), so it may make sense to do a couple of runs of our best settings with different random initializations.
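One way to act on this, sketched minimally: rerun a single configuration under a few seeds and report the spread rather than a single number. `run_seeds`, `summarize`, and `train_fn` are hypothetical names. `train_fn(seed)` stands in for whatever launches the best config, seeds the RNGs (e.g. with `torch.manual_seed(seed)`), and returns validation perplexity.

```python
import statistics


def run_seeds(train_fn, seeds=(1, 2, 3)):
    """Rerun one configuration under several random initializations.

    train_fn is a hypothetical callable: it should seed the RNGs, train the
    model, and return the final validation perplexity for that seed.
    """
    return {seed: train_fn(seed) for seed in seeds}


def summarize(results):
    """Mean, sample standard deviation, and best (lowest) ppl across seeds."""
    ppls = list(results.values())
    return {
        "mean": statistics.mean(ppls),
        "stdev": statistics.stdev(ppls) if len(ppls) > 1 else 0.0,
        "best_seed": min(results, key=results.get),
        "best": min(ppls),
    }
```

If the standard deviation across seeds is on the order of the gaps between neighbouring rows in these tables, those rows shouldn't be read as a ranking.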

ybisk commented 6 years ago

Just an update of the above, plus three really bad attempts at Elman with a non-softmax normalization:

hmm-new-tensor-feed word     h200     e200     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       218.030     287.880
hmm-new             none     h850     e850     lr0.002  drop0.0  ramsprop wd0.0    pat5    delayemit   194.030     289.120
hmm-new             none     h850     e850     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       204.110     292.960
hmm-new             none     h850     e850     lr0.002  drop0.0  ramsprop wd0.0      delaytrans-emit   205.130     293.720
hmm-new-tensor-feed word     h200     e200     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       229.850     298.310
hmm-new-tensor-feed word     h200     e200     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       226.080     299.280
hmm-new-gate-feed   word     h800     e800     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       253.390     316.310
hmm-new-add-feed    word     h800     e800     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       304.400     343.220
hmm-new             none     h850     e850     lr0.002  drop0.0  ramsprop wd0.0    pat5   delaytrans   291.690     353.880
hmm-new             none     h850     e850     lr0.002  drop0.0  ramsprop wd0.0     delaynone          352.540     395.400
hmm-new-tensor-feed word     h200     e200     lr0.002  drop0.0  ramsprop wd0.0    pat5     tieE       610.220     599.140
hmm-new             none     h850     e850     lr0.002  drop0.0  ramsprop wd1e-05  pat5    delaynone   473.360     473.060

elman-normExp word  h850     e850     lr0.002  drop0.0  ramsprop wd1e-05  pat5     tieE     delaynone   283.310     298.700
elman-norm word     h850     e850     lr0.002  drop0.0  ramsprop wd1e-05  pat5     tieE     delaynone   317.380     325.810
elman-norm word     h850     e850     lr0.002  drop0.2  ramsprop wd1e-05  pat5     tieE     delaynone   357.540     328.290
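The thread doesn't spell out which normalizer `elman-norm` and `elman-normExp` use. As a minimal sketch under an assumption, here is one non-softmax option: squash the Elman pre-activations with a sigmoid (so every entry is positive) and divide by their sum, so the hidden state sums to 1 like an HMM state distribution. `elman_norm_step` and its weight layout are hypothetical and not necessarily what the repo implements.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def elman_norm_step(h_prev, x, W_hh, W_xh, eps=1e-12):
    """One Elman step whose hidden state is L1-normalized instead of softmaxed.

    Assumption: the non-softmax normalizer is sigmoid-then-divide-by-sum.
    """
    pre = [a + b for a, b in zip(matvec(W_hh, h_prev), matvec(W_xh, x))]
    act = [sigmoid(p) for p in pre]
    total = max(sum(act), eps)  # guard against division by zero on underflow
    return [a / total for a in act]


W_hh = [[0.1, 0.2], [0.3, -0.1]]
W_xh = [[1.0, 0.0], [0.0, 1.0]]
print(elman_norm_step([0.5, 0.5], [1.0, 0.0], W_hh, W_xh))
```

Unlike softmax, dividing by the sum doesn't exponentiate, so differences between units are compressed (every sigmoid output lies in (0, 1)), which is one plausible reason such a normalizer trains differently.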
janmbuys commented 6 years ago

Newish Elman results:


elman-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5    38.710     161.680
elman-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6    68.780     188.460
elman-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10    63.850     210.350
elman-delayed.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6   212.460     297.610
elman-delayed.ramsprop.drop0.dim850.lr0.002.trshdecay10   226.880     311.570
elman-softmax-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6   289.050     321.570
elman-delayed.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5   314.830     340.880
elman-softmax-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10   329.230     356.820
elman-softmax-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5   360.330     367.570
elman-delayed-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5   471.440     491.570
elman-delayed-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10   495.220     522.780
elman-delayed-mult.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6   569.100     573.180
ybisk commented 6 years ago
elman-mult word              h850     e850     lr0.002  drop0.6  ramsprop wd1e-05  pat5     tieE        74.820     100.710
elman-mult word              h850     e850     lr0.002  drop0.4  ramsprop wd1e-05  pat5     tieE        54.130     104.690
elman-decayed word           h850     e850     lr0.002  drop0.3  ramsprop wd1e-06  pat5     tieE        38.000     113.330
elman-mult word              h850     e850     lr0.002  drop0.2  ramsprop wd1e-05  pat5     tieE        40.090     113.680
elman-mult word              h850     e850     lr0.002  drop0.5  ramsprop wd1e-05  pat5     tieE        91.350     118.760
elman-decayed word           h850     e850     lr0.002  drop0.2  ramsprop wd1e-06  pat5     tieE        39.750     122.520
elman-decayed word           h850     e850     lr0.002  drop0.3  ramsprop wd1e-07  pat5     tieE        51.970     126.830
elman-decayed word           h850     e850     lr0.002  drop0.1  ramsprop wd1e-06  pat5     tieE        38.760     135.410
elman-decayed word           h850     e850     lr0.002  drop0.2  ramsprop wd1e-07  pat5     tieE        46.930     137.920
elman-decayed word           h850     e850     lr0.002  drop0.1  ramsprop wd1e-07  pat5     tieE        46.110     150.940
elman-softmax-hmm-emit word  h850     e850     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE        271.850     312.630
elman-softmax-hmm-emit word  h850     e850     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE        271.770     312.960
elman-softmax-hmm-emit word  h850     e850     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE        332.420     360.680
ybisk commented 6 years ago

My machines seem to have crashed, so this is what I've got at the moment. I need to go to campus to restart my desktop. Will spawn LSTM runs ASAP.

elman-softmax-hmm-emit word     h250     e250     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       248.040      290.880       0.125
elman-softmax-hmm-emit word     h250     e250     lr20.0   drop0.0  sgd      wd0.0    pat5                255.290      308.170       0.320
elman-softmax-hmm-emit word     h250     e250     lr10.0   drop0.0  sgd      wd0.0    pat5                294.030      333.530       0.153
elman-softmax-hmm-emit word     h250     e250     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       295.200      333.350       0.153
elman-softmax-hmm-emit word     h250     e250     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       316.240      352.220       0.153
elman-softmax-hmm-emit word     h250     e250     lr5.0    drop0.0  sgd      wd0.0    pat5                296.880      336.210       0.153

elman-softmax-hmm-emit word     h850     e850     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       271.850      312.630       0.153
elman-softmax-hmm-emit word     h850     e850     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       271.770      312.960       0.153
elman-softmax-hmm-emit word     h850     e850     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       332.420      360.680       0.153

elman-mult word        h850     e850     lr0.002  drop0.6  ramsprop wd1e-05  pat5     tieE        74.820      100.710       0.428
elman-mult word        h850     e850     lr0.002  drop0.4  ramsprop wd1e-05  pat5     tieE        54.130      104.690       0.427
elman-mult word        h850     e850     lr0.002  drop0.2  ramsprop wd1e-05  pat5     tieE        40.090      113.680       0.428
elman-mult word        h850     e850     lr0.002  drop0.5  ramsprop wd1e-05  pat5     tieE        91.350      118.760       0.417

elman-delayed word     h850     e850     lr0.002  drop0.3  ramsprop wd1e-07  pat5     tieE       201.130      277.540       0.346
elman-delayed word     h850     e850     lr0.002  drop0.2  ramsprop wd1e-06  pat5     tieE       213.840      287.570       0.337
elman-delayed word     h850     e850     lr0.002  drop0.3  ramsprop wd1e-06  pat5     tieE       220.950      288.450       0.341
elman-delayed word     h850     e850     lr0.002  drop0.1  ramsprop wd1e-06  pat5     tieE       229.780      297.190       0.339
elman-delayed word     h850     e850     lr0.002  drop0.1  ramsprop wd1e-07  pat5     tieE       229.310      303.060       0.344
elman-delayed word     h850     e850     lr0.002  drop0.2  ramsprop wd1e-07  pat5     tieE       344.420      397.500       0.275
janmbuys commented 6 years ago

hmm-new-sigmoid.delay-emit.ramsprop.drop0.dim900.lr0.002.trshdecay10    65.530     142.310
hmm-new-sigmoid.ramsprop.drop0.dim900.lr0.002.trshdecay10   179.060     240.910

elman-softmax-single-mult-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10   377.220     425.840
elman-single.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6   485.260     495.370
elman-single.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5   499.960     515.920
elman-softmax-single-mult-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5   553.270     555.720

rrnn-1.ramsprop.drop0.5.dim800.lr0.002.trshdecay10.wdecay1e5    74.790      97.690
rrnn-1.ramsprop.drop0.6.dim800.lr0.002.trshdecay10.wdecay1e5    94.390     104.760
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e5    27.530     123.440
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10.wdecay1e6    34.490     176.220
rrnn-1.ramsprop.drop0.dim800.lr0.002.trshdecay10      44.530     189.520

elman-hmm-emit.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e6   275.410     313.840

elman-norm.ramsprop.drop0.5.dim850.lr0.002.trshdecay10   167.800     225.360
elman-norm.ramsprop.drop0.2.dim850.lr0.002.trshdecay10   169.630     229.180
elman-norm.ramsprop.drop0.dim850.lr0.002.trshdecay10   187.590     237.260
elman-norm.ramsprop.drop0.dim850.lr0.002.trshdecay10.wdecay1e5   268.720     290.620

hmm-new-sigmoid.ramsprop.drop0.dim250.lr0.002.trshdecay10   205.290     268.120
hmm-new.ramsprop.drop0.dim250.lr0.002.trshdecay10    287.740     339.170
hmm-new-sigmoid.ramsprop.drop0.dim250.lr0.002.trshdecay10.wdecay1e5   644.780     636.270
hmm-new.ramsprop.drop0.dim250.lr0.002.trshdecay10.wdecay1e5   647.280     642.660

bigram.ramsprop.drop0.4.dim900.lr0.002.trshdecay10.wdecay1e5   128.590     177.270
bigram.ramsprop.drop0.6.dim900.lr0.002.trshdecay10.wdecay1e5   148.610     177.990
bigram.ramsprop.drop0.2.dim900.lr0.002.trshdecay10.wdecay1e5   112.080     178.600
bigram.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e5    95.540     183.490
bigram.ramsprop.drop0.6.dim900.lr0.002.trshdecay10.wdecay1e6   182.680     196.970
bigram.ramsprop.drop0.dim900.lr0.002.trshdecay10.wdecay1e4   216.920     235.660
bigram.ramsprop.drop0.dim900.lr0.002.trshdecay10      87.510     243.000
ybisk commented 6 years ago

These runs crashed, so take them with a grain of salt, but performance is very similar with a hidden dim of 100.

elman-softmax-hmm-emit word     h250     e250     lr20.0   drop0.0  sgd      wd0.0    pat5       255.290      308.170       0.320
elman-softmax-hmm-emit word     h850     e850     lr20.0   drop0.0  sgd      wd0.0    pat5     tieE       271.850      312.630       0.153
elman-softmax-hmm-emit word     h850     e850     lr10.0   drop0.0  sgd      wd0.0    pat5     tieE       271.770      312.960       0.153
elman-softmax-hmm-emit word     h100     e100     lr20.0   drop0.0  sgd      wd0.0    pat5       275.240      320.580       0.304
elman-softmax-hmm-emit word     h100     e100     lr5.0    drop0.0  sgd      wd0.0    pat5       291.910      329.760       0.298
elman-softmax-hmm-emit word     h250     e250     lr10.0   drop0.0  sgd      wd0.0    pat5       294.030      333.530       0.153
elman-softmax-hmm-emit word     h250     e250     lr5.0    drop0.0  sgd      wd0.0    pat5       296.880      336.210       0.153
elman-softmax-hmm-emit word     h850     e850     lr5.0    drop0.0  sgd      wd0.0    pat5     tieE       332.420      360.680       0.153
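To put "very similar performance with a hidden dim of 100" in perspective, here is a rough parameter count for the sizes above. This is a minimal sketch under assumptions. It uses a plain one-layer Elman LM layout, which the `elman-softmax-hmm-emit` variant may not match exactly. It also assumes a 10,000-word vocabulary, which this thread doesn't state. `elman_lm_params` is a hypothetical helper, not the repo's parameter calculation.

```python
def elman_lm_params(vocab, emb, hidden, tied=True):
    """Rough parameter count for a one-layer Elman language model.

    Assumed layout: embedding V x E, input-to-hidden E x H, hidden-to-hidden
    H x H plus bias H, output projection H x V plus bias V. With tied
    embeddings (tieE) the output projection reuses the embedding matrix,
    which requires E == H.
    """
    if tied and emb != hidden:
        raise ValueError("tied embeddings need emb == hidden")
    embedding = vocab * emb
    recurrent = emb * hidden + hidden * hidden + hidden
    output = (0 if tied else hidden * vocab) + vocab
    return embedding + recurrent + output


VOCAB = 10_000  # assumed vocabulary size, not taken from the thread
for h, tied in [(100, False), (250, False), (850, True)]:
    print(f"h{h} tieE={tied}: {elman_lm_params(VOCAB, h, h, tied):,} params")
```

Under these assumptions the h850 tied model has several times the parameters of the h100 one, so similar validation perplexity at h100 would suggest the extra capacity isn't being used by this variant.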