skaae / Lasagne-CTC

CTC for lasagne
Apache License 2.0

No improvement when using CTC #2

Open Sylvus opened 8 years ago

Sylvus commented 8 years ago

Disclaimer: This might not be a bug in the implementation but just a problem with my data/code, but I want to ask/share it anyway.

I'm currently trying out the CTC implementation using some fake data (download here). Each fake example is a 257x200 matrix with a bunch of ones in a specific row (each column is one observation). Currently there are only 20 classes, so only the first 20 of the 257 rows contain ones. All the other rows (and most columns) are zero. The target for each matrix is the index of the row that contains the ones.
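For readers without the download, here is a rough, hypothetical sketch of what data of that shape might look like; the number of ones per example and the exact layout are guesses, and the real downloadable data may differ.

```python
# Hypothetical reconstruction of the fake data described above (not the actual download).
import numpy as np

n_features, n_steps, n_classes = 257, 200, 20  # rows, columns (observations), classes

def make_example(label, rng):
    """One 257x200 matrix: ones scattered in row `label`, zeros everywhere else."""
    x = np.zeros((n_features, n_steps), dtype='float32')
    cols = rng.choice(n_steps, size=rng.randint(5, 50), replace=False)  # arbitrary count
    x[label, cols] = 1.0
    return x

rng = np.random.RandomState(0)
labels = rng.randint(0, n_classes, size=100)
X = np.stack([make_example(l, rng) for l in labels])   # (100, 257, 200)
# For an LSTM that reads one column per step, each example would be fed
# transposed as a (200, 257) sequence.
```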

I'm using a single LSTM layer with 40 hidden units. I can configure the LSTM layer to output only its final value and compute a simple cross-entropy cost without using CTC. With this method, Lasagne finds all the correct weights in about 600 epochs. However, when using the CTC pseudo cost, the cost stays almost constant (even after 2000 epochs). Do you have any idea why this happens?
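For context, the cross-entropy baseline described above can be set up in Lasagne roughly as below; the CTC variant would differ only in keeping the full output sequence and swapping the objective. This is a sketch, not the poster's actual code, and the pseudo-cost call mentioned in the final comment is an assumption about this repo's interface rather than a verified signature.

```python
# Sketch of the cross-entropy baseline (last-step output only), using the public Lasagne API.
import theano.tensor as T
import lasagne

x_sym = T.tensor3('x')        # (batch, time, features) = (B, 200, 257)
y_sym = T.ivector('y')        # one class index per sequence

l_in = lasagne.layers.InputLayer((None, 200, 257), input_var=x_sym)
l_lstm = lasagne.layers.LSTMLayer(l_in, num_units=40, only_return_final=True)
l_out = lasagne.layers.DenseLayer(l_lstm, num_units=20,
                                  nonlinearity=lasagne.nonlinearities.softmax)

# Cross-entropy on the final step only -- the setup reported to converge in ~600 epochs.
probs = lasagne.layers.get_output(l_out)
loss = lasagne.objectives.categorical_crossentropy(probs, y_sym).mean()

# The CTC variant would instead use only_return_final=False, apply a per-step
# softmax, and feed the full (time, batch, classes) activations to the repo's
# pseudo cost, e.g. something like ctc_cost.pseudo_cost(y, y_hat) -- the exact
# function name and argument order are assumptions; check ctc_cost.py.
```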

You can find the code (at least part of it) here.

Sylvus commented 8 years ago

If I simplify my data set drastically (just a small matrix where, for the i-th class, the i-th row contains only ones), the CTC method eventually finds the correct weights, but it takes much longer than just using output_final=True with a cross-entropy cost. Is this just a case where CTC does not work well?

shantanudev commented 8 years ago

Did you ever get the CTC to work?

skaae commented 8 years ago

no.

Sylvus commented 8 years ago

Me neither. I eventually gave up on it and went with a different cost function.

shantanudev commented 8 years ago

Hey, I actually got your CTC function to work. I ran my model for 30 epochs. The only perplexing thing is that the loss is negative. Is that supposed to happen?

EPOCH #29
Batch 0/36, loss:-153.0, ploss:-8.377
Batch 1/36, loss:-148.3, ploss:-9.217
Batch 2/36, loss:-152.3, ploss:-10.52
Batch 3/36, loss:-150.0, ploss:-8.149
Batch 4/36, loss:-149.5, ploss:-11.62
Batch 5/36, loss:-150.7, ploss:-9.386
Batch 6/36, loss:-153.0, ploss:-11.52
Batch 7/36, loss:-149.2, ploss:-8.525
Batch 8/36, loss:-149.2, ploss:-9.465
Batch 9/36, loss:-148.4, ploss:-9.401
Batch 10/36, loss:-148.8, ploss:-9.016
Batch 11/36, loss:-151.2, ploss:-8.366
Batch 12/36, loss:-149.7, ploss:-8.448
Batch 13/36, loss:-149.4, ploss:-10.54
Batch 14/36, loss:-151.8, ploss:-8.740
Batch 15/36, loss:-150.9, ploss:-9.613
Batch 16/36, loss:-152.1, ploss:-8.213
Batch 17/36, loss:-150.8, ploss:-11.17
Batch 18/36, loss:-149.3, ploss:-6.484
Batch 19/36, loss:-148.4, ploss:-10.21
Batch 20/36, loss:-147.1, ploss:-6.568
Batch 21/36, loss:-145.4, ploss:-12.06
Batch 22/36, loss:-149.0, ploss:-5.473
Batch 23/36, loss:-146.7, ploss:-12.56
Batch 24/36, loss:-149.4, ploss:-6.951
Batch 25/36, loss:-145.8, ploss:-11.24
Batch 26/36, loss:-151.7, ploss:-8.847
Batch 27/36, loss:-148.4, ploss:-9.812
Batch 28/36, loss:-151.7, ploss:-8.972
Batch 29/36, loss:-150.6, ploss:-10.22
Batch 30/36, loss:-149.5, ploss:-8.529
Batch 31/36, loss:-149.5, ploss:-8.704
Batch 32/36, loss:-150.1, ploss:-9.575
Batch 33/36, loss:-146.3, ploss:-8.276
Batch 34/36, loss:-146.8, ploss:-8.595
Batch 35/36, loss:-148.1, ploss:-9.130

network output: _h#_q_ixtcl_t_ss_ix___liygcl_g_ih_ltcl_taxpcl__p_ow__stcl_t_eydxix_tcltcl_cheh_kcl_k_h#__
actual output (true output): _h#_q_ix_tcl_t_s_ix_l_iy_gcl_g_ih_l_tcl_t_ax_pcl_p_ow_s_tcl_d_ey_dx_ix_tcl_ch_eh_kcl_kh#____
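For reference (not part of the original post): raw CTC output like the above still contains blanks and repeated symbols; a standard best-path (greedy) decode collapses consecutive repeats and then drops blanks before comparing to the target. A minimal sketch, assuming a (T, n_classes) matrix of per-step softmax outputs with the blank at index 0:

```python
import numpy as np

def best_path_decode(probs, blank=0):
    """Greedy CTC decode: argmax per step, collapse repeats, drop blanks."""
    path = np.argmax(probs, axis=1)                 # (T,) best label per step
    return [p for i, p in enumerate(path)
            if p != blank and (i == 0 or p != path[i - 1])]

# Toy example: 6 steps, 3 classes (index 0 is the blank).
probs = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.9, 0.05, 0.05],
                  [0.1, 0.1, 0.8],
                  [0.1, 0.1, 0.8]])
print(best_path_decode(probs))   # -> [1, 2]
```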

nadimgh commented 8 years ago

Hello, I am also using this version of CTC and I face the same problem of predicting only blank labels (even after a large number of epochs). @shantanudev, did you face this problem before solving it? Please let me know. Thank you!
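One cheap way to watch for the all-blank failure mode mentioned above is to track how often the network's argmax prediction is the blank label while training. A hedged sketch (the shape of the softmax output and the blank index are assumptions; adjust to however this repo lays them out):

```python
import numpy as np

def blank_fraction(probs, blank_index):
    """Fraction of time-steps whose argmax prediction is the blank label.

    probs: per-step softmax outputs, e.g. shape (T, batch, n_classes).
    """
    preds = probs.argmax(axis=-1)
    return float((preds == blank_index).mean())

# If this stays near 1.0 for many epochs, the model has collapsed to blanks.
```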

shantanudev commented 8 years ago

Hi,

No, I pretty much did my best to decipher the toy example from https://github.com/rakeshvar/rnn_ctc. I was able to get the alignments to work without any issue. Above, I showed my raw output.