igormq / ctc_tensorflow_example

CTC + Tensorflow Example for ASR
MIT License

lstm + ctc for mnist #2

anxingle closed 8 years ago

anxingle commented 8 years ago

Hi, igormq. Your blog post about CTC in TensorFlow was very helpful. Thanks a million. But I am confused about a couple of things in the CTC module.

  1. If the sequence is A B B * B * B (* is blank), tf.ctc.ctc_greedy_decoder() should return A B B B. But the docs say the result is A B if merge_repeated=True.
  2. My code uses an LSTM to classify MNIST data. Just one layer and 28 time steps. But the CTC loss doesn't work at all. Can you help me figure out the right way to call it? The code is very simple, and I promise you'll get it as soon as you see it. Thanks again.
anxingle commented 8 years ago

I wrote it just as your code shows, and it works fine if I comment out the CTC functions. I really don't know what's wrong with it.

igormq commented 8 years ago

Thank you @anxingle, I'm very glad that you liked my post. Answering your questions:

  1. Yes, you are absolutely right, this is the default behavior of TensorFlow's implementation, but in Graves' thesis, he wrote that you have to delete the repeated labels and then remove the blank labels, as we can see on page 57 of his thesis. I have no idea why the TensorFlow team implemented it that way.
  2. I read your code, but it would be better if you sent me your error log and your code with the CTC implementation in place (not commented out), because I didn't see the seq_len placeholder or the sparse placeholder for y in your code. Could you do that?
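
For the first point, the collapse rule (merge repeated labels, then drop blanks) can be sketched in plain Python. This is only an illustration of the rule, not TensorFlow's decoder: `ctc_collapse` is a hypothetical helper, and its `merge_repeated` flag is an assumption that mirrors the parameter name discussed above.

```python
from itertools import groupby

BLANK = "*"  # symbol used for the blank label in this thread


def ctc_collapse(path, blank=BLANK, merge_repeated=True):
    """Collapse a frame-level path into a label sequence.

    Hypothetical helper: merges consecutive repeats first (if asked),
    then removes every blank.
    """
    if merge_repeated:
        # groupby yields one key per run of identical consecutive labels
        path = [label for label, _ in groupby(path)]
    return [label for label in path if label != blank]


path = ["A", "B", "B", "*", "B", "*", "B"]
print(ctc_collapse(path))                        # ['A', 'B', 'B', 'B']
print(ctc_collapse(path, merge_repeated=False))  # ['A', 'B', 'B', 'B', 'B']
```

The blanks between the B runs are what keep the later B labels from being merged into the first one, which is why the result is A B B B rather than A B.
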
anxingle commented 8 years ago

Thank you very much. I will do what you told me as soon as I can.

anxingle commented 8 years ago

I added the entire code, along with the error output in error.txt.

anxingle commented 8 years ago

I tried tf.int64.

igormq commented 8 years ago

Could you send me your dataset?

anxingle commented 8 years ago

I have pushed the MNIST dataset into data; you can just git clone the repository.
I am really grateful to you.

anxingle commented 8 years ago

It takes almost an hour. Thanks, GFW.

igormq commented 8 years ago

Why are you trying to use CTC as a cost function? CTC is used when you don't have an alignment between your input and output and/or the output length varies across samples. So, for a one-to-one relationship (like one image, one digit), CTC probably isn't the best solution for you. But if you intend to use this code for continuous handwriting recognition, CTC will work better. I'm looking at your code and making some changes. I'll give you feedback as soon as possible, ok?

anxingle commented 8 years ago

Thank you for your reply. But in this code I have 28 inputs, so it's a problem of mapping many inputs to one label (maybe later I'll add multiple labels). My senior implemented a multi-label recognition framework with mxnet warpctc and told me it should be the best solution.
So nice!

igormq commented 8 years ago

Yes, but CTC works only for more than one label. I'll show you working code, but I don't think CTC will outperform the softmax layer for this example.

anxingle commented 8 years ago

Got it! I'll switch to another dataset!

igormq commented 8 years ago

I made a working version and put it on gist. Your major issue was with the sparse placeholder and the sequence length placeholder. The targets required by CTC must not be encoded; you must provide them as labels, and you must feed the sparse placeholder a tuple of (indices, values, shape) (generated by sparse_tuple_from). In the case of MNIST, for a batch you will have a target like

y = (
[[0, 0], [1, 0], [2, 0], ..., [batch_size-1, 0]],
[label_1, label_2, label_3, ..., label_batch_size],
[batch_size, 1]
)
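
As a rough sketch of how such a tuple can be built from a batch of label sequences: the helper below is hypothetical (`labels_to_sparse_tuple` is not the `sparse_tuple_from` from the gist) and uses plain Python lists, on the assumption that they would be converted to arrays of the right dtype before being fed.

```python
def labels_to_sparse_tuple(sequences):
    """Build (indices, values, shape) from a list of label sequences.

    Hypothetical helper for illustration; each entry of `sequences`
    holds the integer labels of one sample.
    """
    indices = []  # [row, col] position of every label
    values = []   # the labels themselves, in the same order
    for row, seq in enumerate(sequences):
        for col, label in enumerate(seq):
            indices.append([row, col])
            values.append(label)
    # dense shape: batch size by longest label sequence
    max_len = max((len(seq) for seq in sequences), default=0)
    shape = [len(sequences), max_len]
    return indices, values, shape


# MNIST-style batch: exactly one digit label per image
batch_labels = [[7], [2], [1], [0]]
indices, values, shape = labels_to_sparse_tuple(batch_labels)
print(indices)  # [[0, 0], [1, 0], [2, 0], [3, 0]]
print(values)   # [7, 2, 1, 0]
print(shape)    # [4, 1]
```

With variable-length targets, such as several digits per image, the same helper gives a shape whose second entry is the longest label sequence in the batch.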

The seq_len placeholder tells the run the length of each sample in the batch; for MNIST, the network was fed 28 inputs of length 28, so:

seq_len = [28 for _ in range(batch_size)]  # one length per sample; range() also works on Python 3

I hope this helps. If you have any questions, I'll be happy to answer them.

igormq commented 8 years ago

You can use this dataset, whose images have more than one digit and the number of digits differs from image to image. CTC may work better with this dataset.

anxingle commented 8 years ago

I don't even know how to express my appreciation! Thanks a lot.

igormq commented 8 years ago

You're welcome. If you have any questions, please feel free to ask.