keras-team / keras

implement CTC with keras? #383

Closed blackyang closed 7 years ago

blackyang commented 9 years ago

Hi there,

Has anyone implemented a Connectionist Temporal Classification (CTC) loss with Keras?

I attempted to add such a cost function to the objectives.py file, based on rakeshvar's code. The model compiles, but model.fit() raises several errors. I am new to Theano, so it's really tough for me to debug...

It shouldn't be hard in theory, so I guess I made some "naive" mistakes...

fchollet commented 9 years ago

Do you have a reference for what you are trying to implement, as well as your attempt so far?

blackyang commented 9 years ago

Hi @fchollet, the original CTC paper by Alex Graves can be found here.

Basically, CTC is a special loss function for handling alignment. For example, in speech recognition, suppose the input sequence has length t (so the output of the RNN also has length t); the target sequence usually has a length w smaller than t, and it is not known which input frames correspond to which target labels. CTC removes the need for pre-segmentation of the inputs and post-segmentation of the network outputs.

I was trying to add a new cost function to objectives.py, based on this ctc.py file. The model compiles, but model.fit() raises several errors. I guess the reason lies in these lines, which imply that the two arguments to the cost function should share the same shape. Correct me if I've misunderstood anything.
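
To make the alignment idea above concrete, here is a minimal, framework-free sketch of what a CTC loss computes. It is not the code from any implementation linked in this thread. It assumes that class index 0 is the blank symbol and that `probs` holds one softmax row per frame; the function names are made up for illustration. It works in probability space, so it would underflow on long sequences, where a real implementation would use log space or rescaling.

```python
import itertools
import math

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def collapse(path, blank=BLANK):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return tuple(out)


def brute_force_likelihood(probs, target, blank=BLANK):
    """Sum the probability of every length-t path that collapses to target."""
    num_classes = len(probs[0])
    total = 0.0
    for path in itertools.product(range(num_classes), repeat=len(probs)):
        if collapse(path, blank) == tuple(target):
            total += math.prod(frame[s] for frame, s in zip(probs, path))
    return total


def ctc_neg_log_likelihood(probs, target, blank=BLANK):
    """Same quantity via the forward recursion, in O(t * w) instead of exponential time.

    Raises ValueError if no path of this length can produce target.
    """
    # Interleave blanks: (a, b) -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    n = len(ext)
    alpha = [0.0] * n
    alpha[0] = probs[0][ext[0]]
    if n > 1:
        alpha[1] = probs[0][ext[1]]
    for frame in probs[1:]:
        new = [0.0] * n
        for s in range(n):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame[ext[s]]
        alpha = new
    # A valid path ends on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if n > 1 else 0.0)
    return -math.log(likelihood)
```

For a small input, such as 4 frames over 3 classes, `ctc_neg_log_likelihood(probs, (1, 2))` should agree with `-math.log(brute_force_likelihood(probs, (1, 2)))`. That agreement is a convenient sanity check when porting a CTC loss to a new framework.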

futurely commented 9 years ago

@amaas implemented the CTC loss strictly faithful to the original paper in a very straightforward way.

blackyang commented 9 years ago

@futurely thanks! Currently I am using this with lasagne :-)

amaas commented 9 years ago

It should be relatively straightforward to port our CTC implementation into the Keras framework. Note that our fast version is written in Cython (which doesn't seem to be used elsewhere in Keras). Without Cython, the loops that compute the alignments required to evaluate the CTC loss were painfully slow.

ghost commented 9 years ago

@amaas : do you have a theano version implementation? Or can your fast version work with theano?

amaas commented 9 years ago

@jedi00 No, we wrote our RNNs from scratch without Theano. If you want to replace the NN architecture though you could take just our CTC loss and make it a Theano function. It only needs to interact with the final layer so it should be mostly unchanged in a Theano implementation.

jinserk commented 9 years ago

Hi @blackyang, did you get Lasagne's CTC working in Keras? If so, could you tell me how you did it? Keras' loss objects are all functions defined in objectives.py, and they seem to be called from the compile() function in models.py. The loss is wrapped by the weighted_objective() function, which calls the loss function object with only two parameters, y_true and y_pred. However, Lasagne's CTC is a class object, and its apply() function seems to require 4 parameters. I'm stuck here. Thank you.

blackyang commented 9 years ago

Hi @jinserk, I was stuck at the same place, so I used Lasagne, which I think is more extensible. By the way, I recommend amaas's implementation instead of Lasagne's CTC, since the latter is somewhat problematic.

Michlong commented 8 years ago

I tried it too; unfortunately, it failed...

futurely commented 8 years ago

The following paper trained a convolutional bidirectional LSTM network to recognize natural scene text without text line segmentation. The open source code implements CTC in C++ for the Torch7 framework in Lua. The C++ code can be modified for use from Python.

[1] B. Shi, X. Bai, C. Yao. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. CoRR abs/1507.05717, 2015.

ekelsen commented 8 years ago

Baidu just released their open source CPU and GPU implementation of CTC here: https://github.com/baidu-research/warp-ctc

It is released as a C-library and bindings for Torch. The C library should be easy to integrate into many different projects.

blackyang commented 8 years ago

@ekelsen thanks for the pointer!

ZhangAustin commented 8 years ago

Here is an implementation of Theano bindings for Baidu's warp-ctc: https://github.com/sherjilozair/ctc

Is there any plan for Keras to bind this?

futurely commented 8 years ago

https://github.com/baidu-research/warp-ctc

mschonwe commented 8 years ago

And from TensorFlow... https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ctc

shantanudev commented 8 years ago

You guys have any luck with this implementation?

ghost commented 8 years ago

I maintain a repository of CTC with various implementations, including cython, numba/python and theano versions, check here: https://github.com/daweileng/Precise-CTC. You can use CTC_precise or CTC_for_train class, they're both fine for RNN training.

The CTC objective is different from the current objective functions in Keras, and requires a different masking mechanism. I also maintain a repository of Keras MOD with CTC incorporated, check here: https://github.com/daweileng/keras_MOD. Currently, only train_on_batch() is modified to be compatible with CTC. This is enough for me, so there's no definite plan to modify other parts of Keras.

shantanudev commented 8 years ago

Oh this is perfect and exactly what I am interested in. Thank you!

nouiz commented 8 years ago

Just to let you know, there is this discussion about versions that wrap Baidu's implementation, which could be faster:

https://github.com/Theano/Theano/issues/3871#issuecomment-207536539

There are currently two wrapper versions:

https://github.com/mcf06/theano_ctc

and

https://github.com/sherjilozair/ctc

lingz commented 8 years ago

@daweileng Do you have any instructions/examples as to how to use your Keras MOD?

ghost commented 8 years ago

In the repository https://github.com/daweileng/Precise-CTC, there is a folder named 'Test'; you can find a demo script, 'mnist_ctc_v4.py', there.

vkatsouros commented 8 years ago

@daweileng In mnist_ctc_v4.py you import from NN_auxiliary and from mytheano_utils. Can you share these too? Maybe in Keras MOD?

ghost commented 8 years ago

For anyone interested: I updated my CTC-integrated Keras fork to base version 1.0.4, check here: https://github.com/daweileng/keras_MOD/tree/MOD_1.0.4. So far, the following train/test functions work well with CTC cost:

githubnemo commented 8 years ago

@daweileng Sadly you did not fork the Keras repository. Instead you just copied the files over and added everything (including your patches) in one commit. Can you do that properly (e.g., press the fork button on github, clone, add your changes, commit separately, push) so your patches become actually visible? That'd be awesome.

ghost commented 8 years ago

@githubnemo As explained in the README, the reason I didn't make a pull request is that, to avoid a mass modification of Keras' masking mechanism, I currently override the sample_weights and masks variables of Keras. In theory this should not cause problems for other networks, but I'm not 100% sure about this. Besides, the modification of the fit() function is not done yet. I'd like to collect enough feedback before an official pull request to the Keras master branch.

If you just want to know what has changed, you can compare the contents of the two repositories.

Progress: Now FCN can work with LSTM + CTC!

pasky commented 8 years ago

See also #3436

patyork commented 7 years ago

@harikrishnavydana The ocr example runs fine for me on both Theano and Tensorflow.

If it is not working for you, please review the issue guidelines (update keras) and if the issue persists, open a new issue.

HariKrishna-Vydana commented 7 years ago

Thank you, I was using an older version of Keras @patyork

besanson commented 7 years ago

Hi, thanks @patyork. Just to understand: you are rendering text into images, and using some of these to train and others to validate? But you are using full words to train, not characters? Pycairo is a complicated library to install :)

anuj-rathore commented 7 years ago

I am trying to use Keras CTC with a bidirectional LSTM, i.e. https://github.com/lvapeab/ABiViRNet. The network is defined here: https://pastebin.com/9QXbJSwE

A Keras loss function takes 2 arguments, but ctc_batch_cost takes 4. Can somebody tell me how to handle this?
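
One possible workaround for the two-argument restriction raised here and earlier in the thread is to pack the extra inputs into y_true and unpack them inside the loss. The sketch below shows only that adapter pattern, in plain Python on lists. `pack_targets`, `make_two_arg_loss` and `stand_in_cost` are hypothetical names, and `stand_in_cost` is a placeholder rather than a CTC loss. In a real model the same slicing would be done on backend tensors, and the four-argument function would be something like ctc_batch_cost.

```python
def pack_targets(labels, input_length, label_length):
    """Append both lengths to each padded label row so one array carries everything."""
    return [list(row) + [t_len, l_len]
            for row, t_len, l_len in zip(labels, input_length, label_length)]


def make_two_arg_loss(four_arg_cost):
    """Adapt cost(labels, y_pred, input_length, label_length) to loss(y_true, y_pred)."""
    def loss(y_true, y_pred):
        # The last two columns of y_true are the lengths added by pack_targets.
        labels = [row[:-2] for row in y_true]
        input_length = [row[-2] for row in y_true]
        label_length = [row[-1] for row in y_true]
        return four_arg_cost(labels, y_pred, input_length, label_length)
    return loss


def stand_in_cost(labels, y_pred, input_length, label_length):
    """Placeholder, NOT a CTC loss: reports the unpadded labels and frame count per sample."""
    return [(lab[:n], t) for lab, n, t in zip(labels, label_length, input_length)]


# Hypothetical batch of two samples; the trailing 0 in the first row is padding.
packed = pack_targets([[3, 1, 0], [4, 5, 7]], input_length=[32, 30], label_length=[2, 3])
loss = make_two_arg_loss(stand_in_cost)
result = loss(packed, y_pred=None)
```

The trade-off of this design is that y_true no longer has the same shape as y_pred, so any framework code that assumes matching shapes (the problem described at the start of this thread) still has to be satisfied some other way.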

selcouthlyBlue commented 6 years ago

Apparently, there is a ctc_loss implementation in Keras. There's an open issue on Keras' ctc_batch_cost in the tensorflow_backend.

hypernote commented 6 years ago

Hello... Do we already have a CTC sample in the Keras repository?

selcouthlyBlue commented 6 years ago

You mean this one? If so, yeah, I know there is already a sample. It's just that when I search for "Keras CTC" on Google, this issue comes up, and I thought it would be nice to let people know that such an implementation already exists.

hypernote commented 6 years ago

Great

rasto2211 commented 6 years ago

Is it ok to use ctc_batch_cost as keras loss function and pass it to model.compile? All the losses that are implemented in: https://github.com/keras-team/keras/blob/master/keras/losses.py take only one sample. Is it efficient?

Is there any plan to integrate WarpCTC to Keras?

saisumanth007 commented 6 years ago

Could you please tell what input_length and label_length specify? As per the documentation it seems label_length contains the lengths of ground truth strings (in case of OCR). But I'm not sure what input_length means.

aayushee commented 6 years ago

I think input_length refers to your sequence length and label_length refers to the ground truth label length.
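
As a rough illustration of the two length arguments, here is a plain-Python sketch of how a batch might be laid out. The (samples, 1) shape for both length arrays follows the Keras documentation for ctc_batch_cost. The helper names, the example label ids, the 32-frame figure and the padding value 0 are assumptions made up for this example; padding is assumed to be ignored because label_length marks where each sequence ends.

```python
def pad_labels(label_seqs, pad_value=0):
    """Pad variable-length label sequences to a rectangle and record the true lengths."""
    max_len = max(len(seq) for seq in label_seqs)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in label_seqs]
    label_length = [[len(seq)] for seq in label_seqs]  # shape (samples, 1)
    return padded, label_length


def min_frames_needed(label_seq):
    """Fewest output frames that can emit label_seq under CTC.

    Each label needs one frame, and each pair of identical adjacent labels
    needs an extra blank frame between them.
    """
    repeats = sum(1 for a, b in zip(label_seq, label_seq[1:]) if a == b)
    return len(label_seq) + repeats


# Two ground-truth strings encoded as integer class ids (hypothetical values).
raw_labels = [[3, 1, 20], [4, 15, 7, 19, 5]]
labels, label_length = pad_labels(raw_labels)

# input_length counts the frames of y_pred that are valid for each sample,
# i.e. the time steps the network actually outputs, not the raw input size.
input_length = [[32], [32]]  # shape (samples, 1)

# Every sample must have enough frames to emit its label sequence.
feasible = all(t[0] >= min_frames_needed(seq)
               for t, seq in zip(input_length, raw_labels))
```

If the network downsamples along the time axis (for example through pooling), input_length should reflect the shortened output sequence, otherwise the feasibility check above may pass on paper while the loss still fails.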