patyork commented 7 years ago

The Problem

I wanted to jump back into SpeechRec, with Keras instead of my own NN code, so I started with the OCR code that uses the CTC (connectionist temporal classification) algorithm for label alignment as a jumpstart into how Keras now handles temporal data. Relevant PR #3436 and the code here.

The first red flag was the docstring which says:

The table below shows normalized edit distance values. Theano uses a slightly different CTC implementation, so some Theano-specific hyperparameter tuning would be needed to get it to match Tensorflow.

            Norm. ED (edit distance)
Epoch |   TF   |   TH
------------------------
    10   0.072    0.272
    20   0.032    0.115
    30   0.024    0.098
    40   0.023    0.108

Now, all things regarding the model the same, the backend used to implement a loss function should not affect the results, especially to this degree.

I went ahead and ran the code, using the Theano backend, and saw no learning past a Normalized Edit Distance of .85 - this value for an edit distance means the output is essentially random, with the model just having learned the distribution of letters; this was backed up by the visual validation examples, where the model generally output a string of 'a's, 'e's, and 's's, which I assume are the most commonly used letters. This OOTB model ran for the default 50 epochs with the OOTB optimizer.
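For reference, here is a minimal sketch of how a normalized edit distance like the one quoted above can be computed. It assumes the Levenshtein distance is divided by the length of the ground-truth string; the function names are illustrative and are not taken from the example script.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_edit_distance(predicted, truth):
    # Assumption: normalize by the length of the ground-truth string.
    return edit_distance(predicted, truth) / float(len(truth))
```

Under this definition a prediction that shares no characters with a truth string of the same or greater length scores 1.0, which is why values around 0.85 suggest the model has learned very little.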

I swapped optimizers and used RMSProp and AdaDelta which generally see some learning where standard SGD may bounce around a local minimum. The output from both these models at 50 epochs was the same as the above, with the model just having learned to output the most-used letters, 'aes'.

Having implemented CTC myself, both in vanilla Numpy and Theano, and having used the implementations successfully in implementing the DeepSpeech paper a few years ago, I tried to reconcile my implementations with the Keras backend. The GIST is here, and should be recreatable.

The Tests

The output from the script is below. I consider the "Graves' DP Algorithm" to be a correct implementation of the CTC Algorithm; I had also created a pure recursive implementation that was extremely slow, but demonstrably correct according to the paper. I used that implementation to doublecheck the DP code, and the two implementations have always given the same output for every alphabet and input sequences (that didn't underflow).
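To make the comparison concrete, the following is a small NumPy sketch of the CTC forward (alpha) recursion in log space. It is not the code from the GIST or from Keras; it is a from-scratch illustration that assumes the network outputs are already per-frame log-softmax values and that the blank index is passed in explicitly.

```python
import numpy as np


def ctc_neg_log_likelihood(log_probs, labels, blank):
    """Negative log-likelihood of `labels` under CTC.

    log_probs: array of shape (T, num_classes), per-frame log-softmax outputs.
    labels:    sequence of integer class indices (without blanks).
    blank:     index of the blank class (an explicit argument in this sketch).
    """
    T = log_probs.shape[0]
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # alpha[s] = log-probability of all path prefixes ending in ext[s]
    alpha = np.full(S, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[1] = log_probs[0, ext[1]]

    for t in range(1, T):
        new = np.full(S, -np.inf)
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total = np.logaddexp(total, alpha[s - 1])  # advance by one
            # Skipping over a blank is allowed only between two
            # different non-blank labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total = np.logaddexp(total, alpha[s - 2])
            new[s] = total + log_probs[t, ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    total = alpha[S - 1]
    if S > 1:
        total = np.logaddexp(total, alpha[S - 2])
    return -total
```

For tiny inputs the result can be checked by hand: with two classes (one label plus blank), uniform outputs, and two frames, the label [0] has three valid paths of probability 0.25 each, so the loss should be -log(0.75).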

In the output below, a few things pop out. The (considered correct) DP algorithm agrees with the "Newer Theano code", to precision, 4/10 times, is considerably close in 2 other cases, and is the closest of the implementations in the other 4 cases. Keras' current theano CTC algorithm never agrees, and is the furthest away. The old theano code falls somewhere in between.

UPDATE: With the Tensorflow addition, it becomes clear that the Keras Theano implementation is wrong. TensorFlow appears to agree more with my older Theano code rather than my newer log-scale code. It also appears that my DP algorithm may be incorrect (which is a major bummer to past me).

The Raw Results

Item  [0 1 0]
    Graves' DP Algorithm output (negative log):     13.0550199138
    Keras (theano backend) output (negative log):       10.0515659144
    Very old Theano code (negative log):            13.0483553778
    Newer Theano code, done in log-space:           13.048353865
    TensorFlow data (log scale, previously run):        13.0484
Item  [0 1 0 1]
    Graves' DP Algorithm output (negative log):     40.0587437019
    Keras (theano backend) output (negative log):       35.4150082551
    Very old Theano code (negative log):            40.0527477089
    Newer Theano code, done in log-space:           40.0527448232
    TensorFlow data (log scale, previously run):        40.0494
Item  [0 1 0 1 0 1]
    Graves' DP Algorithm output (negative log):     23.3845931494
    Keras (theano backend) output (negative log):       18.6353031982
    Very old Theano code (negative log):            21.6997022866
    Newer Theano code, done in log-space:           21.6997006158
    TensorFlow data (log scale, previously run):        21.6996
Item  [1 0 1 0]
    Graves' DP Algorithm output (negative log):     30.1091477905
    Keras (theano backend) output (negative log):       19.6508202704
    Very old Theano code (negative log):            22.6789089182
    Newer Theano code, done in log-space:           22.6789069029
    TensorFlow data (log scale, previously run):        22.6789
Item  [0 1 0 1]
    Graves' DP Algorithm output (negative log):     48.5027294892
    Keras (theano backend) output (negative log):       34.090386626
    Very old Theano code (negative log):            39.4661636869
    Newer Theano code, done in log-space:           39.4661607462
    TensorFlow data (log scale, previously run):        39.4575
Item  [1 0 1 0 1 0]
    Graves' DP Algorithm output (negative log):     27.6407711553
    Keras (theano backend) output (negative log):       21.5949752793
    Very old Theano code (negative log):            26.0706689792
    Newer Theano code, done in log-space:           26.0706677852
    TensorFlow data (log scale, previously run):        26.0663
Item  [0 1 0 1 0 1]
    Graves' DP Algorithm output (negative log):     21.1628364256
    Keras (theano backend) output (negative log):       10.8599013801
    Very old Theano code (negative log):            13.8610864284
    Newer Theano code, done in log-space:           13.8610849881
    TensorFlow data (log scale, previously run):        13.8611
Item  [0 1 0 1]
    Graves' DP Algorithm output (negative log):     43.857914566
    Keras (theano backend) output (negative log):       24.0303263053
    Very old Theano code (negative log):            30.3423977771
    Newer Theano code, done in log-space:           30.3423955638
    TensorFlow data (log scale, previously run):        30.3345
Item  [1 0 1]
    Graves' DP Algorithm output (negative log):     27.7687108345
    Keras (theano backend) output (negative log):       23.8267339413
    Very old Theano code (negative log):            27.7662118088
    Newer Theano code, done in log-space:           27.7662103826
    TensorFlow data (log scale, previously run):        27.7652
Item  [1 0 1 0]
    Graves' DP Algorithm output (negative log):     53.0654053801
    Keras (theano backend) output (negative log):       24.1569056028
    Very old Theano code (negative log):            28.5835740608
    Newer Theano code, done in log-space:           28.5835719684
    TensorFlow data (log scale, previously run):        28.5807

The Takeaways and Questions

The current Keras Theano CTC code is incorrect, as compared to the algorithm described in the original CTC papers here and here.
~~The Theano implementations are missing something, or precision is becoming an issue already with this little sample.~~
~~The older theano seems to be more accurate than the newer theano code; however the newer theano code is in log scale.~~ If we agree that the Theano backend is wrong, we may not have a good enough implementation to drop in.

My questions are:

~~Can someone run this GIST with the TensorFlow backend, and post the results?~~
Are there, or has anyone made, any other CTC implementations to compare against?
Can anyone run the OCR example with Theano and get convergence and decent results?
Am I missing something?

Edit: I bit the bullet and installed/ran the TensorFlow backend CTC. Post updated.

Edit 2: Updated "New Theano" code; better matches both the older theano and the TensorFlow data.

patyork commented 7 years ago

@fchollet @mbhenry @shawntan

fchollet commented 7 years ago

ctc_batch_cost is unit-tested in both TensorFlow and Theano. You could check out the unit tests to see if you can spot any issue.

patyork commented 7 years ago

The CTC tests are simply values that have been seen from the algorithms; specifically 2 separate sets of results for TensorFlow versus Theano. It also states that the Theano CTC scales the error value but claims that the results will be the same (or similar, given that the docstring of the example that uses it shows Theano learns slower with the same hyperparameters).

Output from the OOTB OCR example, with the Theano backend. It took 8 hours to run 50 epochs, otherwise I would have run it longer to replicate the results.

Theano backend on a GT980 GPU; the docstring states this:

            Norm. ED
Epoch |   TF   |   TH
------------------------
    10   0.072    0.272
    20   0.032    0.115
    30   0.024    0.098
    40   0.023    0.108

Full 50 epoch output is available here. 10 epoch summaries:

Epoch 10/50 
Out of 256 samples:  Mean edit distance: 2.945 Mean normalized edit distance: 0.809
12800/12800 [==============================] - 582s - loss: 3.9203 - val_loss: 6.4624

Epoch 20/50
Out of 256 samples:  Mean edit distance: 4.316 Mean normalized edit distance: 0.736
12800/12800 [==============================] - 588s - loss: 7.4194 - val_loss: 10.5375

Epoch 30/50
Out of 256 samples:  Mean edit distance: 5.316 Mean normalized edit distance: 0.715
12800/12800 [==============================] - 583s - loss: 5.9443 - val_loss: 20.9642

Epoch 40/50 
Out of 256 samples:  Mean edit distance: 6.230 Mean normalized edit distance: 0.651
12800/12800 [==============================] - 591s - loss: 5.8008 - val_loss: 25.9919

Epoch 50/50
Out of 256 samples:  Mean edit distance: 5.629 Mean normalized edit distance: 0.591
12800/12800 [==============================] - 588s - loss: 4.3675 - val_loss: 24.5465

Epoch 50 visual image: e49

So, it seems to be learning, but at approximately an order of magnitude slower than the docstring states for Theano, and even slower than that as compared to TensorFlow.

mbhenry commented 7 years ago

The difference in output between the Theano CTC and Tensorflow CTC in the original PR was in how numerical stability was handled, so the Theano output did tend to be lower in magnitude than the Tensorflow implementation. Nevertheless, I included it in the PR because I was able to get the OCR to train with it as well as TensorFlow and I didn't really have any better alternative at the time.

I noticed the training was very sensitive all along. When people started reporting worse results, I was able to reproduce the results in the docstring for Tensorflow 0.9, but noticed it didn't learn nearly as well with the newer Tensorflow/Keras changes made to CTC. I'm working on making the results more reproducible and stable, and hope to have progress within a week or so.

patyork commented 7 years ago

I can understand the need for numerical stability, but log scale itself should be enough - especially when you incrementally increase the difficulty of the problem as the model becomes better trained (longer sequences).

I tried to get the Keras CTC and my own log scale CTC to underflow, and I was unable to, on random data up to 10,000 time steps.

Actually, I'm not positive yet, but I also think that the Theano CTC alg is using float64s.. which might explain the slowness, as it is pushing to the CPU to calculate that.

Edit: Yeah, warn_float64 shows float64, so that's another thing to think about. My code is also doing that, which explains the very high ceiling for log-scale overflow... 64 bit and log scale shouldn't underflow in this universe..

/usr/local/lib/python2.7/dist-packages/Keras-1.1.2-py2.7.egg/keras/backend/theano_backend.py:1794: UserWarning: You are creating a TensorVariable with float64 dtype. You requested an action via the Theano flag warn_float64={ignore,warn,raise,pdb}.
  smoothed_predict = (1 - alpha) * predict[:, Y] + alpha * np.float32(1.) / Y.shape[0]

Edit: After ensuring my code wasn't being upcast to float64, I was still able to run CTC up to 10,000 timesteps on random data. The error was on the order of 2e5, in log scale, making P(L) so small as to be ridiculous; essentially, log scale done correctly in 32bit should be fine.
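A quick illustration of the point about 32-bit log scale, using made-up constant probabilities rather than real network outputs: multiplying 10,000 float32 probabilities underflows to zero, while summing their logs stays comfortably within float32 range. `np.logaddexp` is used for adding two probabilities without leaving log space.

```python
import numpy as np

T = 10000
# Hypothetical per-timestep probabilities, all 0.1, stored as float32.
p = np.full(T, 0.1, dtype=np.float32)

direct = np.prod(p)            # probability space: underflows to 0.0
log_space = np.sum(np.log(p))  # log space: roughly -23026, still float32

# Adding two equal, very small probabilities entirely in log space.
both = np.logaddexp(np.float32(-20000.0), np.float32(-20000.0))

print(direct, log_space, both)
```

The float32 range extends to about 3.4e38 in magnitude, so a log-probability on the order of -2e5 is nowhere near the limit.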

mbhenry commented 7 years ago

Just an update here...I went back to a more trivial OCR example I had on hand that uses black and white 8 x 8 pixel letters with no distortion. This is with Tensorflow 0.11.0, Theano 0.8.0, and Cuda 8. The DNN stack is just a single GRU layer + fully connected layer. I ran it on the current Keras master branch as well as the original PR I made a few months ago:

Normalized Edit Distance

Epoch ->	20	30	40
Original CTC PR, TF	0.006	0.002	0
Keras Master, TF	0.042	0.008	0.003
Original CTC PR, TH	0.020	0.014	0.009
Keras Master, TH	0.051	0.041	0.032

So a more recent change to Keras (presumably the CTC part?) has caused the learning to slow down as well as flatten off at a higher logloss than the original PR. I'll start looking into why that is, but it at least partly explains the loss of performance on the image_ocr example in more recent Keras versions. It may be that the hyperparameters just need to be updated, so I'll try a hyperparameter search as well.

patyork commented 7 years ago

I'm also working on it. I've got an implementation that matches Tensorflow, and is ready for batching (in parallel).

Thanks for getting back to this - good to talk with someone that understands CTC.


mbhenry commented 7 years ago

Another quick update: I think I narrowed the issue down to the layer initialization. The default 'glorot_uniform' on the GRU seems to give a wide spread of log-loss results after repeated runs of 20 epochs, even with the settings all the same. Some other change elsewhere since I made the original PR makes the spread even wider. I switched the initialization to "he_normal" and now have more consistently good results with the trivial example. I'm now going to do some more parameter searching and see if I can get the image_ocr.py example to give consistent results.

Note this ties in to other issues people have pointed out where even with a fixed random seed for Numpy and TF, Keras gives non-deterministic results.

patyork commented 7 years ago

Interesting. The initialization functions are about the only things I had not tweaked. I'd have expected the more advanced optimizers to mitigate issues with a bad initialization space.


patyork commented 7 years ago

Just to finalize my issue with the current Theano CTC implementation, which is probably independent from the image_ocr.py recreation issue:

The implementation scales the CTC Loss for each minibatch, in an effort to keep from underflowing in log scale. In the Keras test case (one minibatch), the loss is lower by a factor of about 4 when taken out of log scale. For each minibatch, this factor will differ, which essentially gives weights to each training example making some examples affect the gradient more than others, based solely on its "neighbors" in the minibatch.

In addition, some smaller things: the CTC functions seem to upcast to float64, taking the computation off of the GPU, and the implementation requires a large scan op to process each item in the minibatch sequentially instead of in parallel.

patyork commented 7 years ago

Update: my implementation makes the assumption that number of timesteps >= len(y_true) * 2 + 1 which the original CTC paper also makes, but I am unsure if Graves actually implemented it with that assumption.

This would generally not affect any real world problems, where len(y_true) is generally pretty small and the number of timesteps is generally pretty large (at least to the point of fulfilling the assumption). However, the ctc test in Keras shows that both the current implementation and Tensorflow handle these edge cases, which my implementation fails on.
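The two length conditions can be written down as a short sketch. The first helper gives the smallest number of timesteps for which a label sequence is emittable at all under CTC (one frame per label, plus a blank between identical neighbours); the second checks the stricter assumption described above. Both function names are made up for illustration.

```python
def min_ctc_timesteps(labels):
    """Fewest frames that can emit `labels`: one per label, plus one
    blank between each pair of identical adjacent labels."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def satisfies_padded_assumption(num_timesteps, labels):
    """The stricter condition: T >= 2 * len(labels) + 1."""
    return num_timesteps >= 2 * len(labels) + 1
```

For example, [0, 1, 0] with 4 timesteps is feasible (the minimum is 3) but does not satisfy the stricter assumption, which would need 7 timesteps. Cases like that are the edge cases in question.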

I may continue working on it for Theano if I see a way to account for it, but for the time being I will just use the Tensorflow backend as the implementation there is correct.

mbhenry commented 7 years ago

Is your implementation on the GPU? If so, that would be a big speed advantage over Tensorflow's internal implementation that runs on the CPU and the borrowed Theano implementation currently in Keras. Baidu's open source warp-ctc is probably the gold standard as far as performance goes, and they may have required that assumption in order to get it to work. So I would say if you had a high-speed alternative with additional constraints, there could still be use in that.

Update on my side: I'm still narrowing down the convergence issue, and I think adding masking and starting with short sequences is the key...hopefully will have an update this weekend.

patyork commented 7 years ago

Yes, it's fully on the GPU. At the moment it is slower than the current Theano implementation; I feel this is due to the fact that my code is doing a scan to get the loss for each example in the batch (the same as the current code), but the logarithmic addition operation is slow on the GPU, meaning taking this small amount of data to-from the CPU is actually faster than the log-add on the GPU. Once I implement the operation over the whole minibatch in parallel, it should be equivalent or faster.

I haven't delved into the warp-ctc implementation to check, but I did notice that in their speed tests, T >= L * 2 + 1, although that may just be coincidence.

Cool - I would say that it may be important to incorporate the entire alphabet as early as possible; e.g. incorporate spaces early on. It may also be good to try to balance the frequency of individual letters early on - incorporating the less frequently used letters (v, x, w, z, etc) on an even basis when training for the translational invariance, perhaps by just generating random strings instead of words. This won't encode any lexical info early on, if the data is random strings, but may help with simple character recognition/translational invariance.
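As a rough sketch of the random-string idea, the generator below draws every character uniformly from the alphabet, so rare letters and spaces appear as often as common ones. The alphabet (lowercase letters plus space), the length range, and the function name are assumptions for illustration only.

```python
import random
import string


def random_training_strings(count, max_len,
                            alphabet=string.ascii_lowercase + " ",
                            seed=None):
    """Return `count` random strings of length 1..max_len, with each
    character drawn uniformly from `alphabet`."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet)
                    for _ in range(rng.randint(1, max_len)))
            for _ in range(count)]
```

Passing a fixed `seed` makes the generated set repeatable between runs.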

mbhenry commented 7 years ago

Intuitively that makes sense, but in the actual RNN training process the makeup of the words doesn't seem to matter much for initial convergence. The hardest thing seems to be temporal variance, in this case, x translation. I've noticed reducing the width (and therefore the amount of temporal data being thrown at the RNN in the early training) makes a big positive difference. What's working well for me now is starting with images that are 196 pixels wide with a single monospace font and no distortion for 3 epochs. This gets the RNN into a good state after which harder stuff can be thrown at it. Still needs some more work before a PR, but I'm very close.

Baidu actually came to the same conclusion in their Deep Speech 2 paper if you're interested in adapting this to speech. For my own speech reco (based off of this example), I scraped single word pronunciations from Forvo and used those for initial training.

mbhenry commented 7 years ago

See #4790 for fixes to this issue.

patyork commented 7 years ago

@fchollet I'm going to leave this issue open (and I'd like you to as well), because I think that the minibatch rescaling is an issue.

However, it will be a while before I can take a further look at it. Numerical stability shouldn't be as big of an issue as the implementation makes it, and batching is really important; I'm convinced there is a method that is numerically stable without scaling, and can handle batching. I think the very fast TensorFlow (and the warp-ctc) implementations attest to these items.

The immediate issue of the example not performing has been resolved thanks to @mbhenry (and it works well in Theano); although, still, I think these examples should be run on some schedule to ensure their validity.

HariKrishna-Vydana commented 7 years ago

@patyork when I try to use CTC on my setup, as in the OCR example, I am getting NaN loss. Do you have any clue?

input=Input(shape=(700,440))
inp=TimeDistributed(Dense(n_out, input_dim = n_hidden,activation='relu',init='glorot_uniform'))(input)
l1=SimpleRNN(n_hidden, input_dim = n_in, activation='relu',return_sequences=True,consume_less='cpu',init='glorot_uniform', inner_init='orthogonal')(inp)
y_pred=TimeDistributed(Dense(n_out, input_dim = n_hidden,activation='softmax',init='glorot_uniform'))(l1)

---------------------------------------------------------------------------------------

labels = Input(name='the_labels', shape=[80,], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
model = Model(input=[input, labels, input_length, label_length], output=[loss_out])

...............................................................................................................

model = Model(input=input, output=[loss_out])

adadelta=Adadelta(lr=0.01, rho=0.95, epsilon=1e-06,clipnorm=1.,clipvalue=0.5)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adadelta')

patyork commented 7 years ago

RNNs usually dislike linear/ReLU activations like you have in SimpleRNN. Try tanh or sigmoid for an activation.

HariKrishna-Vydana commented 7 years ago

I tried that but still get the same error. @patyork, do the labels mean the sequence of integers or one-hot representations?

patyork commented 7 years ago

Try putting everything (initializations, optimizer parameters, activations) back to defaults. Modify them back one at a time until it starts going to NaN and you'll find the issue.

HariKrishna-Vydana commented 7 years ago

@patyork finally, is the current CTC implemented in the Theano backend correct or faulty? I have seen your GitHub code explaining the recursive algorithm and the new log-space CTC. They give different results compared to the Theano backend CTC.

patyork commented 7 years ago

I think all of the CTC algorithms I've got on my Github are fundamentally flawed - they seemed to work several years ago when I wrote them, but I don't think they were completely accurate.

Within Keras: the TensorFlow implementation is completely correct. However, the Theano implementation makes a concession against numeric accuracy in favor of numeric stability - so while not technically correct, it works fairly well.

Also- to answer your previous question completely: the labels is the sequence of integers like [0, 1, 2] == "abc"
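A minimal sketch of that integer encoding, assuming a plain lowercase alphabet where 'a' maps to 0 (the alphabet and helper names here are illustrative; use whatever character set your model was built for):

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet for this sketch


def text_to_labels(text, alphabet=ALPHABET):
    """Map each character to its integer index in the alphabet."""
    return [alphabet.index(ch) for ch in text]


def labels_to_text(labels, alphabet=ALPHABET):
    """Inverse mapping: integer indices back to a string."""
    return "".join(alphabet[i] for i in labels)


print(text_to_labels("abc"))      # [0, 1, 2]
print(labels_to_text([0, 1, 2]))  # abc
```

These are class indices, not one-hot vectors; the blank symbol is handled by the CTC implementation and is not part of the label sequence.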

HariKrishna-Vydana commented 7 years ago

@patyork but I always end up getting NaN loss using the theano_backend_ctc

patyork commented 7 years ago

That is more than likely a problem with the model or data, and not the loss function. As I said, the Theano CTC implementation should actually be more numerically stable (hit NaN less often) than the tensorflow version.


delzac commented 7 years ago

@patyork Sorry, so the conclusion at this point in time is that the theano backend ctc implementation is wrong?

patyork commented 7 years ago

My conclusion was:

The implementation scales the CTC Loss for each minibatch, in an effort to keep from underflowing in log scale. In the Keras test case (one minibatch), the loss is lower by a factor of about 4 when taken out of log scale. For each minibatch, this factor will differ, which essentially gives weights to each training example (independent from the update algorithm), making some examples affect the gradient more or less than others, based solely on its "neighbors" in the minibatch. So not technically correct, and may cause issues (in addition to reporting a lower overall CTC loss for minibatches and epochs), but it works in practice.


stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

nouiz commented 7 years ago

Just a note, Theano has wrapped warp-ctc from Baidu:

http://deeplearning.net/software/theano_versions/dev/library/tensor/nnet/ctc.html


theceday commented 6 years ago

Similar issue with tensorflow backend.

https://github.com/keras-team/keras/issues/9369

keras-team / keras

CTC (Theano backend) is incorrect #4634
