Eesen for Handwriting Recognition

wellescastro commented 6 years ago

Hi!

Is it possible to apply Eesen to handwriting recognition decoding using run_ctc_char.sh? I have a handwriting recognition model trained with the CTC objective function, which I use to generate posteriors for a given test set. There are 80 classes (79 characters and space plus blank label). For example, when I feed the model a random image, the output is an array of shape [23,80] holding the posteriors for each of the 23 timesteps:

[[ -1.34868135e+01 -8.35496044e+00 -8.41627693e+00 ..., -1.36931696e+01 -9.87082534e-03 -4.79693222e+00]
 [ -9.13194656e+00 -7.16338444e+00 -6.48828840e+00 ..., -1.36707449e+01 -1.37711186e-02 -5.85543299e+00]
 [ -9.01710701e+00 -6.39070272e+00 -4.29255819e+00 ..., -9.18490791e+00 -7.23320580e+00 -5.94876671e+00]
 ...,
 [ -1.71867085e+01 -1.31590452e+01 -1.24031420e+01 ..., -1.56726990e+01 -3.50107886e-02 -3.39233971e+00]
 [ -9.83597565e+00 -1.01459503e+01 -5.68321657e+00 ..., -1.26742878e+01 -2.83434343e+00 -6.59889698e+00]
 [ -1.27738829e+01 -1.19765997e+01 -1.11741018e+01 ..., -1.43531590e+01 -8.91663320e-03 -7.07233000e+00]]

I would like to decode these posteriors with a lexicon and a language model. Could someone help me? Thank you!
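
For reference, a [T, C] matrix like the one above can be sanity-checked with a greedy (best-path) CTC decode before any lexicon or language model is involved. The sketch below is only an illustration and is not part of Eesen: the function name is hypothetical, and the default blank index of 0 is an assumption that must be matched to the model's actual label order.

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Best-path CTC decoding of a [T, C] matrix of log-posteriors.

    `blank` is the column index of the CTC blank label. The default of 0
    is an assumption; use whatever index the model was trained with.
    """
    # Pick the most likely class at every timestep.
    best_path = np.argmax(np.asarray(log_probs), axis=1)

    decoded = []
    prev = None
    for label in best_path:
        # Collapse repeated labels first, then drop blanks.
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded
```

The returned list holds class indices, which still have to be mapped back to characters with the model's own label table. This gives a baseline to compare against once WFST decoding is set up.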

fmetze commented 6 years ago

Hi,

Absolutely, you should be able to use Eesen for handwriting recognition. The main difference between the "char" and the "phn" scripts is how the lexicon is generated: the "char" version introduces a blank character between words, while the "phn" version does not. So, depending on how you want to model your data, the "char" recipe may indeed be the better fit for handwriting recognition.

If you have a language model, you should be able to apply WFST or RNN decoding without problems.

Hope this helps. Best, F.
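
To illustrate what a character-level lexicon looks like, here is a minimal sketch that spells each word as a sequence of characters, one entry per line in the Kaldi-style "word unit1 unit2 ..." layout. The function name is hypothetical, and the exact file the Eesen "char" recipe expects (including how the between-word symbol is added) should be checked against the recipe's own preparation scripts.

```python
def build_char_lexicon(words):
    """Return lexicon lines of the form 'word c1 c2 ... cn'.

    Assumption: every character of a word is its own modeling unit.
    The between-word symbol is not added here; per the recipe it is
    introduced when the lexicon is compiled.
    """
    lines = []
    for word in sorted(set(words)):
        # Spell the word out character by character.
        units = " ".join(word)
        lines.append(f"{word} {units}")
    return lines
```

For example, passing the vocabulary of the language model's training text yields one line per distinct word, which can then be written to a lexicon file.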

wellescastro commented 6 years ago

Thank you, that was very helpful. I'm going to use the char-based script since I'm working with the blank label. I have just one more question: after building the lexicon FST, grammar FST and token FST, at which point can I feed in the matrix of log-likelihoods for decoding, as with Kaldi's decode-faster-mapped tool (https://github.com/kaldi-asr/kaldi/blob/master/src/bin/decode-faster-mapped.cc)? Maybe I misunderstood, and the <data-dir> parameter of decode_ctc.sh must be a directory with the files containing the likelihoods per alignment. I'm a little confused.

fmetze commented 6 years ago

With the normal scripts, you give the decoding script (which calls decode-faster) the model directory, the search graph directory, and the test data directory. It computes the likelihoods from the data and the model, then passes them to the search graph, which produces the output. Let me know if you have any other questions! Best, F.
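
If the likelihoods come from an external model rather than from Eesen's own network, one possible route is to write them out as a Kaldi-style text archive of matrices, the format that Kaldi tools such as decode-faster-mapped read through a log-likelihood rspecifier. The sketch below is a hedged illustration: the function name and utterance IDs are hypothetical, and it assumes the decoder binary accepts such an archive and that the matrix columns follow the same label order as the token FST.

```python
import numpy as np

def write_text_ark(path, utt_to_matrix):
    """Write {utterance_id: [T, C] matrix} as a Kaldi text archive.

    Each entry looks like:
        utt_id  [
          v v v
          v v v ]
    """
    with open(path, "w") as f:
        for utt_id, mat in utt_to_matrix.items():
            mat = np.atleast_2d(np.asarray(mat, dtype=np.float32))
            f.write(f"{utt_id}  [\n")
            for i, row in enumerate(mat):
                values = " ".join(f"{v:.6f}" for v in row)
                # The closing bracket goes on the last row of the matrix.
                end = " ]\n" if i == len(mat) - 1 else "\n"
                f.write(f"  {values}{end}")
```

The resulting file could then be referenced as a text-mode archive (for example `ark,t:loglikes.ark`, where the file name is a placeholder) wherever the decoding pipeline expects a table of log-likelihood matrices.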

ericbolo commented 6 years ago

@wellescastro, I'm curious: any luck with using Eesen for handwriting recognition?

srvk / eesen

Eesen for Handwriting Recognition #146