seragENTp closed 8 years ago
It's a bidirectional LSTM running over the rows of the image, then over the columns. The input is usually the output from a convolutional layer.
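To make the row-then-column idea concrete, here is a minimal NumPy sketch. It is not the CLSTM code: the helper names (`init_lstm`, `lstm_scan`, `bidi_scan`, `separable_2d_lstm`), the gate ordering, and the weight layout are all assumptions for illustration, and "over the rows" is read here as sweeping left-to-right and right-to-left along each row.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(n_in, n_hidden, rng):
    """Random weights for one LSTM direction (hypothetical helper)."""
    W = rng.standard_normal((4 * n_hidden, n_in + n_hidden)) * 0.1
    b = np.zeros(4 * n_hidden)
    return W, b

def lstm_scan(xs, params):
    """Run a plain 1D LSTM over xs of shape (T, n_in); return (T, n_hidden)."""
    W, b = params
    n_hidden = b.shape[0] // 4
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    out = []
    for x in xs:
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)  # assumed gate order: input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out.append(h)
    return np.stack(out)

def bidi_scan(xs, fwd, bwd):
    """Bidirectional pass: concatenate forward and (re-reversed) backward outputs."""
    forward = lstm_scan(xs, fwd)
    backward = lstm_scan(xs[::-1], bwd)[::-1]
    return np.concatenate([forward, backward], axis=1)

def separable_2d_lstm(feat, row_params, col_params):
    """feat: (H, W, C), e.g. a conv feature map. Sweep each row, then each column."""
    rows = np.stack([bidi_scan(feat[y], *row_params)
                     for y in range(feat.shape[0])])            # (H, W, 2 * n_row)
    cols = np.stack([bidi_scan(rows[:, x], *col_params)
                     for x in range(rows.shape[1])], axis=1)    # (H, W, 2 * n_col)
    return cols

rng = np.random.default_rng(0)
feat = rng.standard_normal((5, 7, 3))                        # toy 5x7 map, 3 channels
row_params = (init_lstm(3, 4, rng), init_lstm(3, 4, rng))    # forward / backward
col_params = (init_lstm(8, 6, rng), init_lstm(8, 6, rng))    # input is 2 * 4 = 8 wide
out = separable_2d_lstm(feat, row_params, col_params)
print(out.shape)  # (5, 7, 12)
```

Each pass is just an ordinary 1D LSTM, so all rows (and then all columns) can be processed as independent sequences in a batch.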
There is a bit more info and references in this paper:
Tom, I remember that you wrote some time ago that 2D LSTM is not better than 1D LSTM for OCR of printed text. Is that still true, and does it hold for all scripts?
Getting good performance out of 1D LSTM requires a good normalizer. The Ocropus normalizer works surprisingly well for some non-Latin scripts, but we really need more benchmarks to see how far that carries over.
The normalizer is a fairly tricky piece of code, so it would be nice to be able to dispense with it. I'll be experimenting with that once the basic GPU implementation is done.
@tmbdev what you are describing, and what is implemented in CLSTM, is essentially a ReNet-style LSTM [1], but what is in the paper [2] is the 2D case of the classical MDLSTM [3] (by Graves et al.) with two forget gates.
[1] https://arxiv.org/abs/1505.00393 [2] http://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Byeon_Scene_Labeling_With_2015_CVPR_paper.html [3] https://arxiv.org/abs/0705.2011
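For contrast with the row-then-column model, here is a rough sketch of a classical 2D MDLSTM scan in the spirit of [3]. Each cell takes the hidden and cell states of both its upper and left neighbours, with one forget gate per neighbour. This is a simplified illustration only: the function name `mdlstm_2d`, the gate ordering, and the weight layout are assumptions, peephole connections are omitted, and only one of the four scan directions (top-left to bottom-right) is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mdlstm_2d(feat, W, b):
    """feat: (H, W_img, C); W: (5 * n, C + 2 * n); b: (5 * n,). Returns (H, W_img, n)."""
    H, Wd, _ = feat.shape
    n = b.shape[0] // 5
    # Pad with a zero row and column so border cells see zero predecessors.
    h = np.zeros((H + 1, Wd + 1, n))
    c = np.zeros((H + 1, Wd + 1, n))
    for y in range(H):
        for x in range(Wd):
            # Padded index (y + 1, x + 1) holds the state of pixel (y, x).
            h_up, h_left = h[y, x + 1], h[y + 1, x]
            c_up, c_left = c[y, x + 1], c[y + 1, x]
            z = W @ np.concatenate([feat[y, x], h_up, h_left]) + b
            i, f_up, f_left, o, g = np.split(z, 5)
            # Two forget gates: one per incoming spatial direction.
            c[y + 1, x + 1] = (sigmoid(f_up) * c_up
                               + sigmoid(f_left) * c_left
                               + sigmoid(i) * np.tanh(g))
            h[y + 1, x + 1] = sigmoid(o) * np.tanh(c[y + 1, x + 1])
    return h[1:, 1:]

rng = np.random.default_rng(0)
n_hidden, channels = 4, 3
W = rng.standard_normal((5 * n_hidden, channels + 2 * n_hidden)) * 0.1
b = np.zeros(5 * n_hidden)
out = mdlstm_2d(rng.standard_normal((5, 7, channels)), W, b)
print(out.shape)  # (5, 7, 4)
```

Because every cell depends on both its upper and its left neighbour, the scan cannot be split into independent row and column sequences, which is the practical difference from the ReNet-style model.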
Correct, ReNet implements the same model we do, and both models are different from the original 2D LSTM. We published some additional papers in 2014, and I gave some tutorials on these kinds of multidimensional LSTMs in 2013.
Hi, where is the 2D LSTM? I cannot find the implementation. Did I miss something?
What is the architecture of the 2D LSTM implemented in the library? Is there any reference for it?