wanghaisheng / awesome-ocr

A curated list of promising OCR resources
http://wanghaisheng.github.io/ocr-arxiv-daily/
MIT License

ocropus3 discussion #97

Closed by wanghaisheng 5 years ago

wanghaisheng commented 6 years ago

code: https://github.com/NVlabs/ocropus3
doc: https://github.com/tmbdev/das2018-tutorial

wanghaisheng commented 6 years ago

vertical invariance

Explicit Normalization

Implicit Normalization

1D LSTM

https://www.researchgate.net/publication/308825234_Binarization-free_OCR_for_historical_documents_using_LSTM_networks?enrichId=rgreq-d2b1f1ae08d78d1665779c8e9f42b1bb-XXX&enrichSource=Y292ZXJQYWdlOzMwODgyNTIzNDtBUzo0MjM4NzY3NTc4NTYyNTlAMTQ3ODA3MTUwMjkyMA%3D%3D&el=1_x_3&_esc=publicationCoverPdf

Given that 1D LSTM is not translationally invariant along the vertical axis, the normalization step is necessary to limit the variations to only the horizontal axis. Through the text line normalization step, the absolute position and scale along the vertical axis is normalized to a given height, which fits the 1D LSTM requiring all the input images to have the same height. A number of different methods have been implemented for this normalization step in the OCRopus [17] system; we have chosen the center-normalizer method for this work because it makes few assumptions about the underlying script and has been shown in previous works to perform reliably in both printed and handwritten OCR of different scripts.
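To make the normalization step concrete, here is a rough sketch of height normalization around a vertical center. This is not the OCRopus center-normalizer (which, as far as I know, follows a per-column center line); it assumes a single global center and spread for the whole line, and the function name, `target_height`, and `band_factor` are made up for illustration.

```python
import numpy as np

def normalize_line_height(line, target_height=48, band_factor=4.0):
    """Rescale a grayscale text line (ink = high values) to a fixed height.

    Simplified stand-in for a center-based normalizer: one global vertical
    center and spread are used instead of a per-column center line.
    """
    line = np.asarray(line, dtype=float)
    h, w = line.shape
    rows = np.arange(h)
    profile = line.sum(axis=1)                 # ink mass per row
    total = profile.sum()
    out = np.zeros((target_height, w))
    if total == 0:                             # blank line: nothing to center on
        return out
    center = (rows * profile).sum() / total    # vertical center of mass
    spread = np.sqrt((((rows - center) ** 2) * profile).sum() / total)
    half = max(band_factor * spread / 2.0, 1.0)  # half-height of the band to keep
    # Sample target_height rows evenly from the band around the center
    # (nearest-neighbour resampling; rows outside the image stay empty).
    src = np.linspace(center - half, center + half, target_height)
    src = np.floor(src + 0.5).astype(int)
    inside = (src >= 0) & (src < h)
    out[inside] = line[src[inside]]
    return out
```

With this, two copies of the same line that differ only by a vertical offset should come out identical, which is the property the 1D LSTM needs from its input.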

https://www.researchgate.net/publication/260341302_High-Performance_OCR_for_Printed_English_and_Fraktur_using_LSTM_Networks

Text line normalization is an essential step in applying 1D LSTM networks to OCR, since 1D LSTM is not translationally invariant along the vertical axis. For Latin scripts, absolute position and scale along the vertical axis carries a significant amount of information and is essential for distinguishing a number of common characters. Taken together, these observations suggest that text line normalization combined with a 1D LSTM network could be a good choice for Latin script recognition.
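A small illustration of why the vertical axis is the problem, assuming the usual setup where a 1D LSTM reads the line one pixel column at a time. The helper names are hypothetical. A horizontal shift only delays the same column vectors in time, while a vertical shift changes the vectors themselves.

```python
import numpy as np

def line_to_frames(line):
    """Turn a (height, width) line image into a sequence of column vectors."""
    return np.asarray(line, dtype=float).T     # shape: (width, height)

def frame_set(frames):
    """The distinct column vectors seen by the network, ignoring their order."""
    return {tuple(f) for f in frames}

# Toy line image with a small blob of ink.
line = np.zeros((8, 6))
line[2:4, 1:3] = 1.0

frames = line_to_frames(line)
h_shift = line_to_frames(np.roll(line, 2, axis=1))   # moved 2 px to the right
v_shift = line_to_frames(np.roll(line, 2, axis=0))   # moved 2 px down

# Horizontal shift: identical vectors, just two time steps later.
print(np.array_equal(h_shift[2:], frames[:-2]))
# Vertical shift: the input vectors are no longer the same.
print(frame_set(v_shift) == frame_set(frames))
```

The recurrence can absorb the first case because it slides along the time axis, but nothing in a 1D LSTM slides along the feature axis, hence the explicit normalization.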

MDLSTM

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122597/

After choosing suitable parameters, the image of a text line is processed by dividing it into small patches using input blocks having width of 1 column and height of 4 rows. The raw pixels of the image are collapsed to a vector of length 4 and are fed to the MDLSTM with the corresponding ground truth. The small patches of the image are then scanned through forward and backward passes in all four directions (horizontally and vertically) by MDLSTM to extract and learn distinct features. The detailed schema of implementation of MDLSTM is shown in Fig. 7
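The block splitting described above can be sketched roughly as follows. This only covers the input side (1-column by 4-row blocks flattened to length-4 vectors, plus the four scan orders), not the MDLSTM itself. The zero padding, the function names, and the direction labels are my own assumptions, not taken from the paper.

```python
import numpy as np

def to_blocks(image, block_h=4, block_w=1):
    """Split an image into block_h x block_w patches, each flattened to a vector.

    Returns an array of shape (n_block_rows, n_block_cols, block_h * block_w).
    The image is zero-padded at the bottom/right if its size is not a multiple
    of the block size (an assumption; the paper does not say how this is handled).
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    image = np.pad(image, ((0, (-h) % block_h), (0, (-w) % block_w)))
    H, W = image.shape
    grid = image.reshape(H // block_h, block_h, W // block_w, block_w)
    grid = grid.transpose(0, 2, 1, 3)          # (block row, block col, y, x)
    return grid.reshape(H // block_h, W // block_w, block_h * block_w)

def scan_directions(blocks):
    """The four corner-to-corner orders in which a 2D MDLSTM visits the grid."""
    return {
        "down_right": blocks,
        "down_left": blocks[:, ::-1],
        "up_right": blocks[::-1, :],
        "up_left": blocks[::-1, ::-1],
    }
```

For an 8 x 3 image this gives a 2 x 3 grid of length-4 vectors, and each of the four scans sees the same grid starting from a different corner.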
