Open waldoj opened 10 years ago
Added challenge: nonstandard period spelling.
I highly recommend "Text Recognition in Printed Historical Documents," by Twan van Laarhoven. In it, he describes his theoretical OCR system, "The Emmius OCR System," which has components that we might do well to implement.
Via the folks at NYPL comes "Tandem HMM with convolutional neural network for handwritten word recognition," published in May, which represents a real breakthrough.
This is a document from an OCR Summit Meeting (http://idhmc.tamu.edu/ocr-summit-meeting/). It includes a list of participants (http://idhmc.tamu.edu/commentpress/participants/) that might come in handy.
The British Library has an interesting project that we should follow. More in their blog entry: http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html
The 2012 International Conference on Frontiers in Handwriting Recognition (ICFHR) published a report on the state of the art that I think will be helpful. They have articles on word spotting, word segmentation, character classification, and more.