kba/awesome-ocr: Links to awesome OCR projects
https://github.com/kba/awesome-ocr
Open Source OCR for Large Collections of Scanned Documents - Art Rhyno
#34 (Open), opened by kba 8 years ago

kba commented 8 years ago

zuphilip commented 8 years ago
Some points from @artunit's talk:
- concentrates on OCR of newspapers on microfilm and microfiche
- concentrates on ABBYY vs. Tesseract
- comparison slide: https://youtu.be/gcjCiS9pJ3A?t=1439
- mentions the Line Segment Detector: http://www.ipol.im/pub/art/2012/gjmr-lsd/
- mentions the Olena project: https://www.lrde.epita.fr/wiki/Olena
- distributed OCR with Hadoop
- mentions his repo: https://github.com/artunit/ossocr
- discussion at the end:
  - backup, storage on hard drives
  - OCRopus (based on Tesseract at this time): Python-based, effective for book pages but not newspapers
  - What is Google's role in Tesseract?
  - How should all of this be presented in the end? As annotations for the users?
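
The notes above don't record how the Hadoop setup in the talk was actually wired together, so the following is only a rough sketch of one way "distributed OCR with Hadoop" could look. It assumes Hadoop Streaming, the `tesseract` command-line tool installed on every worker, and an input file listing one image path per line that each worker can read. The names `ocr_with_tesseract`, `map_lines` and `main` are made up for illustration and are not taken from the talk or from the ossocr repo.

```python
import subprocess
import sys


def ocr_with_tesseract(image_path):
    """Run the Tesseract CLI on one image and return the recognized text.

    Assumption: `tesseract` is on PATH; passing "stdout" as the output base
    makes it print the text instead of writing a file.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def map_lines(lines, ocr=ocr_with_tesseract):
    """Yield one 'path<TAB>text' record per input path (streaming-style mapper).

    `ocr` is injectable so the mapper logic can be tried without Tesseract.
    """
    for line in lines:
        path = line.strip()
        if not path:
            continue  # ignore blank input lines
        try:
            text = ocr(path)
        except (OSError, subprocess.CalledProcessError) as exc:
            # Log and skip unreadable pages so one bad scan does not fail the task.
            sys.stderr.write(f"skipping {path}: {exc}\n")
            continue
        # Collapse all whitespace so each record stays on a single line.
        yield f"{path}\t{' '.join(text.split())}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point for use as a streaming mapper: paths in, records out."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

Used as a script, this would call `main()` and be passed to Hadoop Streaming as the `-mapper`, with the list of image paths as `-input`. Collapsing whitespace throws away the page layout, so a real pipeline would probably store hOCR or another structured format instead of plain text.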