mihi-tr opened this issue 11 years ago
That's what I was suggesting here:
http://lists.okfn.org/pipermail/okfn-labs/2013-August/001046.html
The only question seems to be getting the highest possible quality out of OCR before the text is broken down into chunks for transcription. But I wonder if we could decouple this into two parts: (i) getting the best OCR we can, and (ii) breaking the text/images down into chunks for use in a crowdsourcing project?
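For illustration, a minimal sketch of what the two decoupled steps might look like, assuming Tesseract via pytesseract for (i); the fixed-size line split in (ii) is just a placeholder for whatever real chunking we settle on:

```python
import pytesseract
from PIL import Image

def ocr_page(path):
    """Step (i): get the best OCR we can for a whole page."""
    img = Image.open(path)
    return pytesseract.image_to_string(img)

def chunk_text(text, lines_per_chunk=10):
    """Step (ii): break the OCRed text into chunks for crowdsourcing.

    A per-article split would need layout analysis; this naive
    fixed-size split only shows that the two steps are independent.
    """
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]
```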
I'd do the reverse: (i) break the image down into chunks, ideally on a per-article basis, then (ii) check the transcription.
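A sketch of that dice-first order, leaning on Tesseract's layout analysis to find block bounding boxes and cropping them out of the page image. Treating one block as roughly one article is an assumption that would need checking against the dictionary's actual layout:

```python
import pytesseract
from PIL import Image

def dice_into_blocks(path, pad=5):
    """Cut the page image into per-block chunks before any transcription.

    Uses Tesseract's layout data; level 2 entries are blocks in its
    page/block/paragraph/line/word hierarchy.
    """
    img = Image.open(path)
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    chunks = []
    for i, level in enumerate(data["level"]):
        if level == 2:  # one bounding box per detected block
            l, t = data["left"][i], data["top"][i]
            w, h = data["width"][i], data["height"][i]
            box = (max(l - pad, 0), max(t - pad, 0), l + w + pad, t + h + pad)
            chunks.append(img.crop(box))
    return chunks
```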
Hi,
Late to the game, but just a thought: OCR might be challenging on the dictionary. How about dicing up the entries and queuing them for transcription checking on PyBossa? We could try to automatically dice the articles and post each image together with its OCRed text.
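For what it's worth, a rough sketch of the queuing step using the pybossa-client library; the endpoint, API key, project id, and the task info fields (image_url, ocr_text) are all placeholders that would depend on how the project and its task presenter are set up:

```python
import pbclient  # pip install pybossa-client

# Placeholder server and credentials; replace with real values.
pbclient.set('endpoint', 'http://crowdcrafting.org')
pbclient.set('api_key', 'YOUR-API-KEY')

def queue_chunk(project_id, image_url, ocr_text):
    """Create one PyBossa task pairing a diced entry image with its
    OCRed text, so volunteers only have to check/correct the text."""
    return pbclient.create_task(project_id,
                                info={'image_url': image_url,
                                      'ocr_text': ocr_text})
```

The task presenter would then show `info['image_url']` next to an editable text box pre-filled with `info['ocr_text']`.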