rufuspollock-okfn / oed

Extraction and Interface for Oxford English Dictionary (OED) 1st Edition

PyBossa/Crowdcrafting #7

Open: mihi-tr opened this issue 11 years ago

mihi-tr commented 11 years ago

Hi,

Late to the game, but a thought: OCR might struggle with the dictionary. How about dicing up the entries and queuing them for transcription checking on PyBossa? We could try to dice the articles automatically and post each image together with its OCRed text.
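Here is a minimal sketch of the automatic dicing step. It assumes the OCR engine already returns one bounding box per line. It also assumes that each entry's first line starts at the column's left margin while continuation lines are indented; this layout heuristic is hypothetical and has not been checked against the scans. `OcrLine`, `group_entries`, `to_task_info` and the payload field names are all illustrative. PyBossa stores an arbitrary JSON object as each task's `info`, so a payload like this could be posted to accompany the page image.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OcrLine:
    """One OCRed line and its bounding box in page pixel coordinates."""
    text: str
    x: int
    y: int
    w: int
    h: int


def group_entries(lines: List[OcrLine], margin_tol: int = 5) -> List[List[OcrLine]]:
    """Group lines into entries. Hypothetical heuristic: a line starting within
    `margin_tol` pixels of the column's left margin opens a new entry; indented
    lines continue the current one."""
    if not lines:
        return []
    ordered = sorted(lines, key=lambda ln: ln.y)  # top-to-bottom reading order
    margin = min(ln.x for ln in ordered)
    entries: List[List[OcrLine]] = []
    for ln in ordered:
        if not entries or ln.x - margin <= margin_tol:
            entries.append([ln])
        else:
            entries[-1].append(ln)
    return entries


def to_task_info(page_id: str, entry: List[OcrLine]) -> Dict:
    """Build the JSON payload for one transcription-check task
    (field names are illustrative, not a fixed PyBossa schema)."""
    x0 = min(ln.x for ln in entry)
    y0 = min(ln.y for ln in entry)
    x1 = max(ln.x + ln.w for ln in entry)
    y1 = max(ln.y + ln.h for ln in entry)
    return {
        "page": page_id,
        "bbox": [x0, y0, x1, y1],  # region to crop and show to the volunteer
        "ocr_text": "\n".join(ln.text for ln in entry),  # text to be checked
    }


if __name__ == "__main__":
    # Made-up sample lines, not real OED text.
    page = [
        OcrLine("HEADWORD-A, sb. first line", 100, 10, 300, 20),
        OcrLine("continuation of A", 120, 32, 200, 20),
        OcrLine("HEADWORD-B, v. only line", 101, 60, 250, 20),
    ]
    for entry in group_entries(page):
        print(to_task_info("page-0001", entry))
```

The `bbox` tells the task presenter which part of the scan to crop, so the volunteer sees the image and the OCR text side by side. `margin_tol` absorbs small amounts of scanning skew and would probably need tuning per volume.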

jwyg commented 11 years ago

That's what I was suggesting here:

http://lists.okfn.org/pipermail/okfn-labs/2013-August/001046.html

The only open question seems to be how to get the highest possible quality out of OCR before the text is broken down into chunks for transcription. But I wonder if we could decouple this into two parts: (i) getting the best OCR we can, and (ii) breaking the text/images down into chunks for use in a crowdsourcing project?

mihi-tr commented 11 years ago

I'd do it the other way round: (I) break the images down into chunks, ideally one per article, then (II) check the transcription.
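The chunk-first ordering could look roughly like the sketch below. It splits the page image into horizontal bands wherever there is a run of blank rows (a horizontal projection profile), then OCRs each band on its own. This assumes a binarized, single-column image in which entries are separated by slightly more whitespace than lines within an entry. `split_bands`, `make_tasks`, `min_gap` and the stub `ocr_fn` are hypothetical stand-ins for a real OCR engine and real layout tuning.

```python
from typing import Callable, List, Sequence, Tuple

# Binarized page image: 1 = ink, 0 = background (assumed single column).
Image = Sequence[Sequence[int]]


def split_bands(img: Image, min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of inked bands
    separated by at least `min_gap` consecutive blank rows."""
    bands: List[Tuple[int, int]] = []
    start = None    # first row of the band currently being read
    blank_run = 0   # blank rows seen since the last inked row
    for y, row in enumerate(img):
        if any(row):
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # Close the band just after its last inked row.
                bands.append((start, y - blank_run + 1))
                start, blank_run = None, 0
    if start is not None:
        # Band runs to the bottom of the page; trim trailing blank rows.
        bands.append((start, len(img) - blank_run))
    return bands


def make_tasks(page_id: str, img: Image,
               ocr_fn: Callable[[List[List[int]]], str],
               min_gap: int = 3) -> List[dict]:
    """Crop each band first, then OCR it: one task payload per band."""
    tasks = []
    for top, bottom in split_bands(img, min_gap):
        chunk = [list(row) for row in img[top:bottom]]
        tasks.append({
            "page": page_id,
            "rows": [top, bottom],       # crop region in the page image
            "ocr_text": ocr_fn(chunk),   # text to check against the crop
        })
    return tasks


if __name__ == "__main__":
    page = [
        [0, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 1],
    ]
    # Stub OCR: a real engine would return recognized text for the crop.
    for task in make_tasks("page-0001", page, lambda chunk: f"<{len(chunk)} rows>"):
        print(task)
```

Passing the OCR step in as a function keeps the two stages decoupled, so the OCR engine could be swapped or rerun without touching the splitting logic.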