tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Compile to Emscripten/asm.js #75

Closed: brettz9 closed 9 years ago

brettz9 commented 9 years ago

For those of us who know nothing of C, might someone be kind enough to use Emscripten/asm.js to compile Tesseract to JavaScript on our behalf, for use in the browser (without Node.js, etc.)? It would no doubt be quite slow, but it would be handy for some web apps. The other existing ports (Ocrad and GOCR) do not seem to hold a candle to the quality of Tesseract. Thanks!

brettz9 commented 9 years ago

A particularly compelling use case beyond regular web apps would be a browser add-on for running OCR against images encountered on the web and placing the OCR'd results in a dialog, in place on the web page, etc.

In my Firefox add-on for decoding QR codes, I already have infrastructure in place that could largely be reused to allow OCR against images found while browsing the web, whether as a regular image, a frame in a video, a background image, an SVG element, a canvas, or probably a PDF too, given that ImageMagick has already been ported.

jimregan commented 9 years ago

In short: there's too much involved, so don't hold your breath.

brettz9 commented 9 years ago

Ok, thanks for the reply!

amitdo commented 9 years ago

@brettz9, It seems that someone is working on this: https://github.com/naptha

brettz9 commented 9 years ago

Will look into it... Thanks!