This Node.js project synchronously and recursively loops through all JPG images in a directory and OCRs specified regions of each image. It then renames the files based on the OCRed data and moves them to an output directory.
This project was developed in Visual Studio 2015 using the Node.js tools. Node.js and the supporting modules (OCR and image processing) all have *nix and OS X versions, so in theory it should work on those platforms as well.
The app looks at JPG (.jpg or .jpeg) images with a resolution of 300 DPI. The higher the resolution, the more accurate Tesseract's OCR will be. If the resolution is lower or higher than 300 DPI, the location of each region is scaled to match. If you scan an image, make sure it is scanned to a JPG at 300 DPI.
Images that have already been processed are skipped (matched by name and modified date and time). If you change an image and rerun the process, the application detects the change and reprocesses the image.
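The region scaling and the "already processed" check described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `scaleRegion` and `needsProcessing` are hypothetical helper names, and the sketch assumes regions are defined in pixels for a 300 DPI scan and that processed files are tracked in a `Map` of file name to modified time in milliseconds.

```javascript
// Hypothetical helper: scale a region defined for a 300 DPI image so it
// covers the same area of an image scanned at a different resolution.
function scaleRegion(region, imageDpi, baseDpi) {
    const factor = imageDpi / (baseDpi || 300);
    return {
        name: region.name,
        w: Math.round(region.w * factor),
        h: Math.round(region.h * factor),
        x: Math.round(region.x * factor),
        y: Math.round(region.y * factor)
    };
}

// Hypothetical helper: decide whether a file must be (re)processed.
// `processed` is assumed to be a Map of file name -> last seen mtime (ms).
// A file is processed when it is new or its modified time has changed.
function needsProcessing(processed, fileName, mtimeMs) {
    return processed.get(fileName) !== mtimeMs;
}
```

For example, the 'codename' region used later in this document (w 450, h 150, x 1075, y 440) would become w 900, h 300, x 2150, y 880 for a 600 DPI scan.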
GraphicsMagick - http://www.graphicsmagick.org
tesseract-ocr - https://code.google.com/p/tesseract-ocr/
The following code lines define the regions that will be OCRed. You can add more 'then' clauses to add more regions. The results of each call are passed along through 'data.results'.
ocrutils.OCRImageSectionAsync({ imageFileName : fileName, region : { w : 450, h : 150, x : 1075, y : 440, 'name' : 'codename' } })
.then(function (data) {
return ocrutils.OCRImageSectionAsync({ imageFileName : fileName, region : { w : 450, h : 150, x : 1075, y : 195, 'name' : 'gamedatetime' }, results : data.results });
})
//.then(function (data) {
// return ocrutils.OCRImageSectionAsync({ imageFileName : fileName, region : { w : 450, h : 150, x : 1075, y : 195, 'name' : 'gamedatetime' }, results : data.results });
//})
The OCR results are returned through the "data" parameter, one entry per region:
if (data.results.length > 0) TextResults[data.results[0].name] = data.results[0].ocrdtext;
if (data.results.length > 1) TextResults[data.results[1].name] = data.results[1].ocrdtext;
//if (data.results.length > 2) TextResults[data.results[2].name] = data.results[2].ocrdtext; <== uncomment me if you uncomment the .then( above.
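If more regions are added, the per-index lines above could be replaced by a loop. The sketch below is one possible approach, not part of the project: `collectResults` is a hypothetical helper name, and it assumes each entry in `data.results` has the `name` and `ocrdtext` properties used above.

```javascript
// Hypothetical helper: copy every OCR result into an object keyed by
// region name, so no line has to be added per region.
function collectResults(results) {
    const textResults = {};
    results.forEach(function (result) {
        textResults[result.name] = result.ocrdtext;
    });
    return textResults;
}
```

It would be called as `var TextResults = collectResults(data.results);` inside the final 'then' clause.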
Use the values collected in TextResults to clean up any funky OCRed characters and rename the file.
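As a rough illustration of that cleanup step, the sketch below turns OCRed text into a safe file name. It is an assumption about how the cleanup might be done, not the project's implementation: `cleanOcrText` and `buildFileName` are hypothetical names, and the rule of replacing every run of non-alphanumeric characters with an underscore is only one reasonable choice.

```javascript
const path = require('path');

// Hypothetical helper: replace each run of characters that are not
// letters or digits with a single underscore, then trim underscores
// from both ends.
function cleanOcrText(text) {
    return String(text)
        .replace(/[^A-Za-z0-9]+/g, '_')
        .replace(/^_+|_+$/g, '');
}

// Hypothetical helper: build the new file name from the OCRed regions.
// Falls back to the original base name when no usable text was read.
function buildFileName(textResults, originalName) {
    const originalExt = path.extname(originalName);
    const parts = ['codename', 'gamedatetime']
        .map(function (key) { return cleanOcrText(textResults[key] || ''); })
        .filter(function (part) { return part.length > 0; });
    const base = parts.length > 0
        ? parts.join('_')
        : path.basename(originalName, originalExt);
    return base + originalExt.toLowerCase();
}
```

With this rule, OCR text such as `'  Red Fox!\n'` becomes `Red_Fox`, and a file whose regions could not be read keeps its original name.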
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature
V0.1 - Initial Development and Checkin
Many thanks to the authors of the great Node.js projects used in this utility, and specifically to GraphicsMagick (http://www.graphicsmagick.org/Copyright.html) and Tesseract-OCR (https://code.google.com/p/tesseract-ocr/).
Async.js
Ricardo Vega Jr.
image-ocr-renamer - Utility to OCR a directory of images based on regions within the image and copy them to a target directory with a new name.
Copyright (C) 2016 Ricardo Vega Jr.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.