thiagoalessio / tesseract-ocr-for-php

A wrapper to work with Tesseract OCR inside PHP.
https://packagist.org/packages/thiagoalessio/tesseract_ocr
MIT License

Fix: OCR command failing when temp_dir contains any spaces #171

Closed by malanx 5 years ago

malanx commented 5 years ago

Description

While attempting OCR on my local machine (XAMPP on Windows 10), running tesseract directly in the terminal caused no issues. However, when I used the library, the command failed with the following error:

Generated command:
"tesseract" "C:\\xampp\\htdocs\\ocr_debug\\text.jpg" C:\Users\FirstName LastName\AppData\Local\Temp\ocr1D0E.tmp

Returned message:
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
read_params_file: Can't open LastName\AppData\Local\Temp\ocr1D0E.tmp in C:\xampp\htdocs\ocr_debug\vendor\thiagoalessio\tesseract_ocr\src\FriendlyErrors.php:63
Stack trace:
#0 C:\xampp\htdocs\ocr_debug\vendor\thiagoalessio\tesseract_ocr\src\TesseractOCR.php(24): thiagoalessio\TesseractOCR\FriendlyErrors::checkCommandExecution(Object(thiagoalessio\TesseractOCR\Command), Array)
#1 C:\xampp\htdocs\ocr_debug\index.php(28): thiagoalessio\TesseractOCR\TesseractOCR->run()
#2 {main} thrown in C:\xampp\htdocs\ocr_debug\vendor\thiagoalessio\tesseract_ocr\src\FriendlyErrors.php on line 63

Notice that, because of the space between "FirstName" and "LastName", the unquoted path C:\Users\FirstName LastName\AppData\Local\Temp\ocr1D0E.tmp was split in two, and tesseract treated LastName\AppData\Local\Temp\ocr1D0E.tmp as a 4th parameter. That is the file the read_params_file message says it cannot open.
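To make the split visible, here is a rough sketch that tokenizes the generated command on whitespace while keeping double-quoted spans together. `splitArgs` is a hypothetical helper written for this illustration. It only approximates how a command line is broken into arguments and does not reproduce the actual Windows parsing rules.

```php
<?php
// Hypothetical helper: split a command line into tokens, treating a
// double-quoted span (with backslash escapes) as a single token.
// This is an approximation for illustration only.
function splitArgs(string $cmd): array
{
    preg_match_all('/"(?:[^"\\\\]|\\\\.)*"|\S+/', $cmd, $matches);
    return $matches[0];
}

// The generated command from the report, copied verbatim (nowdoc, so no
// escape processing happens inside the string).
$cmd = <<<'CMD'
"tesseract" "C:\\xampp\\htdocs\\ocr_debug\\text.jpg" C:\Users\FirstName LastName\AppData\Local\Temp\ocr1D0E.tmp
CMD;

$tokens = splitArgs($cmd);

// Print each token with its position in the argument list.
foreach ($tokens as $i => $token) {
    echo $i, ': ', $token, PHP_EOL;
}
```

Under this approximation the unquoted output path yields two tokens, so the command has four tokens instead of three, and the last one is the LastName\... fragment from the error message.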

Debugging

In Command.php:getTempDir, sys_get_temp_dir() returns C:\Users\FIRSTNAME~1\AppData\Local\Temp, which is perfectly fine and contains no spaces.

In Command.php:getOutputFile, tempnam(sys_get_temp_dir(), 'ocr') returns C:\Users\FirstName LastName\AppData\Local\Temp\ocr1CA0.tmp, not the expected C:\Users\FIRSTNAME~1\AppData\Local\Temp\ocr1D0E.tmp.
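For anyone who wants to check whether their own machine shows the same discrepancy, a minimal diagnostic sketch follows. `describeTempPaths` is a hypothetical helper and not part of the library. It relies only on the standard `sys_get_temp_dir()` and `tempnam()` functions, and its results depend on the OS and user profile.

```php
<?php
// Diagnostic sketch (hypothetical helper, not library code): compare the
// temp directory with the path that tempnam() actually returns.
function describeTempPaths(): array
{
    $dir  = sys_get_temp_dir();
    $file = tempnam($dir, 'ocr'); // creates an empty file and returns its path

    $report = [
        'dir'            => $dir,
        'file'           => $file,
        'dir_has_space'  => strpos($dir, ' ') !== false,
        'file_has_space' => is_string($file) && strpos($file, ' ') !== false,
    ];

    if (is_string($file)) {
        unlink($file); // remove the file tempnam() created
    }

    return $report;
}

print_r(describeTempPaths());
```

If `dir_has_space` is false but `file_has_space` is true, the machine behaves like the one described here, and any command that uses the tempnam() path needs to quote it.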

This is a problem: since the directory can contain a space, the library attempts the command "tesseract" "C:\\xampp\\htdocs\\ocr_debug\\text.jpg" C:\Users\FirstName LastName\AppData\Local\Temp\ocr1D0E.tmp with the output path left unquoted.

Fix

I wrapped $this->getOutputFile(false) with self::escape( ... ). The resulting, correct command: "tesseract" "C:\\xampp\\htdocs\\edgefinder\\text.jpg" "C:\\Users\\FirstName LastName\\AppData\\Local\\Temp\\ocr1D0E.tmp"
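The effect of the fix can be sketched as below. `quoteArg` is a hypothetical stand-in for illustration. It is assumed to mimic the quoting style visible in the commands above (double quotes around the argument, backslashes doubled) and is not copied from the library's `escape` method.

```php
<?php
// Hypothetical stand-in for the library's escaping: wrap the argument in
// double quotes and backslash-escape backslashes and double quotes.
function quoteArg(string $arg): string
{
    return '"' . addcslashes($arg, '\\"') . '"';
}

$image  = 'C:\xampp\htdocs\ocr_debug\text.jpg';
$output = 'C:\Users\FirstName LastName\AppData\Local\Temp\ocr1D0E.tmp';

// Before the fix: the output path is appended as-is.
$before = implode(' ', [quoteArg('tesseract'), quoteArg($image), $output]);

// After the fix: the output path is quoted like the other arguments.
$after = implode(' ', [quoteArg('tesseract'), quoteArg($image), quoteArg($output)]);

echo $before, PHP_EOL;
echo $after, PHP_EOL;
```

With the output path quoted, the space in the user name stays inside a single argument, so tesseract receives the full temp file path as its output base.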

Bonus points: I believed I had made the correct changes to the tests to accommodate this change, but apparently I have no idea how to write tests.

Related Issues

I forked the repo and made the changes in my fork. It appears that you've made changes since the last version was published, so (since I haven't made a PR before and don't understand the merging) I included the changes you've made since then, hoping that makes the merge easier.

thiagoalessio commented 5 years ago

@CodeJunkieio I already sent you a message on Gitter, but I'll say it again here just in case. Thanks a lot for submitting this PR :+1: :tada: :balloon: I'm planning to review #170, which also has to do with temp files, so I'll take the time to integrate your changes.

thiagoalessio commented 5 years ago

Your changes are available in version 2.8.0, thanks for taking the time to contribute!