thiagoalessio / tesseract-ocr-for-php

A wrapper to work with Tesseract OCR inside PHP.
https://packagist.org/packages/thiagoalessio/tesseract_ocr
MIT License

hocr not working (Fatal error) #233

Closed DamianoMazzara closed 2 years ago

DamianoMazzara commented 2 years ago

Expected behavior

Hocr data

Actual behavior

Fatal error: Uncaught thiagoalessio\TesseractOCR\UnsuccessfulCommandException: Error! The command did not produce any output.

Generated command:
"tesseract" "C:\xampp\htdocs/invoice.jpg" "C:\Users\CENTOT~1\AppData\Local\Temp\ocrDB2C.tmp" hocr

Returned message:
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
read_params_file: Can't open hocr
in C:\xampp\htdocs\vendor\thiagoalessio\tesseract_ocr\src\FriendlyErrors.php:66

Stack trace:
#0 C:\xampp\htdocs\vendor\thiagoalessio\tesseract_ocr\src\TesseractOCR.php(39): thiagoalessio\TesseractOCR\FriendlyErrors::checkCommandExecution(Object(thiagoalessio\TesseractOCR\Command), '', 'Tesseract Open ...')
#1 C:\xampp\htdocs\index.php(9): thiagoalessio\TesseractOCR\TesseractOCR->run()
#2 {main}
thrown in C:\xampp\htdocs\vendor\thiagoalessio\tesseract_ocr\src\FriendlyErrors.php on line 66

Steps to reproduce the behavior

PHP Code:


<?php

// Load Composer's autoloader so the TesseractOCR class is available.
require __DIR__ . '/vendor/autoload.php';

use thiagoalessio\TesseractOCR\TesseractOCR;

echo (new TesseractOCR(__DIR__ . '/invoice.jpg'))
    ->hocr()
    ->run();

Generated command: "tesseract" "C:\xampp\htdocs/invoice.jpg" "C:\Users\CENTOT~1\AppData\Local\Temp\ocrEB08.tmp" hocr

Also, if possible, attach the image(s) you are trying to recognize. Image: https://i.gyazo.com/345d7e324a661140c591b99c1a0ed99e.jpg

Environment

DamianoMazzara commented 2 years ago

I suspect the problem comes from installing capture2text with Chocolatey; that setup has some trouble. I then installed Tesseract with the official installer for Windows, and it works fine from the command line. I also added its location to the "Path" environment variable, so I can now run the tesseract command from anywhere.

However, getenv('PATH') in PHP still didn't show any entry containing "tesseract". I tried restarting Apache, but that didn't help. After restarting Windows, it works.
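
The PATH check described above can be sketched roughly as follows. This is only an illustration: pathEntriesContaining is a hypothetical helper, not part of tesseract_ocr, and it only searches the text of each PATH entry for a substring, so it says nothing about whether the executable actually runs.

```php
<?php
// Hypothetical helper (not part of tesseract_ocr): return the PATH entries
// whose text contains $needle, compared case-insensitively.
function pathEntriesContaining(string $path, string $needle, string $separator = PATH_SEPARATOR): array
{
    // Split the PATH string and drop empty entries.
    $entries = array_filter(explode($separator, $path), 'strlen');

    return array_values(array_filter(
        $entries,
        fn (string $entry): bool => stripos($entry, $needle) !== false
    ));
}

// This is the PATH inherited by the process running PHP (for example Apache),
// which can differ from the PATH of a freshly opened terminal.
$matches = pathEntriesContaining(getenv('PATH') ?: '', 'tesseract');

echo $matches
    ? implode(PHP_EOL, $matches) . PHP_EOL
    : 'No PATH entry mentions tesseract' . PHP_EOL;
```

If this is run through the web server and reports no match while a terminal finds tesseract, the server process is probably still using an older environment, which would be consistent with the restart fixing it.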

So, for anyone else having trouble:

Do not use Chocolatey to install capture2text.

Install Tesseract OCR by following the official installation documentation: https://tesseract-ocr.github.io/tessdoc/Installation.html. Then add the Tesseract location to the Windows "Path" environment variable, restart the PC, and start Apache. Everything should then work fine.

You can follow this step-by-step tutorial I found on the web.

https://linuxhint.com/install-tesseract-windows/
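
For anyone who cannot rely on PATH, a possible alternative is to hand the wrapper the full path to the binary through its executable() method. The sketch below is untested against this exact setup: firstExistingFile is a hypothetical helper, and the two install locations are assumptions that should be replaced with wherever the installer actually placed tesseract.exe.

```php
<?php
// Hypothetical helper: return the first candidate path that is an existing file.
function firstExistingFile(array $candidates): ?string
{
    foreach ($candidates as $candidate) {
        if (is_file($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Assumed install locations; adjust them to your own machine.
$binary = firstExistingFile([
    'C:\\Program Files\\Tesseract-OCR\\tesseract.exe',
    'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe',
]);

$autoload = __DIR__ . '/vendor/autoload.php';

if ($binary !== null && is_file($autoload)) {
    require $autoload;

    // Same call as in the report, but with an explicit binary instead of PATH.
    echo (new \thiagoalessio\TesseractOCR\TesseractOCR(__DIR__ . '/invoice.jpg'))
        ->executable($binary)
        ->hocr()
        ->run();
}
```

This only sidesteps the PATH lookup; the chosen installation still has to ship the hocr config file for hocr() to work.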