steve-ferrero / tesseract-php

PHP class for Tesseract command line
http://www.web-atrio.com/SSII/developpement-php-mysql

Magick not found #1

Open dev-code-davis opened 6 years ago

dev-code-davis commented 6 years ago

Hi,

First of all, thanks for creating this tool. However, I'm having a problem where the magick binary is not found. I have installed ImageMagick on Ubuntu 16.04, but when I try to run the package, I get this error:

string(50) "magick test.pdf /tmp/59f08c3e7daf5.png 2>&1"
array(1) { [0]=> string(24) "sh: 1: magick: not found" }
Warning: unlink(/tmp/59f08c3e7daf5.png): No such file or directory in /var/www/drupalvm/drupal/ocr_test1/vendor/web-atrio/tesseract-php/TesseractPHP.php on line 113
array(2) { [0]=> string(53) "ERROR: Can not open input file /tmp/59f08c3e7daf5.png" [1]=> string(24) "Error during processing." }

It seems that the magick location is set in /web-atrio/php-pdf-to-image/PDFToImage.php: function setMagickPath($magickPath) { $this->magickPath = $magickPath; }

So the question is: which of the Magick files (there are many) should be called, since 'magick' doesn't seem to work?
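On the question of which binary to call: ImageMagick 6, the version packaged for Ubuntu 16.04, installs `convert`, while the `magick` command was introduced with ImageMagick 7. A minimal sketch of probing the PATH for whichever one exists follows. The helper name `findMagickBinary` is hypothetical and not part of PDFToImage or TesseractPHP, and the sketch assumes a POSIX shell and PHP 7.1+.

```php
<?php
// Hypothetical helper, not part of PDFToImage or TesseractPHP.
// Returns the resolved path of the first candidate found on the PATH, or null.
function findMagickBinary(array $candidates = ['magick', 'convert']): ?string
{
    foreach ($candidates as $name) {
        $output = [];
        $status = 1;
        // `command -v` is the POSIX shell builtin for resolving an executable.
        exec('command -v ' . escapeshellarg($name) . ' 2>/dev/null', $output, $status);
        if ($status === 0 && !empty($output[0])) {
            return $output[0];
        }
    }
    return null;
}
```

The result could then be handed to the existing setMagickPath() method shown above, assuming the calling code has access to the PDFToImage instance.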

My complete code:

https://gist.github.com/Gugols/82183d51ee68a0fd36c46d7d1ef369ae

dev-code-davis commented 6 years ago

Update: I managed to pass the location, which in this case is "/usr/bin/convert". However, a change is needed in PDFToImage.php in the steve-ferrero/php-pdf-to-image repo: the hardcoded magick string is used instead of the magickPath property. My quick fix (lines 35-36): https://gist.github.com/Gugols/a9baa7e97c225c03725f2dea8e27b54f

However, I'm now not sure what the best way is to pass the ImageMagick path from the TesseractPHP object. Ideas?

dev-code-davis commented 6 years ago

Hmm, does this really work for multi-page PDFs? When I pass one in, it only tries to unlink a single file:

string(60) "/usr/bin/convert test.pdf temp/59f0ccb5d2ea3.png 2>&1"
array(0) { }
Warning: unlink(temp/59f0ccb5d2ea3.png): No such file or directory in /var/www/drupalvm/drupal/ocr_test1/vendor/web-atrio/tesseract-php/TesseractPHP.php on line 113
array(2) { [0]=> string(53) "ERROR: Can not open input file temp/59f0ccb5d2ea3.png" [1]=> string(24) "Error during processing." }
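A likely explanation for the missing file: when the input PDF has more than one page and the output name is a single PNG, ImageMagick writes one file per page with a numeric suffix (name-0.png, name-1.png, and so on) rather than name.png itself. A rough sketch of collecting those page files is below. `collectPageImages` is a hypothetical helper, not something the library provides, and it assumes the default suffix naming.

```php
<?php
// Hypothetical helper, not part of PDFToImage or TesseractPHP.
// Returns the image files produced for $outputFile, in page order.
function collectPageImages(string $outputFile): array
{
    // A single-page PDF yields exactly the requested file name.
    if (is_file($outputFile)) {
        return [$outputFile];
    }

    // A multi-page PDF yields name-0.ext, name-1.ext, ... next to it.
    $info = pathinfo($outputFile);
    $dir = $info['dirname'] ?? '.';
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    $pages = glob($dir . '/' . $info['filename'] . '-*' . $ext) ?: [];

    // Natural sort so that page 10 comes after page 2.
    natsort($pages);
    return array_values($pages);
}
```

Each returned path could then be passed to Tesseract and unlinked in turn, instead of unlinking only the original output name.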

dev-code-davis commented 6 years ago

Also, by default ImageMagick converts the PDF into a relatively small PNG, so Tesseract is unable to read it. I tried changing the call to: exec($this->magickPath . " -density 300 " . $this->pdfFile . " " . $outputFile . " 2>&1", $output); The conversion is rather slow at this setting; a lower density value might work as well, but I haven't tested that yet. The density should probably be defined as a property.
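One way the density could become a property is sketched below. The class `PdfRasterizer` and its methods are hypothetical and only illustrate the idea; they are not the actual PDFToImage API. The sketch also quotes each argument with escapeshellarg(), which the original call does not do, and keeps -density before the input file so that it applies when the PDF is rasterized.

```php
<?php
// Hypothetical class for illustration only; not the PDFToImage API.
class PdfRasterizer
{
    private $magickPath;
    private $density = 300; // rasterization resolution in DPI

    public function __construct(string $magickPath)
    {
        $this->magickPath = $magickPath;
    }

    public function setDensity(int $density): void
    {
        if ($density <= 0) {
            throw new InvalidArgumentException('Density must be a positive integer.');
        }
        $this->density = $density;
    }

    // Builds the shell command; the caller passes it to exec().
    public function buildCommand(string $pdfFile, string $outputFile): string
    {
        return escapeshellarg($this->magickPath)
            . ' -density ' . $this->density
            . ' ' . escapeshellarg($pdfFile)
            . ' ' . escapeshellarg($outputFile)
            . ' 2>&1';
    }
}
```

With this shape, trying a lower value is a one-line change, for example calling setDensity(150) before building the command.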