mgufrone / pdf-to-html

PDF to HTML PHP Class using Poppler-Utils
MIT License

Fatal error: Uncaught exception #10

Open sarathiscookie opened 8 years ago

sarathiscookie commented 8 years ago

I have added the code, but I am getting an error.

<?php
// if you are using composer, just use this
include 'vendor/autoload.php';
use Gufy\PdfToHtml\Config;

// change pdftohtml bin location
Config::set('pdftohtml.bin', 'poppler-0.37/bin/pdftohtml.exe');
// change pdfinfo bin location
Config::set('pdfinfo.bin', 'poppler-0.37/bin/pdfinfo.exe');

// initiate
$pdf = new Gufy\PdfToHtml\Pdf('file.pdf');
// convert to html and return it as Dom Object
$html = $pdf->html();
var_dump($html);

// check if your pdf has more than one pages
$total_pages = $pdf->getPages();
// Your pdf happen to have more than one pages and you want to go another page? Got it. use this command to change the current page to page 3
$html->goToPage(3);
// and then you can do as you please with that dom, you can find any element you want
$paragraphs = $html->find('body > p');
?>

The error I am getting is:

Fatal error: Uncaught exception 'Exception' with message 'You're asking to go to page 1 but max page of this document is 0' in C:\xampp\htdocs\pdf-to-html-master\vendor\gufy\pdftohtml-php\src\Html.php:53
Stack trace:
#0 C:\xampp\htdocs\pdf-to-html-master\vendor\gufy\pdftohtml-php\src\Html.php(48): Gufy\PdfToHtml\Html->goToPage(1)
#1 C:\xampp\htdocs\pdf-to-html-master\vendor\gufy\pdftohtml-php\src\Html.php(10): Gufy\PdfToHtml\Html->getContents('file.pdf')
#2 C:\xampp\htdocs\pdf-to-html-master\vendor\gufy\pdftohtml-php\src\Pdf.php(45): Gufy\PdfToHtml\Html->__construct('file.pdf')
#3 C:\xampp\htdocs\pdf-to-html-master\index.php(14): Gufy\PdfToHtml\Pdf->html()
#4 {main}
thrown in C:\xampp\htdocs\pdf-to-html-master\vendor\gufy\pdftohtml-php\src\Html.php on line 53

mgufrone commented 8 years ago
Config::set('pdftohtml.bin', 'poppler-0.37/bin/pdftohtml.exe');

Assuming you have installed poppler-utils: in the line above, please use an absolute path instead of a relative path.
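A minimal sketch of the absolute-path advice, assuming the poppler-0.37 folder sits next to the calling script. `resolveBinary` is a hypothetical helper, not part of gufy/pdftohtml-php; it only wraps PHP's built-in `realpath()` so that a wrong path fails early with a readable message.

```php
<?php
// Hypothetical helper (not part of gufy/pdftohtml-php): turn a possibly
// relative binary path into an absolute one, or fail with a clear message.
function resolveBinary(string $path): string
{
    // realpath() returns false when the path does not exist.
    $absolute = realpath($path);
    if ($absolute === false || !is_file($absolute)) {
        throw new RuntimeException("Binary not found: {$path}");
    }
    return $absolute;
}

// Usage with the library (assumes vendor/autoload.php is already loaded and
// that the poppler-0.37 folder is next to this script):
// \Gufy\PdfToHtml\Config::set('pdftohtml.bin', resolveBinary(__DIR__ . '/poppler-0.37/bin/pdftohtml.exe'));
// \Gufy\PdfToHtml\Config::set('pdfinfo.bin', resolveBinary(__DIR__ . '/poppler-0.37/bin/pdfinfo.exe'));
```

Building the path from `__DIR__` avoids depending on the web server's current working directory, which is often not the project folder.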

sarathiscookie commented 8 years ago

Thanks, it was a path problem. I have set the full path and no error is showing now, but the page is blank.

// change pdftohtml bin location
Config::set('pdftohtml.bin', 'C:/xampp/htdocs/pdf-to-html-master/poppler-0.37/bin/pdftohtml.exe');

// change pdfinfo bin location
Config::set('pdfinfo.bin', 'C:/xampp/htdocs/pdf-to-html-master/poppler-0.37/bin/pdfinfo.exe');
// initiate

mgufrone commented 8 years ago

Hi, can you send me the PDF you're using? I would like to test it on my machine, too.

KenFlip commented 8 years ago

Hi @mgufrone ,

I am using a Mac with a local version of my site on MAMP. I am using the code below to run a test on a 5-page PDF:

<?php
// if you are using composer, just use this
include 'vendor/autoload.php';

// change pdftohtml bin location
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');

// change pdfinfo bin location
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');

// initiate
$pdf = new Gufy\PdfToHtml\Pdf(BASE_PATH . 'uploads/pdf/test.pdf');

// convert to html and return it as [Dom Object](https://github.com/paquettg/php-html-parser)
$html = $pdf->html();

echo $html;

?>

This is the response I am getting:

Notice: Undefined index: pages in /Users/Tin/Documents/Work/flipswitch-framework/vendor/gufy/pdftohtml-php/src/Pdf.php on line 51

Fatal error: Uncaught exception 'Exception' with message 'You're asking to go to page 1 but max page of this document is 0' in /Users/Tin/Documents/Work/flipswitch-framework/vendor/gufy/pdftohtml-php/src/Html.php:58
Stack trace:
#0 /Users/Tin/Documents/Work/flipswitch-framework/vendor/gufy/pdftohtml-php/src/Html.php(53): Gufy\PdfToHtml\Html->goToPage(1)
#1 /Users/Tin/Documents/Work/flipswitch-framework/vendor/gufy/pdftohtml-php/src/Html.php(12): Gufy\PdfToHtml\Html->getContents('http://localhos...')
#2 /Users/Tin/Documents/Work/flipswitch-framework/vendor/gufy/pdftohtml-php/src/Pdf.php(40): Gufy\PdfToHtml\Html->__construct('http://localhos...')
#3 /Users/Tin/Documents/Work/flipswitch-framework/vendor/gufy/pdftohtml-php/src/Pdf.php(44): Gufy\PdfToHtml\Pdf->getDom()
#4 /Users/Tin/Documents/Work/flipswitch-framework/code/code.home.php(15): Gufy\PdfToHtml\Pdf->html()
#5 /Users/Tin/Documents/Work/flipswitch-framework/pages/header.php(49): include_once('/Users/Tin/Docu...')
#6 /Users/Tin/Documents/Work/flipswitch-framework/inc/ in /Users/Tin/Documents/Work/flipswitch-framework/vendor/gufy/pdftohtml-php/src/Html.php on line 58

Please could you help me out as this is really urgent!

Thanks in advance.
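One detail in the trace above: the argument passed to getContents starts with 'http://localhos...', which suggests that BASE_PATH may hold a URL and not a filesystem path. The Poppler tools are normally given a local file, so a pre-flight check along these lines might help narrow things down. `assertLocalPdf` is a hypothetical helper, not part of the library, and the usage path is an assumption.

```php
<?php
// Hypothetical pre-flight check (not part of gufy/pdftohtml-php): make sure
// the argument is a readable local file before handing it to Pdf.
function assertLocalPdf(string $path): string
{
    // A leading scheme such as "http://" means this is a URL, not a file path.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $path) === 1) {
        throw new InvalidArgumentException("Expected a local path, got a URL: {$path}");
    }
    if (!is_file($path) || !is_readable($path)) {
        throw new InvalidArgumentException("File not found or unreadable: {$path}");
    }
    return $path;
}

// Usage (assumes the PDF lives under the script's own directory):
// $pdf = new \Gufy\PdfToHtml\Pdf(assertLocalPdf(__DIR__ . '/uploads/pdf/test.pdf'));
```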

KenFlip commented 8 years ago

@sarathiscookie Could you please help me?

mgufrone commented 8 years ago

@KenFlip Have you installed poppler-utils? If you haven't, please use the command below:

brew install poppler

KenFlip commented 8 years ago

Hi, I ran that command and it installed a lot of things. What folder was I supposed to run the command in? Kindest Regards, Kenton

KenFlip commented 8 years ago

So I ran it again, just in case, in my localhost root directory and in my actual site root directory, and both times I got this:

Warning: poppler-0.43.0 already installed

mgufrone commented 8 years ago

It doesn't matter where you run the command. It installs into a global directory. Okay, the next step is to check that the required package is properly installed.

which pdfinfo

which pdftohtml

If these are the same locations as the ones in your code, the code you wrote above should run fine.
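The same check can be sketched from inside PHP, which also works where `which` is unavailable. `findOnPath` is a hypothetical helper; it scans the PATH environment variable and does not call a shell. It assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper: search the PATH environment variable for a binary,
// roughly what `which` does, without invoking a shell.
function findOnPath(string $name): ?string
{
    $path = getenv('PATH');
    if ($path === false || $path === '') {
        return null;
    }
    // Windows executables carry an extension; try ".exe" there as well.
    $candidates = DIRECTORY_SEPARATOR === '\\' ? [$name, $name . '.exe'] : [$name];
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        foreach ($candidates as $candidate) {
            $full = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $candidate;
            if (is_file($full) && is_executable($full)) {
                return $full;
            }
        }
    }
    return null;
}

foreach (['pdfinfo', 'pdftohtml'] as $tool) {
    $found = findOnPath($tool);
    echo $tool, ': ', $found === null ? 'not found on PATH' : $found, PHP_EOL;
}
```

Running this through the web server and again from the command line may give different results, because the PATH seen by the server process can differ from the one in a terminal.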

KenFlip commented 8 years ago

Ran those two:

which pdfinfo gives me: /usr/local/bin/pdfinfo
which pdftohtml gives me: /usr/local/bin/pdftohtml

My code is:

// change pdftohtml bin location
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');

// change pdfinfo bin location
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');

mgufrone commented 8 years ago

So, how did it go? Is it still not working, or is it something else?

KenFlip commented 8 years ago

Still not working at all...

Giving me that same error

KenFlip commented 8 years ago

I can send you the PDF if you would like. My end goal is that the user can download a PDF form, fill it out, upload it to the site, and get a results page containing some of the information from the uploaded PDF.

If that makes sense.

When they fill it out, the pdf will have input fields and stuff like that, so not writing or anything

mgufrone commented 8 years ago

Okay, send me the pdf to my email. I will check it out. :3

KenFlip commented 8 years ago

so if I have a plain text pdf will that work?

mgufrone commented 8 years ago

Yep, a static PDF will work.

KenFlip commented 8 years ago

Could you possibly email me a plain-text PDF that works on your system? I just want to see if I can get this to work on my side. I can try to convert the PDF form to a plain-text PDF and then run it through.

mgufrone commented 8 years ago

Hang on a sec, do you run your code through a browser, or just from the command line?

KenFlip commented 8 years ago

through a browser

KenFlip commented 8 years ago

That PDF you sent me doesn't work...

mgufrone commented 8 years ago

Oh. Some PHP configurations disable the shell_exec function (through the disable_functions directive in php.ini), and the package needs it. You should enable shell_exec so the package can run its commands.
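A small sketch for checking this from the same entry point that fails (for example, the page opened in the browser). `isFunctionUsable` is a hypothetical helper; it relies only on PHP's built-in `function_exists()`, `ini_get()`, and `php_ini_loaded_file()`.

```php
<?php
// Hypothetical check: report whether a function is defined and not listed in
// the disable_functions directive of php.ini.
function isFunctionUsable(string $name): bool
{
    $disabled = array_map(
        static function (string $entry): string {
            return strtolower(trim($entry));
        },
        explode(',', (string) ini_get('disable_functions'))
    );
    return function_exists($name) && !in_array(strtolower($name), $disabled, true);
}

echo 'shell_exec usable: ', isFunctionUsable('shell_exec') ? 'yes' : 'no', PHP_EOL;
// The php.ini that is actually loaded is the one to edit.
echo 'php.ini in use: ', php_ini_loaded_file() ?: '(none)', PHP_EOL;
```

If shell_exec appears in disable_functions, removing it from that line in the reported php.ini and restarting the web server is the usual way to enable it. The web server and the command line often load different php.ini files.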

KenFlip commented 8 years ago

Have no idea how to do that or even where to do that...

garbus commented 7 years ago

Hello!

I fought with this library for many hours (I was getting the same error as the people above).

After a long time debugging, I noticed that $content in Pdf.php:22 is empty.

Finally I added 2>&1 to the shell command, so I got: $content = shell_exec($this->bin()." '".$this->file."' 2>&1"); After that, $content captured the error output.

It turned out that the system could not see my PDF, even though the path was full and correct. I was getting: I/O Error: Couldn't open file ''test.pdf'': No error.

Removed the ' ' from the code so I got: $content = shell_exec($this->bin()." ".$this->file);

And voila! $content is now filling up with data.

But then I started getting warnings when fetching the HTML output, like "(...)test-1.html: failed to open stream: No such file or directory" [src/Html.php:36]

The weird thing is that the displayed path exists! When I copy and paste it into Explorer, I can see the HTML file. A sample path is: C:\_MY_FILES\PROJECTS\eclipse_workspace\htdocs\My_Project\vendor\gufy\pdftohtml-php\src/../output/57ea99fab3250/test-1.html

When I open it manually, the file content is "something failed". There are probably two new problems: one with the path somehow, and a second in that the parsed HTML is not generated. I have no energy left to continue this fight. :-|

(Windows 10)
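The debugging step garbus describes (appending 2>&1 so that error output becomes visible) can be sketched as below. `runWithErrors` is a hypothetical helper, not the library's code, and the paths in the usage note are placeholders. It uses `escapeshellarg()` where the comment above uses hand-written single quotes: cmd.exe does not treat single quotes as quoting characters, which may be why the quoted file name could not be opened on Windows. This sketch has not been tested on Windows.

```php
<?php
// Hypothetical helper: run a binary with one argument and capture stdout and
// stderr together, along with the exit code.
function runWithErrors(string $binary, string $argument): array
{
    // escapeshellarg() quotes for the current platform: single quotes on
    // Unix-like systems, double quotes on Windows.
    $command = escapeshellarg($binary) . ' ' . escapeshellarg($argument) . ' 2>&1';
    $lines = [];
    $exitCode = 0;
    exec($command, $lines, $exitCode);
    return ['exit' => $exitCode, 'output' => implode("\n", $lines)];
}

// Usage (placeholder paths):
// $result = runWithErrors('/usr/local/bin/pdfinfo', '/path/to/test.pdf');
// if ($result['exit'] !== 0) {
//     echo $result['output'], PHP_EOL;
// }
```

Checking the exit code as well as the text makes it easier to tell an empty result from a failed command.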

jimeshgajera commented 7 years ago

@mgufrone

I have followed all the steps, but I still get the error posted below:

Fatal error: Uncaught exception 'Exception' with message 'You're asking to go to page 1 but max page of this document is 0' in D:\wamp\www\test\pdftohtml\vendor\gufy\pdftohtml-php\src\Html.php on line 63

I am using Windows and downloaded poppler from http://blog.alivate.com.au/poppler-windows/

Directory structure as below

poppler/bin/
vendor/ (all required vendor packages)

I do not understand the step below:

$ which pdfinfo
/usr/local/bin/pdfinfo

$ which pdftohtml
/usr/local/bin/pdftohtml

Is this required for installation? If yes, how do I do it on Windows? The "which" command does not work in cmd.

Please suggest a solution.

mgufrone commented 7 years ago

@jimeshgajera Have you read this part? If so, have you changed the binary location in the configuration?

jimeshgajera commented 7 years ago

@mgufrone, where can I read this part?

include 'vendor/autoload.php';
use Gufy\PdfToHtml\Config;

// change pdftohtml bin location
Config::set('pdftohtml.bin', 'D:/wamp/www/pdftohtml/poppler/poppler/bin/pdftohtml.exe');

// change pdfinfo bin location
Config::set('pdfinfo.bin', 'D:/wamp/www/pdftohtml/poppler/bin/pdfinfo.exe');

// initiate
$pdf = new Gufy\PdfToHtml\Pdf('file.pdf');

// convert to html and return it as Dom Object
$html = $pdf->html();

I have set the locations above for poppler.

mgufrone commented 7 years ago

Okay, I will take a look into it. I will let you know what I find.

drewgash commented 6 years ago

@mgufrone

I'm really struggling with the error @garbus points out:

But then I started getting warnings when fetching the HTML output, like "(...)test-1.html: failed to open stream: No such file or directory" [src/Html.php:36]

The weird thing is that the displayed path exists! When I copy and paste it into Explorer, I can see the HTML file. A sample path is: C:\_MY_FILES\PROJECTS\eclipse_workspace\htdocs\My_Project\vendor\gufy\pdftohtml-php\src/../output/57ea99fab3250/test-1.html

When I open it manually, the file content is "something failed". There are probably two new problems: one with the path somehow, and a second in that the parsed HTML is not generated. I have no energy left to continue this fight. :-|

My error.log:

[23-Apr-2018 15:58:19 UTC] PHP Warning: file_get_contents(/home/researy8/public_html/output/test(2)-1.html): failed to open stream: No such file or directory in /home/researy8/public_html/vendor/gufy/pdftohtml-php/src/Html.php on line 41
[23-Apr-2018 15:58:19 UTC] PHP Warning: DOMDocument::loadHTML(): Empty string supplied as input in /home/researy8/public_html/vendor/gufy/pdftohtml-php/src/Html.php on line 45

Unlike his, my server is Linux-based. Please help.

drewgash commented 6 years ago

I found the issue.

sudo yum install poppler-utils

That installs an old version (0.12.4) which does not have pdftohtml command options like "-s" and "-fmt".

See https://medium.com/@jakebathman/building-poppler-utils-for-centos-6-5-really-e52eccffc6ae for a guide on getting a later version of poppler-utils. I installed https://poppler.freedesktop.org/poppler-0.22.5.tar.gz rather than the 0.13.4 that the guide uses.

All the best!
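A rough sketch of an up-front version check based on the finding above. `parsePopplerVersion` is a hypothetical helper, and the banner format ("pdftohtml version 0.12.4", as typically printed by `pdftohtml -v`) is an assumption. The 0.22.5 threshold is only the version reported as working in this thread; the real minimum for the "-s" and "-fmt" options is not stated here.

```php
<?php
// Hypothetical helper: pull the version number out of a banner line such as
// "pdftohtml version 0.12.4" (the banner format is an assumption).
function parsePopplerVersion(string $banner): ?string
{
    return preg_match('/version\s+(\d+(?:\.\d+)+)/i', $banner, $m) === 1 ? $m[1] : null;
}

// 0.22.5 is the version reported as working above; the true minimum is unknown.
$reportedWorking = '0.22.5';
// Sample banner for illustration; in practice it would come from the binary.
$installed = parsePopplerVersion('pdftohtml version 0.12.4');

if ($installed !== null && version_compare($installed, $reportedWorking, '<')) {
    echo "poppler {$installed} is older than {$reportedWorking}; ",
        'options such as -s and -fmt may be missing', PHP_EOL;
}
```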