mgufrone / pdf-to-html

PDF to HTML PHP Class using Poppler-Utils
MIT License

How to handle special chars? #37

Open erdely opened 7 years ago

erdely commented 7 years ago

My code to convert a PDF into an HTML file is:

\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');

$pdf = new \Gufy\PdfToHtml\Pdf('MY_DOCUMENT_PATH.pdf');
$page = $pdf->html();

Everything was working fine, but this PDF document contains some special characters and I'm now getting the following error message:

DOMDocument::loadHTML(): Invalid char in CDATA 0x1 in Entity, line: ...

I tried both $pdf->html() and $pdf->getDom(), and I get the same error with each.

With libxml_use_internal_errors(true) the errors are suppressed, but after conversion the content appears twice in the output.

How can I avoid this error message, or remove the special characters?
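
For reference, here is a minimal sketch of the kind of clean-up being asked about, written in plain PHP and independent of the library. It assumes the raw HTML string is available before it reaches DOMDocument::loadHTML(), which may not be the case when calling $pdf->html() directly. The helper name stripControlChars and the sample string are hypothetical, not part of gufy/pdf-to-html.

```php
<?php
// Hypothetical helper (not part of gufy/pdf-to-html): removes the control
// characters that libxml rejects (such as 0x01), while keeping tab, line
// feed and carriage return. These byte values never occur inside multi-byte
// UTF-8 sequences, so a byte-wise match is safe for UTF-8 input.
function stripControlChars(string $html): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $html);
}

// Stand-in for raw pdftohtml output that contains a stray 0x01 byte.
$raw = "<html><body><p>Price:\x01 10 EUR</p></body></html>";

// Collect parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML(stripControlChars($raw));

// Inspect anything libxml still complained about, then reset the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

echo $dom->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```

Filtering before parsing, rather than only silencing the warnings, means the parser never sees the invalid bytes. Whether this also resolves the duplicated content is not something this sketch can confirm.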