vaites / php-apache-tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
MIT License

getText or getHTML only returns first page content #25

Closed stouch closed 4 years ago

stouch commented 4 years ago

Hello,

When I run:

$client = \Vaites\ApacheTika\Client::make('localhost', 9797);
$text = $client->getHTML('../<filename>.docx');

$text only contains the text of the first page of my document. Is this normal?

Thanks.

vaites commented 4 years ago

No, it should return all the content. Can you upload or send the file? Also, which PHP and Apache Tika versions are you using?

stouch commented 4 years ago

It's a private file, so I'll send it to you via Twitter. I'm using http://apache.crihan.fr/dist/tika/tika-server-1.23.jar and the latest version of php-apache-tika.

vaites commented 4 years ago

Perfect, thanks.

Does this happen only with this file, or with any file?

vaites commented 4 years ago

After some tests, this seems to be an Apache Tika issue rather than a problem in this library, because the Tika app returns the same results.
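
For anyone who wants to repeat that kind of check against the server instead of the app, here is a minimal sketch that bypasses php-apache-tika and sends the file straight to Tika server. It assumes the server from this thread is listening on localhost:9797, that the PHP cURL extension is installed, and that the server exposes its usual `/tika` endpoint for PUT requests. The helper names `tikaUrl` and `extractTextDirectly` are made up for illustration and are not part of this library.

```php
<?php
// Hypothetical helper: builds the URL of Tika server's text-extraction endpoint.
function tikaUrl(string $host, int $port): string
{
    return sprintf('http://%s:%d/tika', $host, $port);
}

// Hypothetical helper: PUTs the raw file to Tika server and returns the plain text it extracts.
function extractTextDirectly(string $host, int $port, string $path): string
{
    $body = file_get_contents($path);
    if ($body === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $curl = curl_init(tikaUrl($host, $port));
    curl_setopt_array($curl, [
        CURLOPT_CUSTOMREQUEST  => 'PUT',
        CURLOPT_POSTFIELDS     => $body,
        // Generic content type so Tika detects the real format itself (assumption).
        CURLOPT_HTTPHEADER     => [
            'Accept: text/plain',
            'Content-Type: application/octet-stream',
        ],
        CURLOPT_RETURNTRANSFER => true,
    ]);

    $text = curl_exec($curl);
    if ($text === false) {
        throw new RuntimeException('Request failed: ' . curl_error($curl));
    }

    return $text;
}

// Runs only when a file path is passed on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $text = extractTextDirectly('localhost', 9797, $argv[1]);
    echo strlen($text), " bytes extracted\n";
}
```

If the text returned this way is also cut off after the first page, the truncation happens inside Tika and not in the PHP binding.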