PHP Fatal error: Uncaught Exception: Unprocessable document in C:\WORKAREA\Projects\cloudconversa\research\tika\vendor\vaites\php-apache-tika\src\Clients\WebClient.php:642
Stack trace:
#0 C:\WORKAREA\Projects\cloudconversa\research\tika\vendor\vaites\php-apache-tika\src\Clients\WebClient.php(556): Vaites\ApacheTika\Clients\WebClient->error()
#1 C:\WORKAREA\Projects\cloudconversa\research\tika\vendor\vaites\php-apache-tika\src\Client.php(389): Vaites\ApacheTika\Clients\WebClient->request()
#2 C:\WORKAREA\Projects\cloudconversa\research\tika\tika.php(12): Vaites\ApacheTika\Client->getMainText()
#3 {main}
thrown in C:\WORKAREA\Projects\cloudconversa\research\tika\vendor\vaites\php-apache-tika\src\Clients\WebClient.php on line 642
Hi, I found a potential bug when trying to use Tika to parse URLs. I'm using Tika 2.9.x via Docker, and my script calls getMainText() on a remote URL.
The error is the one shown in the stack trace above.
This happens because Client.php, at line 540, checks for "invalid remote file" before it tries to "download remote file if required only for integrated downloader". I swapped these two blocks. Then I also added the CURLOPT_FOLLOWLOCATION option at line 637 to follow redirects and avoid errors when the download URL returns a 301.
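To make the two changes concrete, here is a rough sketch of the idea. It is not the library's actual code: the function names (remoteDownloadOptions, downloadRemoteFile, resolveFile), the redirect cap of 5, and the readability check are my own assumptions for illustration. Only the cURL options themselves are standard PHP.

```php
<?php

/**
 * Hypothetical helper (not part of vaites/php-apache-tika): cURL options
 * for downloading a remote file with redirect following enabled.
 */
function remoteDownloadOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow 301/302 redirects
        CURLOPT_MAXREDIRS      => 5,    // assumed cap, guards against redirect loops
        CURLOPT_FAILONERROR    => true, // treat HTTP status >= 400 as a failure
    ];
}

/**
 * Hypothetical helper: download $url into a temporary file, return its path.
 */
function downloadRemoteFile(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, remoteDownloadOptions($url));

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Download failed: ' . curl_error($ch));
    }

    $path = tempnam(sys_get_temp_dir(), 'tika');
    file_put_contents($path, $body);

    return $path;
}

/**
 * Hypothetical illustration of the swapped order: download first,
 * validate the resulting local file afterwards.
 */
function resolveFile(string $file): string
{
    // Step 1: fetch remote files before any validation runs.
    if (preg_match('/^https?:\/\//i', $file) === 1) {
        $file = downloadRemoteFile($file);
    }

    // Step 2: only now check that there is something readable to send to Tika.
    if (!is_readable($file)) {
        throw new RuntimeException("Invalid file: $file");
    }

    return $file;
}
```

With this order, a URL that answers with a 301 is followed to its final location and saved locally before the validity check runs, so the check sees the downloaded file rather than the original URL.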
Hope this is useful. Thank you!