vaites / php-apache-tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
MIT License

No Url set! Exception for remote files. #8

Closed BashirRezaiee closed 6 years ago

BashirRezaiee commented 6 years ago

I'm getting the "No Url set!" exception with the following file and config:

    $remoteFile = 'http://www.africau.edu/images/default/sample.pdf';
    $client = \Vaites\ApacheTika\Client::make('localhost', 9998);
    $text = $client->getText($remoteFile);

I tried running the server with each of these commands:

  1. java --add-modules java.se.ee -jar tika-server-1.18.jar
  2. java --add-modules java.se.ee -jar tika-server-1.18.jar -enableUnsecureFeatures -enableFileUrl

Environment: Java 10.0.1, PHP 7.2.1, Apache 2.4.23

vaites commented 6 years ago

I ran your code against Java 8 and Java 10 with Apache Tika Server 1.18 and it works well:

A Simple PDF File 
 This is a small demonstration .pdf file - 

 just for use in the Virtual Mechanics tutorials. More text. And more 
 text. And more text. And more text. And more text. 

 And more text. And more text. And more text. And more text. And more 
 text. And more text. Boring, zzzzz. And more text. And more text. And 
 more text. And more text. And more text. And more text. And more text. 
 And more text. And more text. 

 And more text. And more text. And more text. And more text. And more 
 text. And more text. And more text. Even more. Continued on page 2 ...

 Simple PDF File 2 
 ...continued from page 1. Yet more text. And more text. And more text. 
 And more text. And more text. And more text. And more text. And more 
 text. Oh, how boring typing this stuff. But not as boring as watching 
 paint dry. And more text. And more text. And more text. And more text.

I tried both JRE and JDK without problems...

Are you using Windows, Mac or Linux?

BashirRezaiee commented 6 years ago

I'm using Windows. I use it in a Laravel project in a local environment. It works for locally uploaded files, but remote files trigger the error. I don't know where the problem is.

vaites commented 6 years ago

It seems to be a cURL error. How did you install PHP? With XAMPP or similar?

BashirRezaiee commented 6 years ago

I'm using WampServer 3.1.1 with Apache 2.4.23 and PHP 7.2.1.

vaites commented 6 years ago

OK, will try to reproduce this bug with your environment.

vaites commented 6 years ago

I apologize for the delay, I'm still working on it...

vaites commented 6 years ago

Confirmed bug on Windows with any PHP version...

vaites commented 6 years ago

I think I found a solution. Can you please try the latest commit? Just change the version of the dependency in your composer.json to dev-master and update. Please let me know if it works so I can roll out a new version.

vaites commented 6 years ago

Hi @BashirRezaiee, have you been able to check if it works?

BashirRezaiee commented 6 years ago

Hi @vaites, I apologize for the long delay. In Afghanistan we have problems with internet access and related issues. I tried dev-master and got the "Unprocessable document" exception with these two files:

  1. http://www.africau.edu/images/default/sample.pdf
  2. http://www.iiswc.org/iiswc2009/sample.doc

My code is as follows:

    $remoteFile = 'http://www.africau.edu/images/default/sample.pdf';
    $remoteFile2 = 'http://www.iiswc.org/iiswc2009/sample.doc';
    $client = \Vaites\ApacheTika\Client::make('localhost', 9998);
    $text = $client->getText($remoteFile);
    dd($text);

vaites commented 6 years ago

Thanks @BashirRezaiee, I will try again. I'm setting up a Windows test environment to run the tests against on each release, and I'm working on solving this issue.

vaites commented 6 years ago

I finally fixed it: I added an internal downloader (configurable) that saves the remote file to the filesystem before passing it to Apache Tika. With this feature there's no need to use enableUnsecureFeatures. The "Unprocessable document" error was raised by the server when unsecure features are not enabled.

Just upgrade to the 0.4.5 version to get it fixed.
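
For anyone who cannot upgrade right away, the same idea (download the remote file first, then hand Tika a local path) can be approximated in user code. The following is only a minimal sketch, not the library's actual implementation: `downloadToTemp` and `extractRemoteText` are hypothetical helper names, and it assumes `allow_url_fopen` is enabled so that `file_get_contents` can read HTTP URLs.

```php
<?php

/**
 * Download a document to a temporary file and return the local path.
 * Hypothetical helper for illustration; not part of php-apache-tika.
 */
function downloadToTemp(string $url): string
{
    // Assumes allow_url_fopen is enabled when $url is an HTTP(S) address.
    $data = @file_get_contents($url);
    if ($data === false) {
        throw new RuntimeException("Could not download $url");
    }

    $path = tempnam(sys_get_temp_dir(), 'tika');
    if ($path === false || file_put_contents($path, $data) === false) {
        throw new RuntimeException("Could not write a temporary file for $url");
    }

    return $path;
}

/**
 * Download $url, pass the local copy to $extract, and always clean up.
 * $extract is any callable that takes a local path and returns text.
 */
function extractRemoteText(string $url, callable $extract): string
{
    $path = downloadToTemp($url);
    try {
        return $extract($path);
    } finally {
        // Remove the temporary copy whether or not extraction succeeded.
        @unlink($path);
    }
}
```

With the client from this thread, usage would look roughly like `extractRemoteText($remoteFile, [\Vaites\ApacheTika\Client::make('localhost', 9998), 'getText'])`. Taking the extractor as a callable keeps the download step independent of Tika, so it can be checked without a running server.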