vaites / php-apache-tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
MIT License
114 stars 22 forks source link

Allow to set Tika server as URL #13

Closed mpdude closed 5 years ago

mpdude commented 5 years ago

This makes it much easier to pass in the target host and port from an environment variable.

<?php

require_once('vendor/autoload.php');

$client = \Vaites\ApacheTika\Client::make();
$client->setBaseUrl(getenv('TIKA_SERVER_URL'));
...
vaites commented 5 years ago

Thanks @mpdude, I think is a good thing to set host and port using an URL but there's no need to change the current behaviour: when instantiating the class we need to check if server is running and throw an exception if can't connect. So allowing a constructor without parameters is not acceptable.

Anyway, I added the support for this feature in latest commit (https://github.com/vaites/php-apache-tika/commit/6d03a1613923efd6a451cd24252453825a9465a4) so you can use your env variable:

$client = \Vaites\ApacheTika\Client::make(getenv('TIKA_SERVER_URL'));

Will release a new version soon with this feature, just after fix the pending #11 issue.

Thanks again!

mpdude commented 5 years ago

Hey @vaites,

great to hear you like the idea.

Regarding the connection check when the class is instantiated – do you really think this is necessary? When defining the client in a DI container like with Symfony, you might incur a remote call (HTTP request) at times where you really don't need it, just because the instance is created.

Don't you think an exception when performing the actual request would suffice?

vaites commented 5 years ago

The reason of this check is because the cost of this call is almost none and in the execution of the script more calls will be made (if not, why instantiate the class?). This is not mandatory (an option to disable check can be added) but I think is a good practice to test if all works well before making the first call.

vaites commented 5 years ago

I've been thinking about this and you're right: if you want to use the class in DI or in a queue of jobs that are serialized, there's no need to make any checks until the class is used with the request() method. So I added the Client::prepare() method that delays this check, allowing to prepare the class to use it later, and keep the Client::make() with the old behaviour.

This will be available in the upcoming 0.6 release and hope it solves your issue...

mpdude commented 5 years ago

So, if I skimmed the code correctly, that's

$client = \Vaites\ApacheTika\Client::make(getenv('TIKA_SERVER_URL'));

... to connect with an URL and no additional code change required to get the delayed connection check?

vaites commented 5 years ago

No, I kept the connection check with make because is the current behaviour and what expects the actual users. Use prepare to delay the connection check until the first request call:

$client = \Vaites\ApacheTika\Client::prepare(getenv('TIKA_SERVER_URL'));
$client->getVersion(); // here we check