vaites / php-apache-tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
MIT License

Add option to set `fetcherName` for Tika >= 2.0.0 #33

Closed relthyg closed 1 year ago

relthyg commented 1 year ago

In Tika >= 2.0.0, the server fetches remote files through so-called fetchers. If your Tika Server is configured to use an HTTP fetcher, the client must tell the server which fetcher to use by adding the HTTP header `fetcherName` to the request. Furthermore, the URL of the remote file must be passed in a `fetchKey` header instead of the `fileUrl` header used in Tika 1.x.x.

This PR adds a public API method to set the fetcher name, and replaces the `fileUrl` header with `fetcherName` and `fetchKey` when a fetcher name is set. If no fetcher name is set, the `fileUrl` header is still added to the request as before, which keeps compatibility with Tika 1.x.x.
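The header selection described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the library's PHP code: the function name `build_remote_fetch_headers`, the fetcher name `"http"`, and the example URL are all hypothetical. Only the header names `fetcherName`, `fetchKey`, and `fileUrl` come from the description.

```python
from typing import Dict, Optional


def build_remote_fetch_headers(url: str, fetcher_name: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers that ask a Tika server to fetch `url`.

    Hypothetical helper for illustration only; it is not part of any real client.
    """
    if fetcher_name:
        # Tika >= 2.0.0: name the configured fetcher and pass the URL as its key.
        return {"fetcherName": fetcher_name, "fetchKey": url}
    # No fetcher name set: fall back to the Tika 1.x.x header.
    return {"fileUrl": url}


if __name__ == "__main__":
    # "http" is an assumed fetcher name; it must match the server's configuration.
    print(build_remote_fetch_headers("https://example.com/report.pdf"))
    print(build_remote_fetch_headers("https://example.com/report.pdf", "http"))
```

Keeping the fallback as the default branch means existing callers that never set a fetcher name send the same request as before.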

vaites commented 1 year ago

Thanks @relthyg, please give me a few days to take a look at this Tika feature and your changes. It looks OK, but I'm working on version 2.0 of this library and want to see how to integrate it there too...

relthyg commented 1 year ago

Thanks for reaching out, @vaites. Let me know if I can do anything to help or improve the PR.

mpdude commented 1 year ago

Hey David, if there's anything that would help you – let me know.

vaites commented 1 year ago

Sorry for the delay, @relthyg and @mpdude. I want to understand the functionality well enough to add some tests (and merge it into the upcoming 2.x version), and it is taking longer than expected. I hope to have it ready this week...

vaites commented 1 year ago

The PR is merged and version v1.3.0 is published. Thanks for your contribution 👍

relthyg commented 1 year ago

Thank you for accepting the PR!