vaites / php-apache-tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
MIT License

When callback present, do not keep response #24

Closed: oytuntez closed this issue 4 years ago

oytuntez commented 4 years ago

Hi there,

Amazing library, thank you! Callbacks are usually used to stream the response in chunks, as a way to handle larger files. However, both the CLI and web clients still append every chunk to the response, so the whole response is held in memory anyway and memory usage may grow with file size.

Example from the current implementation:

        // callback
        if(!is_null($this->callback))
        {
            $callback = $this->callback;

            $options[CURLOPT_WRITEFUNCTION] = function($handler, $data) use($callback)
            {
                $this->response .= $data;

                $callback($data);

                // safe because cURL must receive the number of *bytes* written
                return strlen($data);
            };
        }

I am doing some tests and may need to fork the repository to submit a PR.

vaites commented 4 years ago

Thanks @oytuntez, I confirmed the issue. I think the best solution is to add a new boolean parameter to Client::setCallback() that allows the append to be disabled. This way we don't make changes that break the current behaviour.

Do you want to make a PR? I can add this fix soon if you prefer.
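
A minimal sketch of how such a flag could work, not the library's actual code. The `ResponseBuffer` class and its `writer()` method are hypothetical names introduced for illustration. The closure has the same shape as the `CURLOPT_WRITEFUNCTION` handler quoted above, but it only buffers the data when `$append` is true. The cURL call is simulated by invoking the closure directly, so the snippet runs without a Tika server.

```php
<?php
// Hypothetical helper (not part of php-apache-tika): builds a cURL-style
// write function that buffers the body only when $append is true.
final class ResponseBuffer
{
    /** @var string Accumulated body; stays empty when appending is disabled */
    public $response = '';

    public function writer(callable $callback, bool $append = true): Closure
    {
        return function ($handle, string $data) use ($callback, $append): int {
            if ($append) {
                $this->response .= $data;
            }

            $callback($data);

            // cURL must receive the number of bytes handled
            return strlen($data);
        };
    }
}

$seen = [];
$buffer = new ResponseBuffer();
$write = $buffer->writer(function (string $chunk) use (&$seen) {
    $seen[] = $chunk;
}, false);

// Simulate cURL delivering two chunks (null stands in for the cURL handle).
$written = $write(null, 'Hello, ') + $write(null, 'Tika');
```

With a real request, the closure would be assigned as `$options[CURLOPT_WRITEFUNCTION] = $buffer->writer($callback, false);`. Defaulting `$append` to true keeps the existing behaviour for callers that do not pass the new argument.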

vaites commented 4 years ago

Sorry for the delay. I added the requested feature. Will publish a new version soon but if you want you can test it before the release.

The behaviour is simple: the append parameter lets you disable appending to the response, so the returned value will be empty and the callback must process the content chunk by chunk.
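
As an illustration of chunk-by-chunk processing, here is a hedged sketch of a callback that never holds the full text: it updates a running SHA-256 hash and a byte counter for each chunk. The chunk values and the `foreach` loop are stand-ins for the client invoking the callback. Only PHP built-ins are used, and nothing here is taken from the library itself.

```php
<?php
// Incremental consumer: hashes and counts bytes without keeping the text.
$hash = hash_init('sha256');
$bytes = 0;

$callback = function (string $chunk) use ($hash, &$bytes): void {
    hash_update($hash, $chunk);
    $bytes += strlen($chunk);
};

// Stand-in for the client calling the callback once per received chunk.
foreach (['extracted ', 'text ', 'in chunks'] as $chunk) {
    $callback($chunk);
}

$digest = hash_final($hash);
```

With the feature described in this thread, a closure like `$callback` would be registered through Client::setCallback() with the append parameter disabled. The exact argument order is an assumption to verify against the released version.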

vaites commented 4 years ago

Fixed in release v0.8.0