zendframework / zend-http

Http component from Zend Framework
BSD 3-Clause "New" or "Revised" License

Client, Socket adapter: compressed responses not decompressed when streaming #91

Closed zerocrates closed 4 years ago

zerocrates commented 8 years ago

The default Client pathways that use the Socket adapter automatically handle gzipped and deflated response bodies, and the Client therefore advertises both encodings in the Accept-Encoding header on outgoing requests, as long as the zlib extension is present.

This automatic handling doesn't extend to the streaming support, however. In addition, the Client doesn't account for this and still sends an Accept-Encoding header indicating that gzip and deflate are acceptable. As a result, the server honors the header and sends a compressed response; neither the Client, the Adapter, nor the Response decompresses it, so it is streamed as-is, in compressed form.

Because of the workflow involved in using the client, calling code can't know to add a decompression stream filter, as it would have to inspect headers to determine if the response was in fact compressed, but by the time those headers are available, send() has already been called and the data has already been streamed to its destination. The best available workaround is to manually set an Accept-Encoding header to tell servers that compression is not supported when using streams.
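The header workaround described above could look roughly like the sketch below. It assumes zend-http is installed with Composer; `createIdentityStreamingClient` is a hypothetical helper name, not part of the library. Note that `setHeaders()` replaces any headers already set on the request, and a server is free to ignore `Accept-Encoding`, so this reduces the problem rather than guaranteeing an uncompressed body.

```php
<?php

require 'vendor/autoload.php'; // assumption: zend-http installed via Composer

use Zend\Http\Client;

/**
 * Hypothetical helper (not part of zend-http): build a streaming client
 * that asks the server not to compress the response body.
 */
function createIdentityStreamingClient(string $uri, string $target): Client
{
    $client = new Client($uri);

    // Setting Accept-Encoding explicitly should keep the Client from
    // advertising gzip/deflate on its own.
    $client->setHeaders(['Accept-Encoding' => 'identity']);

    // Stream the response body to $target instead of holding it in memory.
    $client->setStream($target);

    return $client;
}
```

Calling `createIdentityStreamingClient($uri, 'test.txt')->send()` would then stream the body to `test.txt`, uncompressed if the server respects the header.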

The Client should either not send the Accept-Encoding header when setStream has been called, or the Client or Adapter should automatically attach an appropriate stream filter to decompress the response according to the response headers.
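To illustrate the stream-filter idea, here is a minimal sketch using only PHP's built-in `zlib.inflate` filter, independent of zend-http. The function name `inflateFileToFile` is hypothetical. It decompresses an already-saved file into another file without loading the whole body as a string, which is roughly what an automatically attached filter would do during streaming. The `window` value of `15 + 32` asks zlib to auto-detect a gzip or zlib header; raw deflate data without a header is an assumption this sketch does not cover.

```php
<?php

/**
 * Hypothetical helper: copy a gzip- or zlib-compressed file to $destination,
 * decompressing on the fly through a stream filter.
 * Returns the number of decompressed bytes written.
 */
function inflateFileToFile(string $source, string $destination): int
{
    $in = fopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Unable to open $source for reading");
    }

    $out = fopen($destination, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Unable to open $destination for writing");
    }

    // Decompress data as it is read from $in.
    // window = 15 + 32 enables automatic gzip/zlib header detection.
    $filter = stream_filter_append($in, 'zlib.inflate', STREAM_FILTER_READ, ['window' => 15 + 32]);
    if ($filter === false) {
        fclose($in);
        fclose($out);
        throw new RuntimeException('zlib.inflate filter is not available');
    }

    // Copies in chunks, so memory use does not grow with the body size.
    $bytes = stream_copy_to_stream($in, $out);

    fclose($in);
    fclose($out);

    if ($bytes === false) {
        throw new RuntimeException('Stream copy failed');
    }

    return $bytes;
}
```

The same filter could in principle be appended in write mode to the destination stream before the body is streamed, which is closer to what the issue proposes the Client or Adapter do.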

This appears to be the same issue as the longstanding ZF1-era issue ZF-10878. I'm not sure how or if it's related to #89 which is Curl-specific and doesn't require streaming.

michalbundyra commented 5 years ago

@zerocrates I've tried to replicate it and what I can see is:

What is the issue with this approach?

        $uri = 'http://server_with_gzip/file.txt';

        $config = [
            'adapter' => Client\Adapter\Curl::class,
            'storeresponse' => false,
        ];
        $client = new Client($uri, $config);
        $client->setStream('response.txt');
        $client->send();
        $response = $client->getResponse();

        $content = $response->getContent();
        $body = $response->getBody();

zerocrates commented 5 years ago

It's been quite a while since I filed this, so I'm a little hazy on the details, but a couple things, if I'm reading your response correctly:

First, this issue is specific to the Socket adapter.

Second, though you're right that you can read from $response->getBody() and get the decoded result, this has the undesirable effect of loading the response as a string, when the intent was, for example, to save directly to a file. Put more simply, getBody uses readStream which runs stream_get_contents: if I'm going to be loading the whole response as a string, then why bother streaming in the first place?

I'm fairly certain this is still an issue.

$client = new \Zend\Http\Client('test/url/with/compression');
$client->setStream('test-output.txt');
$client->send(); // still-compressed output is now saved at test-output.txt

michalbundyra commented 5 years ago

@zerocrates I've tested it again:

$client = new \Zend\Http\Client('test/url/with/compression', [
    'adapter' => \Zend\Http\Client\Adapter\Curl::class,
]);
$client->setStream('test-output-curl.txt');
$client->send();

and

$client = new \Zend\Http\Client('test/url/with/compression', [
    'adapter' => \Zend\Http\Client\Adapter\Socket::class,
]);
$client->setStream('test-output-socket.txt');
$client->send();

In both cases I get decoded content in the txt files, as I would expect.

I also checked without streaming: in that case $client->getResponse()->getContent() is gzipped, and $client->getResponse()->getBody() is the decoded content.

I can't see any issue. Would you be able to test it again and confirm, or provide a failing test case? Thanks!

zerocrates commented 5 years ago

Okay so here's my exact test, using zend-http 2.10.0 installed with Composer:

<?php

require 'vendor/autoload.php';

$client = new \Zend\Http\Client('https://www.gnu.org/licenses/gpl-3.0.txt');
$client->setStream('test.txt');
$client->send();

$response = $client->getResponse();

echo "body\n\n";
var_dump($response->getBody());

echo "\n\ncontent\n\n";
var_dump($response->getContent());

echo "\n\nfile\n\n";
var_dump(file_get_contents('test.txt'));

And the output (truncated):

body

string(35149) "                    GNU GENERAL PUBLIC LICENSE

content

string(12130) "^_<8B>^H^@^@^@^@^@^@^C<C5>}[s<92><EE>{<FD><8A>

file

string(12130) "^_<8B>^H^@^@^@^@^@^@^C<C5>}[s<92><EE>{<FD><8A>

getBody is indeed decoded but the streamed data stored in the file is not.
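The `^_<8B>` at the start of the content and file dumps is caret notation for the bytes 0x1F 0x8B, the gzip magic number, which is consistent with the file holding a still-compressed body. A small check along these lines (a sketch; `looksGzipped` is a hypothetical helper, not part of zend-http) can confirm that for a streamed file without reading all of it:

```php
<?php

/**
 * Hypothetical helper: report whether a file starts with the gzip
 * magic number (bytes 0x1F 0x8B).
 */
function looksGzipped(string $path): bool
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Unable to open $path");
    }

    // Only the first two bytes are needed to recognise a gzip header.
    $magic = fread($handle, 2);
    fclose($handle);

    return $magic === "\x1f\x8b";
}
```

For the test above, `looksGzipped('test.txt')` would be expected to return true as long as the server responds with gzip encoding.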