php-http / message

HTTP Message related tools
http://php-http.org
MIT License
1.29k stars 41 forks source link

CompressStream and DecompressStream really handles deflate content encoding #64

Closed ajgarlag closed 6 years ago

ajgarlag commented 7 years ago
Q A
Bug? yes
New Feature? no
Version v1.4.0

Actual Behavior

An stream encoded with deflate content-encoding cannot be read by InflateStream, but with DecompressStream.

Expected Behavior

The deflate content-encoding should be managed by InflateStream and DeflateStream while CompressStream and DecompressStream should handle compress content-encoding.

Steps to Reproduce

<?php

require 'vendor/autoload.php';

$payload = file_get_contents('http://httpbin.org/deflate');

$innerStream1 = new spec\Http\Message\Encoding\MemoryStream($payload);
$inflateStream = new Http\Message\Encoding\InflateStream($innerStream1);
$inflatedContents = $inflateStream->getContents();
var_dump($inflatedContents !== ""); //Should be true

$innerStream2 = new spec\Http\Message\Encoding\MemoryStream($payload);
$decompressStream = new Http\Message\Encoding\DecompressStream($innerStream2);
$decompressedContents = $decompressStream->getContents();
var_dump($decompressedContents === ""); //Should be true

var_dump($inflatedContents);
var_dump($decompressedContents);

$realContents = $decompressedContents;

$innerStream3 = new spec\Http\Message\Encoding\MemoryStream($realContents);
$deflateStream = new Http\Message\Encoding\DeflateStream($innerStream3);
$deflatedContents = $deflateStream->getContents();
var_dump($deflatedContents === $payload); //Should be true

$innerStream4 = new spec\Http\Message\Encoding\MemoryStream($realContents);
$compressStream = new Http\Message\Encoding\CompressStream($innerStream4);
$compressedContents = $compressStream->getContents();
var_dump($compressedContents !== $payload); //Should be true

Possible Solutions

To avoid BC breaks, I think we should add a warning in the documentation and in the class docblocks and rename them in 2.0. I'm looking for and HTTP endpoint to test compress content-encoding, but I have not found anyone.

joelwurtz commented 7 years ago

Are we sure this is an error from our side ? it seems that the deflate of httpbin return in fact a compress, test it with :

<?php

$payload = file_get_contents('http://httpbin.org/deflate');

var_dump(gzinflate($payload));
var_dump(gzuncompress($payload));
ajgarlag commented 7 years ago

That was my first thought, so I tested it at http://www.gidnetwork.com/tools/gzip-test.php which reports that http://httpbin.org/deflate compression type is deflate.

Also, reading http://php.net/manual/en/filters.compression.php I saw that all examples related to deflate encoding use a window option with value 15 which is the value used in CompressStream and DecompressStream.

It would be useful to have another endpoint to test the deflate content encoding.

joelwurtz commented 7 years ago

This indicate that PHP may be wrong then as here : https://github.com/php/php-src/blob/master/ext/zlib/php_zlib.h#L31

You can see that PHP_ZLIB_ENCODING_RAW is -15, and this windows size is used by the gzdeflate / gzinflate php function as see here : https://github.com/php/php-src/blob/master/ext/zlib/zlib.c#L728

However php doc say in http://php.net/manual/en/function.gzdeflate.php is RFC 1950 + 1951 so it should use the windows size of 15 and not -15 no ?

joelwurtz commented 7 years ago

So finally, PHP is right, it's the right window as a deflate stream like said in RFC 1951 is basically a RFC 1950 withtout the headers which is created in zlib by specifing a window of -15

However if you look at httpbin here : https://github.com/Runscope/httpbin/blob/master/httpbin/filters.py#L79

They use zlib.compressObj without any parameters which use by default a window size of 15 (like described here : https://docs.python.org/2/library/zlib.html)

This window size set a header like specified in RFC 1950 (but no compliant with 1951).

Both RFC use the same compression algorithm that's why i think http://www.gidnetwork.com/tools/gzip-test.php report this as being a deflate (to test, but i think it will indicate deflate with or without the headers)

ajgarlag commented 7 years ago

The best definition for the deflate content encoding can be located at the RFC 7230 Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing which in section 4.2.2 describes the Deflate Coding with these words:

The "deflate" coding is a "zlib" data format [RFC1950] containing a "deflate" compressed data stream [RFC1951] that uses a combination of the Lempel-Ziv (LZ77) compression algorithm and Huffman coding.

Note: Some non-conformant implementations send the "deflate" compressed data without the zlib wrapper.

Another test tool that identifies http://httpbin.org/deflate as deflate content encoding clarifies that:

deflate - despite its name the zlib compression (RFC 1950) should be used (in combination with the deflate compression (RFC 1951)) as described in the RFC 2616. The implementation in the real world however seems to vary between the zlib compression and the (raw) deflate compression. Due to this confusion, gzip has positioned itself as the more reliable default method.

So I think that the right window size to use is 15, or even better, the PHP constant ZLIB_ENCODING_DEFLATE.

Anyway, as long as major HTTP servers like Apache and Nginx only support GZIP compression, I think we must deprecate CompressStream, DecompressStream, DeflateStream and InflateStream in 2.0

joelwurtz commented 7 years ago

Yes, right, deflate is the a windows size of 15, i think i misinterpret some functions names and http content encoding.

In fact there is no "compress" support in PHP, and we should not support this IMO as there is no interest (since everybody use deflate nowadays)

joelwurtz commented 7 years ago

Finally,

I think our Stream is right, maybe the wording is a little wrong, but Compress is RFC 1950, and Deflate is RFC 1951. The only update here could maybe be a documentation change to explain better which one to use (in fact there is no interest to use directly the Deflate / Inflate Stream).

However the DecoderPlugin is wrong :

ajgarlag commented 7 years ago

IMO we must even deprecate the deflate content-encoding in 2.0 and support only gzip compression.

dbu commented 7 years ago

do you want to deprecate deflate because its not used by common web servers or do you think we can't get it to work properly?

i expect this library to also be used in layered architecture, where the client talks to a server done by the same developers as the client and they could be doing specific things or use a specific server and want deflate i expect. or somebody needs to talk to odd servers they can't control that do deflate...

ajgarlag commented 7 years ago

I think that deflate support should be deprecated because I could not found any endpoint (apart from http://httpbin.org/deflate) that support this content-encoding.

Anyway, it's trivial to fix, so maybe we should drop support for compress only.

dbu commented 7 years ago

as long as we can make reasonably sure that we handle it correctly and its not a huge effort, i'd prefer to keep it in for completeness sake.

joelwurtz commented 6 years ago

Closing this as this issue was more in the client-common and was fixed a long time ago