kjdev / php-ext-brotli

Brotli Extension for PHP
MIT License
172 stars 29 forks source link

Support incremental compression and decompression #26

Closed davidmhammond closed 8 months ago

davidmhammond commented 4 years ago

Although the brotli extension supports streaming via stream wrappers, it is currently missing the ability to compress/decompress incrementally. This feature is needed in order to achieve certain streaming workflows that cannot be accomplished with the existing functionality.

This would involve adding at least four new functions: brotli_compress_init(), brotli_compress_add(), brotli_uncompress_init(), and brotli_uncompress_add(). These functions should be analogous to the zlib extension's deflate_init(), deflate_add(), inflate_init(), and inflate_add(), respectively, including flags to control flush behavior (at a minimum, equivalents of ZLIB_SYNC_FLUSH and ZLIB_FINISH).

Example of when incremental compression is needed: You're using a response model such as PSR-7, the page output is generated in a streaming manner via a generator, and the response object contains headers indicating brotli compression. In this case, as you pull data from the response stream, that data should come out already brotli-compressed (this stream may or may not represent the final output to the browser). Currently, this can only be accomplished by compressing the entire stream contents at once (which means you don't get a true streaming pipeline), or by an elaborate hack whereby reading the stream writes to a brotli stream wrapper that is wrapped around a custom stream wrapper that writes to a buffer which the first stream then reads from. Support for incremental compression would make this much simpler and less hacky.

Example of when incremental decompression is needed: You're using a stream filter framework where each filter operates on a chunk of data at a time, and you want to create a filter that performs a brotli decompression of the input. With incremental decompression, the filter object could call brotli_uncompress_init() for the first chunk, and return brotli_uncompress_add() for each chunk in the stream, passing the BROTLI_FINISH flag on the final chunk. Without incremental decompression, there's really no way to accomplish brotli decompression in this kind of filter system without just building a giant string and decompressing it all on the final chunk (which, again, means you don't get a true streaming pipeline).

kjdev commented 4 years ago

Added a new function for simple incremental processing.

sample.

// compression
$resource = brotli_compress_init();
$compressed = '';
$compressed .= brotli_compress_add($resource, 'Hello, ', BROTLI_PROCESS);
$compressed .= brotli_compress_add($resource, 'World!', BROTLI_PROCESS);
$compressed .= brotli_compress_add($resource, '', BROTLI_FINISH);

echo brotli_uncompress($compressed), PHP_EOL; // Hello, World!

// uncompression
$resource = brotli_uncompress_init();
$uncompressed = '';
$uncompressed .= brotli_uncompress_add($resource, substr($compressed, 0, 5), BROTLI_PROCESS);
$uncompressed .= brotli_uncompress_add($resource, substr($compressed, 5), BROTLI_PROCESS);
$uncompressed .= brotli_uncompress_add($resource, '', BROTLI_FINISH);

echo $uncompressed, PHP_EOL; // Hello, World!
JakubOnderka commented 3 years ago

Hello, I like the new functions for incremental compression! But I miss settings that are possible to set for brotli_compress method, but not possible for brotli_compress_init, like quality and mode. Will you consider to add these settings also for incremental compression? Thanks.

kjdev commented 3 years ago

c780933 Added an argument to brotli_compress_init().

kkmuffme commented 8 months ago

This doesn't seem to work correctly: _add always returns an empty string until I call it with BROTLI_FINISH then it returns the whole thing at once. Using deflate_add works (I get a part every time)

That's a problem because you cannot stream data with brotli like this (and using brotli_compress to compress each chunk separately is technically valid, but isn't supported by browsers, as they have had a bug with gzip already, despite this being valid according to RFC 1952)

EDIT: this will make serving pages with brotli much slower compared to e.g. gzip, since you have to wait for the whole page to be buffered basically, before anything gets sent.

kkmuffme commented 8 months ago

Looking at deflate_add which uses ZLIB_SYNC_FLUSH by default and checking https://brotli.org/encode.html, it seems the issue why this doesn't work is that we need an BROTLI_OPERATION_FLUSH after BROTLI_OPERATION_PROCESS to get data intermittently.

Using a similar approach as finish here https://github.com/kjdev/php-ext-brotli/blob/master/brotli.c#L1224 you could just add this there:

    if (mode == BROTLI_OPERATION_FLUSH) {
        while (!BrotliEncoderHasMoreOutput(ctx->encoder)) {
            size_t available_out = buffer_size;
            uint8_t *next_out = buffer;
            if (BrotliEncoderCompressStream(ctx->encoder,
                                            BROTLI_OPERATION_FLUSH,
                                            &available_in,
                                            &next_in,
                                            &available_out,
                                            &next_out,
                                            0)) {
                buffer_used = (size_t)(next_out - buffer);
                if (buffer_used) {
                    smart_string_appendl(&out, buffer, buffer_used);
                }
            } else {
                efree(buffer);
                smart_string_free(&out);
                php_error_docref(NULL TSRMLS_CC, E_WARNING,
                                 "Brotli flush failed\n");
                RETURN_FALSE;
            }
        }
    }

Then just declaring BROTLI_FLUSH and updating the docs - maybe I suggest to also change the default BROTLI_PROCESS of brotli_uncompress_add to BROTLI_FLUSH to match the behavior of deflate_add in PHP.

I would PR this, but I have never used c before :)

@kjdev

kjdev commented 8 months ago

What would be the case that it does not work properly?

<?php
// https://www.php.net/manual/en/function.deflate-add.php : Example

$handler = fopen('/tmp/test.csv', 'w');
// $deflateContext = deflate_init(ZLIB_ENCODING_GZIP, ['level' => 9]);
$brotliContext = brotli_compress_init();

$strings = [
  'Hello, how are you?' . PHP_EOL,
  'I am fine thanks' . PHP_EOL,
  'Hello, how are you?' . PHP_EOL,
];

foreach ($strings as $string) {
  // fwrite($handler, deflate_add($deflateContext, $string, ZLIB_NO_FLUSH));
  fwrite($handler, brotli_compress_add($brotliContext, $string, BROTLI_PROCESS));
}

// fwrite($handler, deflate_add($deflateContext, '', ZLIB_FINISH));
fwrite($handler, brotli_compress_add($brotliContext, '', BROTLI_FINISH));
fclose($handler);

// echo gzdecode(file_get_contents('/tmp/test.csv'));
echo brotli_uncompress(file_get_contents('/tmp/test.csv'));
kkmuffme commented 8 months ago

Streaming the output (I guess brotli.output_compression has the same issue) via HTTP 1.1. chunked or HTTP2/3.

simplified example:

<?php

class Foo {
    private $resource;
    public function start() {
        ob_implicit_flush();
        ob_start(array($this, 'buffer'));
    }

    function buffer($output, $phase) {
        if ($phase & PHP_OUTPUT_HANDLER_START) {
            header( 'Content-Encoding: br' );
            $this->resource = brotli_compress_init(6, BROTLI_TEXT);
        }

        if ($phase & PHP_OUTPUT_HANDLER_FINAL) {
            return brotli_compress_add($this->resource, $output, BROTLI_FINISH);
        }

        return brotli_compress_add($this->resource, $output, BROTLI_PROCESS);
    }
}

$foo = new Foo();
$foo->start();

echo "hello world " . str_repeat('a', 100) . '<br>';
sleep(5);
echo "2hello world " . str_repeat('a', 100) . '<br>';
sleep(5);
echo "2hello world " . str_repeat('a', 100) . '<br>';

You won't see any output until 10 seconds (= full script has finished), since brotli_compress_add always returns an empty string.

This is bc BROTLI_PROCESS is same as ZLIB_NO_FLUSH (as you correctly point out). What we need here is however BROTLI_FLUSH (in brotli terms it's BROTLI_OPERATION_FLUSH) which is the equivalent to ZLIB_SYNC_FLUSH which is the default for deflate_add().

When using this, you will get output immediately, then the next chunk after 5 seconds and the last chunk after 5 seconds, instead of everything at once.

This is important as it doesn't make sense to delay transfering output until the whole script has finished, but we need to send output as early as possible (practically of course we use a 4k/8k chunk size, I just removed it for the example ob_start) to ensure a quick TTFB and faster perceived page load time. (even though the total page load time won't be any faster, but the perceived time is much quicker, since you already see parts of the page)

kjdev commented 8 months ago

https://github.com/kjdev/php-ext-brotli/compare/master...develop

Is a correction like this sufficient?

kkmuffme commented 8 months ago

No bc of https://github.com/kjdev/php-ext-brotli/blob/master/brotli.c#L1200 it will only enter this condition is the input is non-empty, but flush should also work when we just pass '' as argument - this is fine FYI (no change needed to this condition), just explaining the reason why the change isn't sufficient.

Additionally flushing (as per brotli docs) needs a similar process to FINISH, so I think you also need code like I posted here https://github.com/kjdev/php-ext-brotli/issues/26#issuecomment-1883085618 to make it work correctly.

kjdev commented 8 months ago

Is this correct?

https://github.com/kjdev/php-ext-brotli/commit/5e9353e67e515c12a3f6c2795b51391d2e86e5a5

while (available_in || BrotliEncoderHasMoreOutput(ctx->encoder)) {
  if (BrotliEncoderCompressStream(ctx->encoder,
  ..
}

if (mode == BROTLI_OPERATION_FINISH) {
    while (!BrotliEncoderIsFinished(ctx->encoder)) {
        if (BrotliEncoderCompressStream(ctx->encoder,
        ..
    }
}
kkmuffme commented 8 months ago

Tbh I have never developed in c and can't test it (what I posted initially should work though afaik, but again, you are the expert)

Here's a test case based on your previous example that allows you to check if it work's correctly or not:

$brotliContext = brotli_compress_init();

$strings = [
  'Hello, how are you? How is it going?' . PHP_EOL,
  'I am fine thanks' . PHP_EOL,
  'Hello, how are you? How is it going?' . PHP_EOL,
];

$compressed = '';
foreach ($strings as $string) {
  $compressedString = brotli_compress_add($brotliContext, $string, BROTLI_FLUSH);
  if (strlen($compressedString) === 0) {
    echo "Incremental compress failed";
  } elseif ($compressedString === $string) {
    // temporary only for test
    echo "Incremental compress failed to compress";
  } else {
    $compressed .= $compressedString;
  }
}

$compressed .= brotli_compress_add($brotliContext, '', BROTLI_FINISH);
if (brotli_uncompress($compressed) !== implode('', $strings)) {
  echo "Compression invalid";
} else {
  echo "Done";
}
kjdev commented 8 months ago

https://github.com/kjdev/php-ext-brotli/commit/0fefb0ef3f8344c7f3c290158537858d050ba3e2

TEST 4/28 [tests/compress_add.phpt]
PASS Test brotli_compress_add() functionality [tests/compress_add.phpt] 

The test will pass, so we will make adjustments based on this code.

kkmuffme commented 8 months ago

Great, I guess it's good to go then :-)

kkmuffme commented 8 months ago

@kjdev any blockers/questions for this to get added?

kjdev commented 8 months ago

No blockers/questions so far We will respond quickly

kjdev commented 8 months ago

Fixed (master branch)

A modified PECL version will be released later

kkmuffme commented 8 months ago

Thanks!

This issue can be marked as completed :-)