php / php-src

The PHP Interpreter
https://www.php.net
Other
38.01k stars 7.73k forks source link

fgets() does not return false on error from a zlib.inflate filtered stream with large content and invalid checksum #13264

Open n-peugnet opened 8 months ago

n-peugnet commented 8 months ago

Description

The following code:

<?php

// Prepare a big enough input so that it is not entirely buffered
$stream = fopen('php://memory', 'r+');
$content = '';
for ($i = 0; $i < 10000; $i++) {
    $content .= "Hello $i\n";
}
fwrite($stream, gzcompress($content));

// Mess up the checksum
fseek($stream, -1, SEEK_CUR);
fwrite($stream, '1');

// Rewind and add the zlib filter
rewind($stream);
stream_filter_append($stream, 'zlib.inflate', STREAM_FILTER_READ, ['window' => 15]);

// Read the filtered stream line by line.
while (($line = fgets($stream)) !== false) {
    $error = error_get_last();
    if ($error !== null) {
        // An error is thrown but fgets didn't return false
        var_dump(error_get_last());
        var_dump($line);
    }
}

fclose($stream);

Resulted in this output:

PHP Notice:  fgets(): zlib: data error in /home/nicolas/Source/php/sphinx-inventory-parser/test.php on line 20
PHP Stack trace:
PHP   1. {main}() /home/nicolas/Source/php/sphinx-inventory-parser/test.php:0
PHP   2. fgets($stream = resource(5) of type (stream)) /home/nicolas/Source/php/sphinx-inventory-parser/test.php:20
/home/nicolas/Source/php/sphinx-inventory-parser/test.php:24:
array(4) {
  'type' =>
  int(8)
  'message' =>
  string(25) "fgets(): zlib: data error"
  'file' =>
  string(57) "/home/nicolas/Source/php/sphinx-inventory-parser/test.php"
  'line' =>
  int(20)
}
/home/nicolas/Source/php/sphinx-inventory-parser/test.php:25:
string(7) "Hello 6"

But I expected this output instead:

PHP Notice:  fgets(): zlib: data error in /home/nicolas/Source/php/sphinx-inventory-parser/test.php on line 20
PHP Stack trace:
PHP   1. {main}() /home/nicolas/Source/php/sphinx-inventory-parser/test.php:0
PHP   2. fgets($stream = resource(5) of type (stream)) /home/nicolas/Source/php/sphinx-inventory-parser/test.php:20

As fgets manual says:

If an error occurs, false is returned.

Also note that by replacing 10000 with a smaller number, say 1000 I get the expected output.

PHP Version

PHP 8.2.12

Operating System

Debian testing

nielsdos commented 8 months ago

Can reproduce, and changing to 1000 gets the expected behaviour. Thanks for the report. Bonus points for the memory leak when using 10000...

n-peugnet commented 8 months ago

Another strange thing that may be related is that adding var_dump(feof($stream)); just before the fclose($stream), with either 10000 or 1000, always prints bool(true). I thought that this function could be used to determine if no error happened as we can see in the example.

nielsdos commented 8 months ago

Yes, it's supposed to set the stream to eof on PSFS_ERR_FATAL such that feof($stream) return false, and therein lies the implementation error probably.

nielsdos commented 8 months ago

The problem is that PSFS_ERR_FATAL is returned from php_zlib_inflate_filter to php_stream_fill_read_buffer and that propagates FAILURE to its caller _php_stream_get_line inside the switch statement. However, _php_stream_get_line does not check the error state. So this would fix the error propagation, but does not yet fix the bucket leaks:

diff --git a/main/streams/streams.c b/main/streams/streams.c
index d45a9bfab8..aa4e949d7f 100644
--- a/main/streams/streams.c
+++ b/main/streams/streams.c
@@ -981,7 +981,12 @@ PHPAPI char *_php_stream_get_line(php_stream *stream, char *buf, size_t maxlen,
                }
            }

-           php_stream_fill_read_buffer(stream, toread);
+           if (UNEXPECTED(php_stream_fill_read_buffer(stream, toread) != SUCCESS)) {
+               if (grow_mode) {
+                   efree(bufstart);
+               }
+               return NULL;
+           }

            if (stream->writepos - stream->readpos == 0) {
                break;

EDIT: running a test with the leak fix too on my fork https://github.com/nielsdos/php-src/pull/86

nielsdos commented 8 months ago

Test PR (contains leak and behaviour fix) seems to have passed.

I'm starting to doubt the expected behaviour here though. The memory leak is no doubt a real issue. The behaviour issue might be intended but unexpected behaviour... What happens is that the read buffer fills up, then errors out and marks the stream as borked. But there is still data in the read buffer at that point, so we can still get lines until the read buffer is depleted and fgets returns false. So it might be intended. cc @bukka any thoughts on this?

n-peugnet commented 8 months ago

What happens is that the read buffer fills up, then errors out and marks the stream as borked. But there is still data in the read buffer at that point, so we can still get lines until the read buffer is depleted and fgets returns false. So it might be intended.

The thing is, the data is considered to be corrupted since the checksum failed (it is not in my example since I altered the checksum and not the data, but it will probably be the other way around in real life). Returning garbage data in fgets is not useful.

Also, even if fgets returns false at the end of the buffer, feof does return true. So if the fgets error is silenced, it can not be detected with feof later (this is the issue that made me discover this behaviour in the first place).

bukka commented 7 months ago

I don't think php_stream_get_line should fail if php_stream_fill_read_buffer returns just FAILURE. I think it could break some scenarios. I haven't verified this but wouldn't that potentially break a situation where other side of connection closes the connection after writing all data and this side of connection tries to fill more data to the buffer? In such case I would imagine the php_stream_get_line (fgets) still returns data that it has got buffered. I think there might be some other cases where it makes more sense to return what's in the buffer even if there is some fail.

It might make more sense to introduce some flag to mark the error terminating and in such situation fail. In this case PSFS_ERR_FATAL could be consider for that. It just sets eof atm moment and checking only eof might still limit some use case so maybe some new flag would make more sense. So essentially something like this

diff --git a/main/php_streams.h b/main/php_streams.h
index 31b80de986..8996ee8bcb 100644
--- a/main/php_streams.h
+++ b/main/php_streams.h
@@ -223,6 +223,9 @@ struct _php_stream  {
        /* whether stdio cast flushing is in progress */
        uint16_t fclose_stdiocast_flush_in_progress:1;

+       /* whether fatal error happened and all operations should terminates as soon as possible */
+       uint16_t fatal_error:1;
+
        char mode[16];                  /* "rwb" etc. ala stdio */

        uint32_t flags; /* PHP_STREAM_FLAG_XXX */
diff --git a/main/streams/streams.c b/main/streams/streams.c
index 9f79821f0c..e2a55e62a8 100644
--- a/main/streams/streams.c
+++ b/main/streams/streams.c
@@ -636,6 +636,7 @@ PHPAPI zend_result _php_stream_fill_read_buffer(php_stream *stream, size_t size)
                                        /* some fatal error. Theoretically, the stream is borked, so all
                                         * further reads should fail. */
                                        stream->eof = 1;
+                                       stream->fatal_error = 1;
                                        efree(chunk_buf);
                                        retval = FAILURE;
                                        goto out_is_eof;
@@ -998,7 +999,12 @@ PHPAPI char *_php_stream_get_line(php_stream *stream, char *buf, size_t maxlen,
                                }
                        }

-                       php_stream_fill_read_buffer(stream, toread);
+                       if (php_stream_fill_read_buffer(stream, toread) == FAILURE && stream->fatal_error) {
+                               if (grow_mode) {
+                                       efree(bufstart);
+                               }
+                               return NULL;
+                       }

                        if (stream->writepos - stream->readpos == 0) {
                                break;

This would be just for PHP-8.3 and master. PHP-8.2 is using uint8_t but there should be some space due to alignment and still possibly one flag free but not sure if it's not too risky for 8.2 anyway.

Ideally we could let user decide which error is terminating but then it comes to an error propagation in general. That's a bigger problem and I plan to be looking into it soon. The idea is to potentially introduce some sort of stream error handler that would be called on each error and the decision whether to continue could be done there.

nielsdos commented 7 months ago

I don't think php_stream_get_line should fail if php_stream_fill_read_buffer returns just FAILURE. I think it could break some scenarios. I haven't verified this but wouldn't that potentially break a situation where other side of connection closes the connection after writing all data and this side of connection tries to fill more data to the buffer?

Yes I agree. At first I didn't think of this but when I started working more and more on the issue I started realizing that it might be a problem.

I think your proposed patch makes sense. We still need a fix for the memory leak of the buckets too. I did that in https://github.com/nielsdos/php-src/pull/86/files#diff-34c102f51eb0af2504e7d1c1f588a83c6555ae92d2f83602314ea45fc94621b4R636-R640 but I'm not entirely sure it is right because I see some bucket swapping code around /* wind the handle... */.

bukka commented 7 months ago

I have just done some investigation of the filtering and it looks to me that most filters only clears the currently processed input bucket when failing with PSFS_ERR_FATAL. The problem is that when it fails, it might leave data in both brigades. The exception of this is a user filter that clears it: https://github.com/php/php-src/blob/PHP-8.2.16/ext/standard/user_filters.c#L198-L215 . It actually took me a bit of time to figure this out as I tried to recreate the issue using user filter before finding out that it's different than zlib filter. For filters that don't do this which like zlib.inflate, it might leak for both brigades.

Output brigade can leak data when filter creates more buckets like zlib.inflate filter can do if bucket->buflen > data->inbuf_len. In such case, it produces more than one buckets so if the error happens during the second loop when there is already a bucket in output brigade, then it will leak data. Your fix fixes this part which seems correct.

Input brigade lead is a bit more complex because it requires more than one filters to be involved when the first filter has to produced extra buckets. The switch is there to pass output brigade of the first filter (that would contain more than one bucket in such scenario) and pass it as an input brigade to the second filter. If the second filter fails when processing first bucket, it will just free that bucket but leaves the subsequent buckets in the input brigade. It means that we should also free the input brigade.

So we basically have two options. Handle it in each filter like it's done in user filter or do it in fill buffer which is what you have done. As this is kind of common issue for all filters I think it makes sense to do it for all filter and clear both brigades there. We could then remove the out brigade clearing from user filter and keep there produced warning when something is left in input brigade. Although I'm not sure that warning makes always sense because why would user filter need to read all input before returning PSFS_ERR_FATAL. It should be there just for PSFS_PASS_ON IMHO

nielsdos commented 7 months ago

Nice work finding that out. Option 2, i.e. move the handling to the common place and keep the warning makes most sense. The warning is indeed a bit odd indeed and it could be amended on the master branch (or sometime in the future) if necessary.

bukka commented 7 months ago

Yeah agreed. Do you want me to handle it or you want to do it? I'm cool to sort it out if you prefer to concentrate on other things.

nielsdos commented 7 months ago

Yeah agreed. Do you want me to handle it or you want to do it? I'm cool to sort it out if you prefer to concentrate on other things.

If you could do it, that would be great, as I'd need time to get a bit more familiar with the filters. I kinda forgot about this and am working currently more focussing on DOM stuff.

bukka commented 7 months ago

sure, no probs, will handle it hopefully next week.

bukka commented 6 months ago

just a quick update that I have been really busy last and this week but should hopefully find some time on this later next week.

bukka commented 6 months ago

Part one of the fix in https://github.com/php/php-src/pull/13790 - memory leak fix

bukka commented 6 months ago

Re-opening as only part of this is fixed now.