laminas / laminas-diactoros

PSR HTTP Message implementations
https://docs.laminas.dev/laminas-diactoros/
BSD 3-Clause "New" or "Revised" License

PhpInputStream::getContents() inconsistency #150

Closed Xerkus closed 1 year ago

Xerkus commented 1 year ago

The current implementation of PhpInputStream::getContents() accepts an optional length parameter that makes getContents() act similarly to read().

Multiple invocations can each read from the php input stream, append the chunk to the cache property, and return the chunk that was read. This behavior changes, however, once the whole php input stream has been read into the cache property: the length argument is ignored entirely and getContents() returns the full content of the cache property instead.

Per PSR-7, the expected behavior is: "Returns the remaining contents in a string." Since the php input stream is not rewindable, once EOF is reached getContents() should always return an empty string instead.

Only the __toString() method should use the cache property at this point, to return the whole body content.
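
The behavior described above can be reproduced with a toy class. This is a minimal sketch, not the Diactoros source: the class name CachingStreamSketch and its internals are hypothetical, and php://temp stands in for php://input so that it runs from the CLI.

```php
<?php

declare(strict_types=1);

// Hypothetical toy class that only mimics the caching behavior described
// in this issue. It is NOT the actual PhpInputStream implementation.
final class CachingStreamSketch
{
    /** @var resource */
    private $resource;
    private string $cache = '';
    private bool $reachedEof = false;

    /** @param resource $resource */
    public function __construct($resource)
    {
        $this->resource = $resource;
    }

    public function getContents(int $maxLength = -1): string
    {
        if ($this->reachedEof) {
            // $maxLength is ignored here: the whole cache is returned.
            return $this->cache;
        }

        $chunk        = (string) stream_get_contents($this->resource, $maxLength);
        $this->cache .= $chunk;

        if ($maxLength === -1 || feof($this->resource)) {
            $this->reachedEof = true;
        }

        return $chunk;
    }
}

// php://temp is an assumption used in place of php://input.
$fh = fopen('php://temp', 'r+');
fwrite($fh, 'some random body');
rewind($fh);

$stream = new CachingStreamSketch($fh);
$first  = $stream->getContents(4); // "some": acts like read(4)
$second = $stream->getContents();  // " random body": the remaining contents
$third  = $stream->getContents(4); // "some random body": length ignored, full cache
```

Under PSR-7 semantics the third call would be expected to return an empty string, because nothing remains in the stream at that point.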

Xerkus commented 1 year ago

php://input is seekable in all supported PHP versions, which means PhpInputStream is no longer needed.

I believe the correct fix here would be to remove the special behavior and the use of the $cache property.

weierophinney commented 1 year ago

You're missing an important point, however: the php://input is read once. That's why the caching mechanism is in place. It's not about it being seekable; it's about the fact that if you read any portion of it, that portion is no longer accessible in the stream. This is why caching is present at every point in the implementation where a read can happen.

Secondly: it's rare that you're going to have multiple invocations reading the stream. You will usually read the entire thing at once. If you do not, chances are that the read operations will still be performed in sequence.

I really do not see that there is any necessity to change anything here.

Xerkus commented 1 year ago

In this package's current implementation it is read once. However, the cache is not needed, as php://input is rewindable and can be read any number of times, including chunked reads through the normal Stream interface.

Quick experiment:

<?php
# test.php
declare(strict_types=1);

$input = fopen('php://input', 'r');

$firstRead  = stream_get_contents($input);
$secondRead = stream_get_contents($input);
rewind($input);
$afterRewind = stream_get_contents($input);

echo '1: ' . $firstRead . "\n";
echo '2: ' . $secondRead . "\n";
echo '3: ' . $afterRewind . "\n";

$ php -S 0.0.0.0:8080 test.php
[Wed May  3 06:52:29 2023] PHP 8.2.5 Development Server (http://0.0.0.0:8080) started

$ curl --data "some random body" 127.0.0.1:8080
1: some random body
2: 
3: some random body

$ curl --data "some random body" fpm.localhost:32770
1: some random body
2: 
3: some random body

weierophinney commented 1 year ago

If that's the case, that must have changed since we released Diactoros. I don't recall any messaging in internals about it, or an RFC; I'd like to have some validation that this was done purposely before we change this, as it was a nasty source of issues previously, which is why this was implemented.

Xerkus commented 1 year ago

And to showcase inconsistency:

<?php

declare(strict_types=1);

use Laminas\Diactoros\PhpInputStream;
use Laminas\Diactoros\Stream;

require 'vendor/autoload.php';

$stream = new Stream('php://input');

echo 'Stream 1: ' . $stream->getContents() . "\n";
echo 'Stream 2: ' . $stream->getContents() . "\n";

$input = $stream->detach();
rewind($input);

$inputStream = new PhpInputStream($input);

echo 'PhpInputStream 1: ' . $inputStream->getContents() . "\n";
echo 'PhpInputStream 2: ' . $inputStream->getContents() . "\n";

$ curl --data "some random body" 127.0.0.1:8080
Stream 1: some random body
Stream 2: 
PhpInputStream 1: some random body
PhpInputStream 2: some random body

weierophinney commented 1 year ago

Thanks, @Xerkus - I was entirely unclear what the issue was before, but now I can see it clearly.

Normal Stream::getContents() behavior is that unless you call rewind(), it will not return anything on subsequent calls. The PhpInputStream, however, does, which means its behavior deviates.

This gives me what I need to create a test case.
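
The baseline that such a test case would compare against can be checked without an HTTP request. The following is only a sketch on a plain stream resource, assuming php://temp as a stand-in for php://input; it does not use any Diactoros class.

```php
<?php

declare(strict_types=1);

// php://temp is an assumption used in place of php://input so the
// sketch runs from the CLI.
$fh = fopen('php://temp', 'r+');
fwrite($fh, 'some random body');
rewind($fh);

$partial   = fread($fh, 4);              // consumes "some"
$remaining = stream_get_contents($fh);   // " random body": only what is left
$atEof     = stream_get_contents($fh);   // "": nothing remains until a rewind

rewind($fh);
$afterRewind = stream_get_contents($fh); // "some random body": full body again

fclose($fh);
```

A stream that follows this pattern returns an empty string on a second full read unless it is rewound first, which is the behavior PhpInputStream deviates from.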

Xerkus commented 1 year ago

Found the changelog entry at https://www.php.net/ChangeLog-5.php. The change was made in PHP 5.6:

The php://input stream is now re-usable and can be used concurrently with enable_post_data_reading=0.

weierophinney commented 1 year ago

Oh, fascinating! I've validated with a simple script:

<?php

function readInput()
{
    $contents = '';
    $fh       = fopen('php://input', 'r');
    while (! feof($fh)) {
        $contents .= fread($fh, 1024);
    }
    fclose($fh);
    error_log($contents);
}

readInput();
readInput();

and indeed, I got multiple lines of content in the error logs.

I modified it to allow rewinding the stream as well:

<?php

function readInput($fh)
{
    $contents = '';
    while (! feof($fh)) {
        $contents .= fread($fh, 1024);
    }
    error_log($contents);
}

$fh  = fopen('php://input', 'r');
readInput($fh);
rewind($fh);
readInput($fh);
fclose($fh);

and this worked fine, too.

So I looked into the history, and our first version targeted PHP 5.5. So, technically, we could have removed this in 2.0, when we bumped the minimum supported PHP version to 7.1. :face_with_head_bandage:

I think we can safely remove it in version 3. It's mainly an internal detail, and we can recommend just passing php://input to a Stream instance.

boesing commented 1 year ago

Huge finding here, TIL.