Open roborourke opened 1 year ago
This isn't free to read the data, both in terms of memory and time usage, so I'm not sure it makes sense for us to add for all requests. (Also, I seem to recall that php://input
can only be read from once but maybe that's no longer the case.)
Yep:
php://input is a read-only stream that allows you to read raw data from the request body. php://input is not available with enctype="multipart/form-data".
There's no way to check the type of form submission reliably that I can find. So this could be limited to just POST, PUT, PATCH, DELETE requests.
Should we include $_FILES
while we're at it?
Sorry, typo, that should have said:
that
php://input
can only be read from once
That is, the stream is exhausted once it's read from. I can't see any info on whether that's still the case though.
There's no way to check the type of form submission reliably that I can find. So this could be limited to just POST, PUT, PATCH, DELETE requests.
HTTP only allows request bodies on POST, PUT, and PATCH; GET, HEAD, and DELETE can't have them.
Well I was close. Also kinda wild regarding php://input
being a one time read. Makes zero sense to me.
Quick test - created a file input.php
:
<?php
$input1 = file_get_contents( 'php://input' );
var_dump( 'once', $input1 );
$input2 = file_get_contents( 'php://input' );
var_dump( 'twice', $input2 );
Run
php -S localhost:8090
echo -n '{"show":"me","what":"you","got":"!"}' | http POST :8090/input.php
Output:
HTTP/1.1 200 OK
Connection: close
Content-type: text/html; charset=UTF-8
Date: Thu, 11 May 2023 12:26:06 GMT
Host: localhost:8092
X-Powered-By: PHP/8.0.28
string(4) "once"
string(36) "{"show":"me","what":"you","got":"!"}"
string(5) "twice"
string(36) "{"show":"me","what":"you","got":"!"}"
Yeah, it was an exhaustible stream so you had to cache the data; it's why the REST API code reads it once and stores it. I think it depends on which SAPI you're using though.
In any case: I think this needs to be behind some kind of flag, because otherwise there's potentially a lot of data being read in, parsed, and pushed out to X-Ray.
In any case: I think this needs to be behind some kind of flag, because otherwise there's potentially a lot of data being read in, parsed, and pushed out to X-Ray.
See #94
Truncation isn't a straight solution to that, because you still have the overhead of reading data in and parsing it.
Say, for example, I do curl -X POST example.com/wp-json/wp/v2/media < myfile.jpg
, where myfile.jpg
is a 1GB file. file_get_contents( 'php://input' )
will return a 1GB string, which will exhaust PHP's memory and cause it to error the whole request.
Even with regular sized JSON data that can be generated from page builders/etc, reading the whole data into memory will use up a decent chunk of memory, plus has the overhead of then truncating if necessary and pushing off to X-Ray. I can see this having an appreciable impact on request times, and causing errors.
(This is one of the reasons that eg Apache access logs don't include HTTP bodies as well.)
Also, the data sanitisation will need adapting for any of this, given that it otherwise will include the data we're sanitising elsewhere.
I see the utility of why you'd want to add this, but without being flagged behind (say) an opt-in header I don't think we can enable it by default.
Well, we can close this one then. I was going off a throwaway comment on #80 that we should add this while I had the repo open.
The most crucial things for me right now are the truncation and being able to redact data on the initial progress.
Ok think we can check the stream length on php://input
to do this. I do think it's a good idea to have this
Ah smart
This lets us report on REST API body payloads as well as regular form data.