humanmade / aws-xray

HM Platform AWS X-Ray Integration
23 stars 4 forks source link

Add `php://input` and `$_FILES` to trace metadata #95

Open roborourke opened 1 year ago

roborourke commented 1 year ago

This lets us report on REST API body payloads as well as regular form data.

rmccue commented 1 year ago

This isn't free to read the data, both in terms of memory and time usage, so I'm not sure it makes sense for us to add for all requests. (Also, I seem to recall that php://input can only be read from once but maybe that's no longer the case.)

roborourke commented 1 year ago

Yep:

php://input is a read-only stream that allows you to read raw data from the request body. php://input is not available with enctype="multipart/form-data".

There's no way to check the type of form submission reliably that I can find. So this could be limited to just POST, PUT, PATCH, DELETE requests.

Should we include $_FILES while we're at it?

rmccue commented 1 year ago

Sorry, typo, that should have said:

that php://input can only be read from once

That is, the stream is exhausted once it's read from. I can't see any info on whether that's still the case though.

There's no way to check the type of form submission reliably that I can find. So this could be limited to just POST, PUT, PATCH, DELETE requests.

HTTP only allows request bodies on POST, PUT, and PATCH; GET, HEAD, and DELETE can't have them.

roborourke commented 1 year ago

Well I was close. Also kinda wild regarding php://input being a one time read. Makes zero sense to me.

roborourke commented 1 year ago

Quick test - created a file input.php:

<?php
$input1 = file_get_contents( 'php://input' );
var_dump( 'once', $input1 );
$input2 = file_get_contents( 'php://input' );
var_dump( 'twice', $input2 );

Run

Output:

HTTP/1.1 200 OK
Connection: close
Content-type: text/html; charset=UTF-8
Date: Thu, 11 May 2023 12:26:06 GMT
Host: localhost:8092
X-Powered-By: PHP/8.0.28

string(4) "once"
string(36) "{"show":"me","what":"you","got":"!"}"
string(5) "twice"
string(36) "{"show":"me","what":"you","got":"!"}"
rmccue commented 1 year ago

Yeah, it was an exhaustible stream so you had to cache the data; it's why the REST API code reads it once and stores it. I think it depends on which SAPI you're using though.

In any case: I think this needs to be behind some kind of flag, because otherwise there's potentially a lot of data being read in, parsed, and pushed out to X-Ray.

roborourke commented 1 year ago

In any case: I think this needs to be behind some kind of flag, because otherwise there's potentially a lot of data being read in, parsed, and pushed out to X-Ray.

See #94

rmccue commented 1 year ago

Truncation isn't a straight solution to that, because you still have the overhead of reading data in and parsing it.

Say, for example, I do curl -X POST example.com/wp-json/wp/v2/media < myfile.jpg, where myfile.jpg is a 1GB file. file_get_contents( 'php://input' ) will return a 1GB string, which will exhaust PHP's memory and cause it to error the whole request.

Even with regular sized JSON data that can be generated from page builders/etc, reading the whole data into memory will use up a decent chunk of memory, plus has the overhead of then truncating if necessary and pushing off to X-Ray. I can see this having an appreciable impact on request times, and causing errors.

(This is one of the reasons that eg Apache access logs don't include HTTP bodies as well.)

Also, the data sanitisation will need adapting for any of this, given that it otherwise will include the data we're sanitising elsewhere.

I see the utility of why you'd want to add this, but without being flagged behind (say) an opt-in header I don't think we can enable it by default.

roborourke commented 1 year ago

Well, we can close this one then. I was going off a throwaway comment on #80 that we should add this while I had the repo open.

The most crucial things for me right now are the truncation and being able to redact data on the initial progress.

joehoyle commented 1 year ago

Ok think we can check the stream length on php://input to do this. I do think it's a good idea to have this

roborourke commented 1 year ago

Ah smart