Closed philipaconrad closed 3 months ago
| Name | Link |
|---|---|
| Latest commit | 0182db32a42f3f700245ebe48169775e838fbc20 |
| Latest deploy log | https://app.netlify.com/sites/openpolicyagent/deploys/667c77b7948c3000083dbe38 |
| Deploy Preview | https://deploy-preview-6825--openpolicyagent.netlify.app |
I'm a bit surprised about the approach taken here. I'd have expected us to change the ordering of handlers, since we're setting up the authzHandler (via initRouters) before the compressHandler. One could argue that authz should happen before decompress (to avoid an attack vector: decompressing not-yet-authorized bodies), but if we end up decompressing in the authz handler anyways, that point is moot.
Am I missing something that makes this idea flawed?
Ooooh that linked code (CompressHandler) only takes care of compressing the response, doesn't it? That's not what we need, of course.
I guess the alternative approach I had in mind would be to have some sort of middleware for both request payloads and response payloads.
@srenatus, I do think you've got a valid point w.r.t. the gzipped not-authorized-yet message bodies. I like the idea of having a uniform bit of middleware for decompressing requests! I'll take a look at how hard that would be to implement, and will ping back in this thread with the results.
If we can keep the ordering authentication -> compression -> authorization, so that authentication comes first, then I think it's no problem at all.
Gotcha! Authentication first should not be an issue-- it's authorization and decompression that have a fixed ordering relative to each other.
I think we actually could do some validation on gzip payload reads, but the method I have in mind relies on low-level knowledge of the format: namely, that the 8-byte trailer at the end of any valid gzip-compressed blob includes the CRC, as well as the decompressed file size in bytes, modulo 2^32 (4GB).
Here's a sketch of what a server/decoding plugin might look like, that would solve this problem a few levels above where the current PR fix is doing it:

- Move the request decompression logic out of `server/server.go` and into a `server/encoding` plugin.
- Attach the plugin as an `http.Handler`, hooks style, to `server/encoding`. (Requiring some more initialization logic in `server/server.go`, so that the request decompressor wraps everything else.)
- Lazily wrap gzipped request bodies with `gzip.NewReader(request.Body)`, as `NewReader` doesn't do any real work beyond reading the header, until `(*gzip.Reader).Read` is called.
- Read the decompressed size out of the gzip trailer as a `uint32` for checking against the plugin's configured size limits.

Maybe I'm overthinking things? I do think this approach would both boost performance for gzipped requests, and (if my understanding of the `compress/gzip` package is correct) would comprehensively resolve issue #6804, even for unauthenticated-but-authorized OPA deployments, at the expense of much heavier refactoring than I'd expected at the outset of trying to solve this issue. :sweat_smile:
Notes:
@srenatus I've added some gzip + authz policy test cases, so that's no longer a blocker. The tests have some extra cruft in them to allow turning the authz policy parts on/off while I experiment with the `server/decoding` plugin idea.
On this branch, if I have no authz policy and run:
```
echo -n '{"input": {}}' | curlie --data @- http://127.0.0.1:8181/v1/data
```
I get a `200 OK` and the expected eval result.
But if I run:
```
echo -n '{"input": {}}' | gzip | curlie -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data
```
I get:
```
HTTP/1.1 400 Bad Request
Content-Type: application/json
Vary: Accept-Encoding
Date: Tue, 18 Jun 2024 09:10:31 GMT
Content-Length: 96

{
  "code": "invalid_parameter",
  "message": "could not decompress the body: unexpected EOF"
}
```
Given that I don't do authn/authz on the request, it's not surprising if the changes in this PR aren't exercised, but wouldn't we expect the latter request to succeed?
[...] But if I run:
```
echo -n '{"input": {}}' | gzip | curlie -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data
```
@johanfylling I tried running that command with plain `curl`, and got a 200 OK on both `main` and this PR branch:

```
echo -n '{"input": {}}' | gzip | curl -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data
```
:arrow_down: Response:

```
{"result":{}}
```

I wonder if it's a behavior difference or a bug in `curlie`?
@philipaconrad, you're correct: `curl` works as expected. This must be a bug in `curlie`, which is supposed to just be a simple wrapper around `curl` to get some prettier output .. apparently they mess something up on the way through.
In tests with a 1GB gzipped input to `curl`, it looks like we're decompressing twice in some cases: the authorizer unmarshals inputs before the `v1CompilePost` endpoint runs, and then the endpoint unmarshals them again. Other endpoints check the request context to see if the input was already parsed for them, but `v1CompilePost` does not. Since we probably do not want 2x the overhead for POST requests against `/v1/data`, I'm going to investigate adding a context lookup for that handler.
geninput.py:

```python
N = 1024 * 1024
oneKBString = "A" * 1024

if __name__ == "__main__":
    print('{"input":{')
    for i in range(0, N):
        print('"{}": "{}",'.format(i, oneKBString))
    print('"{}": "{}"'.format(N, oneKBString))
    print("}}")
```
Example CLI (server):

```
/usr/bin/time -v ./opa-pr run -s --authorization=basic authz.rego
```

Example CLI (client):

```
python3 geninput.py | gzip | curl -H "Content-Encoding: gzip" --data-binary @- http://127.0.0.1:8181/v1/data
```
:information_source: Here's the current state of things (for @johanfylling, @srenatus, and anyone else following along):
The good news:
The bad news:
- Without deeper refactoring of the `server` package, we will end up reading the message body twice for gzipped requests in some cases when authz policies are enabled.
- This seems related to the original cause of #6804, where request bodies could be `io.ReadAll`'d 2x or more by accident. That PR was focused on compressed OPA responses, not compressed requests to OPA, but it had to do something with the requests at the time, since gzip compatibility/detection relies on seeing appropriate gzip-related request headers.

Unless there's any other comments/critiques/requests around this PR, I think it should be good-for-merge now.
What changed in this PR?
This PR fixes an issue where an OPA running authorization policies would be unable to handle gzipped request bodies.
Example OPA CLI setup:
Example request:
This would result in unhelpful error messages, like:
The cause was that the request body handling system in the `server/authorizer` package did not take gzipped payloads into account. The fix was to borrow the gzip request body handling function from `server/server.go`, to transparently decompress the body when needed.

Fixes: #6804
TODOs