kumar303 / mohawk

Python library for Hawk HTTP authorization
BSD 3-Clause "New" or "Revised" License
47 stars 20 forks source link

ValueError with accept_untrusted_content=True and hash provided in header. #43

Closed torotil closed 7 years ago

torotil commented 7 years ago

With a Receiver configured to not validate the content-hash

Receiver(lookup_hawk_credentials,
         auth_header,
         url,
         env.get('REQUEST_METHOD'),
         content=None
         content_type=None,
         accept_untrusted_content=True)

you get a ValueError: payload content and/or content_type cannot be empty without an explicit allowance whenever a client does send a content-hash.

The reason is in HawkAuthority._authorize():

       if 'hash' not in parsed_header and accept_untrusted_content:
            # The request did not hash its content.
            log.debug('NOT calculating/verifiying payload hash '
                      '(no hash in header)')
            check_hash = False
            content_hash = None
        else:
            check_hash = True
            content_hash = resource.gen_content_hash()

Here the check for the hash supersedes accept_untrusted_content.

As stated in #24 I still think these flags add unneeded complexity.

torotil commented 7 years ago

… essentially this means there is currently no way to accommodate both GET requests without content-hash and GET-requests with a content hash for it's empty payload and Content-Type.

kumar303 commented 7 years ago

Seems like a bug. Thanks.

The discussion you added to in https://github.com/kumar303/mohawk/issues/24 seems like a separate issue. I think it would be easier to follow what changes you're proposing if you file it as a new issue.

rneilson commented 7 years ago

It seems to me (having read through this and #24 as well) that there's an ambiguity around accept_untrusted_content. Does it mean skip all hash checks? Does it skipping a hash check only if it has no content or content type (ie pretty much all GET requests)? Or does it mean skipping a hash check if no hash is provided (which is, if I'm reading everything right, what actually happens)?

@torotil I think if you actually passed in the content and content type of the request, then if a hash is given, it'll be checked.

@kumar303 If it's a question of avoiding verbosity, like you say here, then accept_untrusted_content is ambiguous and thus misnamed (as you mention in #40). There might even be a case for two parameters: accept_unhashed_content and skip_hash_check. The former allows for the hash portion of the header to be absent, and the latter skips the whole process entirely.

But if it's a question of avoiding configuration errors, like you say here, then the current interface has a bit of a problem: the parsed Hawk header isn't available to user code until after the Receiver object has been instantiated, but the hash check is done during instantiation, so there's no way to set the accept_untrusted_content parameter after checking to see if a hash was provided (at least, not without parsing the Hawk header beforehand, which seems...counterproductive). Right now there's no good way to specify "allow a request to be missing a hash value only if the body and content type are empty", at least not without "forc[ing] the caller to make too many security decisions that require having a deep understanding of how the Hawk protocol works", like you said you wanted to avoid.

kumar303 commented 7 years ago

Right now there's no good way to specify "allow a request to be missing a hash value only if the body and content type are empty"

Why not? Can't you just check if content is None and then pass in accept_untrusted_content=True ?

But anyway, this is still dangerous because an attacker could possibly capture / alter a request and make the body empty to bypass the checks -- this may be harmless or it may do damage depending on how you process the request.

It's pretty straight forward to provide a signature hash for empty content so if you have full control of your clients then I don't really see how it's a problem to always require signed content. You probably have to make content='' instead of None which is maybe an annoying part.

rneilson commented 7 years ago

Can't you just check if content is None and then pass in accept_untrusted_content=True ?

Well, no, not at the moment, given #24. I certainly think making the default values of content and content_type some kind of non-None EmptyValue, as you suggest, would help some. But thus my question about verbosity -- do to it right, you'd have to do something like this:

content = request.body or ''
content_type = request.META.get('CONTENT_TYPE', '')
accept_untrusted_content = (request.method == 'GET' and not content and not content_type)
receiver = Receiver(lookup_hawk_credentials,
                    auth_header,
                    url,
                    request.method,
                    content=content,
                    content_type=content_type,
                    accept_untrusted_content=accept_untrusted_content)

Not exactly compact, is it?

But anyway, this is still dangerous because an attacker could possibly capture / alter a request and make the body empty to bypass the checks -- this may be harmless or it may do damage depending on how you process the request.

If we're talking replay attacks, this is what timestamps and nonces are for, not content hashing. Content hashing is for content. If there's no content, and there isn't any expected content (GET requests), the hash check is superfluous. It should be easy to skip in this case (and only this case), and right now it's not. (If there is a hash value given, though, it should be checked -- an attacker just "mak[ing] the body empty" wouldn't affect the content type header per se, nor would it remove an already-calculated hash portion of the already-MAC'd header.)

Like I said: if you're trying to avoid verbosity, this isn't working. If you're trying to avoid misconfiguration, this isn't working, but in a different way.

...if you have full control of your clients...

You don't. That's really kind of the point of Hawk in general, isn't it? More specifically, though, the official browser code doesn't hash by default, and, say, Postman still doesn't hash at all. I totally understand wanting more secure defaults, but right now it takes careful consideration and fairly specific knowledge of the internals to make a very common case work (GET, no content, no hash) without opening bigger holes (skipping checks in general). Which is not exactly the intended outcome of the options as currently implemented.

rneilson commented 7 years ago

To rephrase:

1) The None vs '' content/type problem in #24 is a matter of distinguishing between misconfiguration and deliberate circumvention, and requires always remembering to pass in '' for empty content/type (a subtlety easily forgotten or misunderstood).

2) The accept_untrusted_content option only applies when a hash is omitted, so the shortcut of

Receiver(lookup_hawk_credentials,
         auth_header,
         url,
         env.get('REQUEST_METHOD'),
         accept_untrusted_content=True)

fails when the client does supply a hash, negating the verbosity argument. Basically, one always has to pass all three of content, content_type, and accept_untrusted_content to cover all cases. (Perhaps calling it allow_omitted_hash would be more accurate.)

3) There is in fact a security difference between allowing an unhashed, empty content/type and not checking the hash when provided even with empty content/type. If a hash is supplied, it should be checked without fail (and None vs '' shouldn't make a difference). But where empty content/type is expected, it should be easy to allow, and right now one basically has to read through HawkAuthority._authorize() and Resource.get_content_hash() to understand when and how the parameters interact.

kumar303 commented 7 years ago

handling GET requests

This definitely needs improvement, I agree.

The accept_untrusted_content option ... fails when the client does supply a hash

I didn't know that. This is a bug that should be fixed. Can you file a separate issue with how to reproduce it?

The None vs '' content/type problem in #24 is a matter of distinguishing between misconfiguration and deliberate circumvention

Yeah, I'm concerned about misconfiguration since None is easy to pass by accident in Python. Maybe the EmptyValue strategy will help address that. I didn't copy the way the Node lib does it because IMO it's too lenient with null and undefined which are even easier to pass accidentally in JavaScript.

...replay attacks...

This is a separate concern and I think the nonce strategy solves it.

There is in fact a security difference between allowing an unhashed, empty content/type and not checking the hash when provided even with empty content/type. If a hash is supplied, it should be checked without fail (and None vs '' shouldn't make a difference)...

I was giving an example of a man in the middle attack which is something that TLS mitigates but it's very common for script clients (like out of the box Python, etc) to skip cert checks so I don't think TLS is reliable enough. Example: what if an attacker intercepts a request and removes the hash attribute from the header?

...right now one basically has to read through HawkAuthority._authorize() and Resource.get_content_hash() to understand when and how the parameters interact.

Really? Could you tell me which sections of the docs are lacking in clarity? I'd like to correct them. http://mohawk.readthedocs.io/en/latest/usage.html#receiving-a-request http://mohawk.readthedocs.io/en/latest/usage.html#skipping-content-checks

Postman still doesn't hash at all.

This is terrible and I don't have a lot of sympathy for poor security like this. It means that any man in the middle attacker could alter the contents of the request which could do a lot of damage.

The accept_untrusted_content option was meant as an escape hatch for insecure clients like this so it should let you workaround the issue (once we fix the bug mentioned above).

torotil commented 7 years ago

The accept_untrusted_content option ... fails when the client does supply a hash

I didn't know that. This is a bug that should be fixed. Can you file a separate issue with how to reproduce it?

That's basically the title of this issue ;) For a way to reproduce it see the first comment.

rneilson commented 7 years ago

what if an attacker intercepts a request and removes the hash attribute from the header?

An attacker can't do that without recalculating the MAC, was my point. If the client calculates a hash, that hash is part of the MAC'd string, and can't be removed without changing the resulting MAC value. So that'll raise MacMismatch.

I didn't know that. This is a bug that should be fixed. Can you file a separate issue with how to reproduce it?

That's the exact problem @torotil is describing. accept_untrusted_content is only checked after 'hash' not in parsed_header( see here). Which makes sense if you want to allow the client to omit the hash, but doesn't skip checking the hash if provided (which means you have to pass the content and type anyways, in case the client does include a hash).

Could you tell me which sections of the docs are lacking in clarity?

The skipping content checks section implies always_hash_content and accept_untrusted_content are equivalent between Sender/Receiver, which they aren't. Resource checks always_hash_content here inside gen_content_hash(), but as I said, _authorize() only checks accept_untrusted_content if the hash key is omitted from parsed_header, so the documented example fails when a hash is sent -- and there isn't any way to check that before instantiating Receiver.

I still think this all comes down to accept_untrusted_content only skipping hash check if it's not sent by the client. Which is a useful option, but not what it says on the tin.

kumar303 commented 7 years ago

That's basically the title of this issue ;)

Ha, so it is. It's been a while since it was filed so I forgot. I don't have a lot of time for patching mohawk these days so any help on PRs would be gladly appreciated.

rneilson commented 7 years ago

I think there needs to be another option, call it skip_hash_check or something, which allows bypassing the whole thing if truly desired (which might be acceptable if the request body is large or streamed or something). Maybe deprecate accept_untrusted_content and rename it to allow_missing_hash, as well (and update the docs to indicate you still have to pass in the content and type).

If you're okay with that, I can start working on a PR.

rneilson commented 7 years ago

Actually, instead of allow_missing_hash, might be better to add always_hash_content as a parameter to Receiver as well as Sender, which would make a better symmetry. (I'd still put in skip_hash_check as a new option, because at present there isn't any way to completely (if only temporarily) ignore the hash check even if present.)

kumar303 commented 7 years ago

An attacker can't do that without recalculating the MAC, was my point. If the client calculates a hash, that hash is part of the MAC'd string, and can't be removed without changing the resulting MAC value. So that'll raise MacMismatch.

Well, it's been over a year since I implemented this so my memory is fuzzy but I seem to recall seeing some possible attacks related to this. Maybe it would be easier to show me a patch with how you'd like the Receiver interface changed showing the test cases for how each attack is thwarted?

This is an important bit about payload checking from the protocol docs:

It is important to note that MAC validation does not mean the hash value provided by the client is valid, only that the value included in the header was not modified. Without calculating the payload hash on the server and comparing it to the value provided by the client, the payload may be modified by an attacker.

So perhaps the attack I was thinking of is that the attacker could change the payload and leave the hash header in tact. But, again, maybe that kind of attack is already thwarted by simply honoring the hash header. If I misunderstood some protections around hash checking and made the interface too strict then let's fix the interface.

the documented example fails when a hash is sent -- and there isn't any way to check that before instantiating Receiver.

Sounds like a patch for this issue (#43) will make the documentation less confusing.

I think there needs to be another option, call it skip_hash_check or... Maybe deprecate accept_untrusted_content and rename it to allow_missing_hash... If you're okay with that, I can start working on a PR.

I would like to see a patch for #43 first because won't that address the need for skip_hash_check ?

rneilson commented 7 years ago

But, again, maybe that kind of attack is already thwarted by simply honoring the hash header.

That's exactly it: if the hash header is omitted, then the body can be modified by an attacker. If present, even if the body is empty, it can't be. But that's not a concern for GET requests, which should have empty bodies anyways.

won't that address the need for skip_hash_check?

There's still one scenario allowed by the reference implementation that isn't allowed by Mohawk: deferring the hash check. As came up implicitly in #15, the two steps of 1) validating the MAC of the request headers, and 2) validating the hash digest of the content are both performed when instantiating Receiver. Might be a separate PR to implement it, but being able to defer the hash check for large/streaming request bodies is still something Mohawk can't do, even once this issue is resolved.

kumar303 commented 7 years ago

...being able to defer the hash check for large/streaming request bodies is still something Mohawk can't do, even once this issue is resolved.

I don't understand. For that case, why don't you just configure the client to omit the hash header?

rneilson commented 7 years ago

That's easiest, yes, but that's a workaround, not an option :P.

I mean, there are legit use cases: say, ensuring an allowed file type based on the start of a file, and failing fast instead of accepting the full body and then checking hash and then responding with 415 Unsupported Media Type.

kumar303 commented 7 years ago

...ensuring an allowed file type based on the start of a file, and failing fast instead of accepting the full body and then checking hash and then responding with 415 Unsupported Media Type.

Oh, I see what you mean now. You still want to validate the content hash, just at a later time. Yeah, we should support that. Can you open a separate issue?

rneilson commented 7 years ago

Will do, let me work on the PR for this one first. Speaking of, do you mind if I deprecate allow_untrusted_content in favor of always_hash_content, to line up Sender and Receiver?

kumar303 commented 7 years ago

do you mind if I deprecate allow_untrusted_content in favor of always_hash_content, to line up Sender and Receiver?

Yeah, that's https://github.com/kumar303/mohawk/issues/40 . The issue is free to work on. Thanks! It would probably be helpful to do it in a separate patch to make code review easier.

I prefer not to use the word "hash" because it's misleading -- the hash gets applied with an HMAC function since a hash alone would be a weak signature.

rneilson commented 7 years ago

No, I think "hash" is correct. A hash digest of the content (and type, and fixed string) is included in the HMAC'd value. Calling it "signing" would be misleading, since nothing's being signed per se, and calling it an HMAC is also a bit misleading, since the HMAC is done over the concatenated attributes, which may or may not include the hash digest of the content (et al).

Plus, the Hawk docs call it "hash", even the header itself calls it "hash". Calling it something else (or worse, conflating the HMAC step with the hash digest step) would be much more misleading.

Anyways, once #45 is in, I'll do something for #40.

kumar303 commented 7 years ago

Oh, whoops, I was getting the hash confused with the mac. Referring to it as a hash to match the spec seems sensible.