kumar303 / mohawk

Python library for Hawk HTTP authorization
BSD 3-Clause "New" or "Revised" License

Provided payload hash should be used when validating MAC #15

Closed gregarndt closed 8 years ago

gregarndt commented 9 years ago

It appears that mohawk calculates [1] the content hash even when a hash was provided in the request, and uses that calculated hash for validating the MAC [2], whereas according to a statement [3,4] in the node hawk implementation the hash provided in the request should be used.

One of the issues that comes up when the provided hash is not used is that any encoding difference in the payload produces a different payload hash. However, the error that's returned is a MAC mismatch, when really the useful pointer about where to look is a content mismatch, such as the MisComputedContentHash exception raised later on in base.py.

One way to reproduce this is to send the Hawk request with a different client that doesn't automatically encode the payload for you (such as the node hawk client). mohawk UTF-8 encodes the payload before calculating the hash, while the node client lets you use whatever encoding you wish.

[1] https://github.com/kumar303/mohawk/blob/master/mohawk/base.py#L42
[2] https://github.com/kumar303/mohawk/blob/master/mohawk/base.py#L48
[3] "If the payload is available at the time of authentication, the server uses the hash value provided by the client to construct the normalized string and validates the MAC"
[4] https://github.com/hueniverse/hawk/blob/master/README.md#payload-validation
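
To make the encoding point concrete, here is a standalone sketch of the Hawk payload-hash string from the spec (this is not mohawk's internal API; the function name and payload are made up): the same logical text hashed as UTF-8 bytes and as latin-1 bytes yields two different hashes, so the MACs built from them can never agree.

```python
import base64
import hashlib

def hawk_payload_hash(content_bytes, content_type):
    # Hawk payload hash per the spec:
    # base64(sha256(b"hawk.1.payload\n<content-type>\n<payload>\n"))
    normalized = (b"hawk.1.payload\n" +
                  content_type.encode("ascii") + b"\n" +
                  content_bytes + b"\n")
    return base64.b64encode(hashlib.sha256(normalized).digest()).decode("ascii")

payload = u"contributor: J\u00f6rg"  # a body that isn't plain ASCII
print(hawk_payload_hash(payload.encode("utf-8"), "text/plain"))    # what mohawk hashes
print(hawk_payload_hash(payload.encode("latin-1"), "text/plain"))  # what a client that
                                                                   # hashed latin-1 bytes sent
```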

edmorley commented 9 years ago

I suppose that, in addition to giving a more informative exception type in real "incorrect content hash" cases (i.e. once fixed, mohawk will raise MisComputedContentHash rather than MacMismatch), it actually helps performance in the more straightforward incorrect-MAC cases too, since in those cases mohawk will no longer need to perform the more expensive hashing of the body at all.

kumar303 commented 9 years ago

If there are encoding differences between the Hawk client and a mohawk server then that sounds like a bug in mohawk. Can you provide example code to reproduce it?

according to a statement [3,4] in the node hawk implementation the hash provided in the request should be used.

Yes, but only as a theoretical optimization. What the spec says is that you could do a quicker first-pass MAC calculation using the client-provided hash value simply to see if it fails. If it doesn't fail, you'd still need to calculate your own hash and re-calculate the MAC (which means you'd always be calculating the MAC twice), because otherwise an attacker could just swap a known good hash into the request header and trick your server.

This is the key point in the section you linked to:

Without calculating the payload hash on the server and comparing it to the value provided by the client, the payload may be modified by an attacker.

I would consider adding a first-pass optimization to mohawk if there was a patch but I'm not sure if it's worth the effort. I'd like to see some metrics showing that it adds value. Also, I don't think it would address the problem you were seeing if there was an encoding problem so let's sort that out first :)

edmorley commented 9 years ago

To reword the OP slightly:

STR:

  1. Use the JS hawk implementation to send a payload to a mohawk-powered REST API, but forget to UTF-8 encode the body (which isn't plain ASCII) before passing it to Hawk (unlike mohawk, the JS hawk implementation doesn't automatically UTF-8 encode the body before calculating the hash).
  2. Observe the mohawk auth exception generated on the server.
  3. Try to realise what you've done wrong, using the exception type shown in the server logs.

Expected: The exception is MisComputedContentHash, which allows the user to realise more quickly what the problem was.

Actual: MacMismatch exception, since mohawk uses its own calculated payload hash as part of the MAC calculation, rather than the "provided" payload hash. As a result, the user is misled into thinking the auth problem lies with something other than the body of the message (and they don't suspect that the encoding on the hawk side may be the problem).

Does that make sense? :-) (I've also filed https://github.com/hueniverse/hawk/issues/152 to make the need to manually UTF8 encode the hawk body be clearer in their docs; but it's still good to cover all of our bases IMO)
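
For context on why the exception type matters so much, here is roughly what the receiving side looks like with mohawk today (a sketch only; `lookup_credentials` and the `request` object are placeholders, and the exception classes come from mohawk.exc):

```python
import logging

from mohawk import Receiver
from mohawk.exc import MacMismatch, MisComputedContentHash

log = logging.getLogger(__name__)

def lookup_credentials(sender_id):
    # Placeholder credential store.
    return {'id': sender_id, 'key': 'some shared secret', 'algorithm': 'sha256'}

def authenticate(request):
    try:
        return Receiver(
            lookup_credentials,
            request.headers['Authorization'],
            request.url,
            request.method,
            content=request.body,
            content_type=request.headers['Content-Type'])
    except MisComputedContentHash:
        # Clearly points at the body (or its encoding) as the culprit.
        log.warning('payload hash mismatch: check the request body and its encoding')
        raise
    except MacMismatch:
        # With the current behaviour an encoding problem in the body also
        # lands here, which is what sends people looking elsewhere.
        log.warning('MAC mismatch: check credentials, URL, method, timestamp')
        raise
```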

kumar303 commented 9 years ago

Did the payload have non-ascii characters in it? Could you give me an example payload that I could use to reproduce this?

gregarndt commented 9 years ago

The payload did indeed have non-ascii characters in it. If I recall correctly, it was a contributor's name that contained the characters. I don't have a payload anymore and our logs only go back so far, but I can try to dig something up.

edmorley commented 9 years ago

There are two potential bugs here:

  1. The mohawk implementation doesn't appear to follow the upstream spec
  2. If mohawk is intentionally choosing not to follow the spec, then the naming of exceptions needs to change, since the current usage of MacMismatch and MisComputedContentHash gives a false explanation in the "the content hash was incorrect" case.

Both of these can be seen from code inspection; is a specific testcase still useful? :-)

kumar303 commented 9 years ago

Both of these can be seen from code inspection; is a specific testcase still useful? :-)

No test case necessary. I'll create a non-ascii message and see what happens in hueniverse vs. mohawk.

I'm not totally clear on how the hueniverse package handles this. The thread you linked to in combination with the spec docs has contradictory explanations of how payloads should be encoded before hashing.

Thanks for the bug report :)

edmorley commented 9 years ago

Ah I've just noticed - there's an existing test that shows we get MacMismatch rather than MisComputedContentHash: https://github.com/kumar303/mohawk/blob/4309017b3d8b980601e56fe4635724d8b03f3489/mohawk/tests.py#L189-L192

kumar303 commented 9 years ago

btw, assuming the bug gets fixed, I don't think changing the exception to MisComputedContentHash for a client that hashed its content wrong is going to help much. You really just have to turn on debug logging at the server level to see how it assembled all content parts when creating the hash then compare that to equivalent debug logging on the client side.

kumar303 commented 9 years ago

There is a note in the docs about how to debug MacMismatch exceptions.
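
For anyone landing here from a MacMismatch, a minimal way to turn that debug logging on (assuming mohawk emits its debug output through the standard logging module under the 'mohawk' logger, as the docs describe):

```python
import logging
import sys

# Send mohawk's internal debug output (the normalized strings it hashes and
# MACs) to stderr so they can be compared against the client's equivalents.
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
logging.getLogger('mohawk').setLevel(logging.DEBUG)
```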

edmorley commented 8 years ago

I don't think changing the exception to MisComputedContentHash for a client that hashed its content wrong is going to help much. You really just have to turn on debug logging at the server level

Yeah, I agree in the worst case, but if enabling verbose logging in production can be avoided by giving a clue as to where to focus (or at the very least not giving an incorrect clue), then it avoids the need to do production pushes, sift through potentially gzipped logs across multiple webheads [1], and coordinate with a third party's logs (after having to get them to enable debug logging too).

Anyway I hope to have a PR for this soon :-)

[1] Our current stage/prod doesn't use Papertrail; once we move to Heroku we will be using it.

kumar303 commented 8 years ago

As for the PR (fixing the encoding issue), I suppose we need to follow this as a guide: https://github.com/hueniverse/hawk/issues/152#issuecomment-155131505

torotil commented 8 years ago

There is another issue hidden in there: what if there is no payload (i.e. in a GET request)? Currently mohawk ends up interpreting this as an empty string, then calculating the hash and using that for the MAC calculation.
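
To illustrate (a standalone sketch of the Hawk payload-hash string, not mohawk's internals): an empty body still hashes to a real value, so putting that value into the server-side MAC while the client sent no hash field at all can only produce a mismatch.

```python
import base64
import hashlib

# Hawk payload hash of an empty body with an empty content type.
normalized = b"hawk.1.payload\n\n\n"
print(base64.b64encode(hashlib.sha256(normalized).digest()).decode("ascii"))
# A GET client that omitted the hash field never put this value into its own
# normalized header string, so the server's MAC cannot match the client's.
```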


If it doesn't fail, you'd still need to calculate your own hash and re-calculate the MAC (which means you'd always be calculating a MAC twice) because otherwise an attacker could just swap out a known good hash in the request header and trick your server.

IMHO the upstream implementation doesn't calculate the MAC twice. The implementation does:

  1. Calculate the MAC using the header-supplied hash -> MacMismatch if they don't match
  2. Calculate the hash of the actual payload and compare it to the header-supplied hash -> MisComputedContentHash if they don't match

There is no need to do another calculation of the MAC.

See server.js line 165-183.
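
A rough Python sketch of that order of checks (field names abbreviated, helper names made up; the real logic lives in the upstream server.js and in mohawk's base.py; `key` is the shared secret as bytes):

```python
import base64
import hashlib
import hmac

from mohawk.exc import MacMismatch, MisComputedContentHash

def calculate_mac(key, header, method, path, host, port):
    # MAC over the "hawk.1.header" normalized string; note that it is the
    # *header-supplied* hash that goes into this string, not the payload.
    normalized = ("hawk.1.header\n{ts}\n{nonce}\n{method}\n{path}\n"
                  "{host}\n{port}\n{hash}\n{ext}\n").format(
        ts=header['ts'], nonce=header['nonce'], method=method, path=path,
        host=host, port=port, hash=header.get('hash', ''), ext=header.get('ext', ''))
    digest = hmac.new(key, normalized.encode('utf-8'), hashlib.sha256).digest()
    return base64.b64encode(digest).decode('ascii')

def calculate_payload_hash(content_bytes, content_type):
    normalized = (b"hawk.1.payload\n" + content_type.encode('ascii') + b"\n" +
                  content_bytes + b"\n")
    return base64.b64encode(hashlib.sha256(normalized).digest()).decode('ascii')

def authenticate(key, header, method, path, host, port, content, content_type):
    # 1. Verify the MAC using the hash value the client put in its header.
    expected = calculate_mac(key, header, method, path, host, port)
    if not hmac.compare_digest(expected, header['mac']):
        raise MacMismatch()
    # 2. Only then hash the received payload and compare it to the header's
    #    hash; this is what catches a tampered body, and no second MAC is needed.
    if header.get('hash') and not hmac.compare_digest(
            calculate_payload_hash(content, content_type), header['hash']):
        raise MisComputedContentHash()
```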

torotil commented 8 years ago

I've created a pull request for this issue: #28

kumar303 commented 8 years ago

Fixed in https://github.com/kumar303/mohawk/pull/28 - thanks @torotil !