Why use a salt? - Githubissues

peterthomassen commented 5 years ago

Salt is commonly added to weak passwords so that if reused, they do not produce the same hash. It also exponentially increases the storage burden on those undertaking to create rainbow tables. If entropy is high in the hashed secret itself, these pro-salt arguments no longer apply.

By default, Knox creates 64-digit hex tokens, corresponding to 256 bits of entropy. This is more than sufficient to rule out both accidental reuse and rainbow table generation. In fact, adding another 16 hex digits of salt, corresponding to 64 bits, seems like a rather "random" choice with questionable benefit.
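To make the digit-to-bit correspondence concrete, here is a minimal sketch using Python's standard `secrets` module. The helper names and `TOKEN_BYTES` are made up for illustration, and this is not necessarily how Knox itself generates tokens.

```python
import secrets

# Each hex digit encodes 4 bits, so 32 random bytes give 64 hex digits (256 bits).
TOKEN_BYTES = 32  # assumed value, chosen to match the 64-digit default discussed here


def generate_token(n_bytes: int = TOKEN_BYTES) -> str:
    """Return a random hex token drawn from the OS CSPRNG (illustrative helper)."""
    return secrets.token_hex(n_bytes)


def entropy_bits(hex_token: str) -> int:
    """Upper bound on the entropy of a uniformly random hex string."""
    return 4 * len(hex_token)


token = generate_token()
print(len(token), entropy_bits(token))  # 64 256
print(entropy_bits("0" * 16))           # 64, the size of a 16-digit hex salt
```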

So, why use salt at all?

johnraz commented 5 years ago

Hello 👋

I was not involved in the initial design decision, but I guess the main reason was to increase security. You do raise valid arguments, though.

I think the rainbow table mitigation provided by salting still holds, especially since tokens can be long-lived now that the auto-refresh feature exists.

I have also dealt professionally with many security audit companies that require every possible security measure to be in place to meet their requirements; salting would definitely be among them.

Why would you like to see it removed?

peterthomassen commented 5 years ago

I would like to challenge that the salt is a security improvement. In fact, it is a downgrade. Here's why:

Because the salt is needed to authenticate a token, it is necessary to perform a salt lookup based on some information contained in the user-provided token itself. This requires you to store information extracted from the plaintext token. Currently, this is done by storing 8 hex digits (32 bits) in plaintext, decreasing the number of secret bits from 256 to 224. (A mitigation could be to hash the prefix separately, haha.)
An API user, though, is under the impression that they have a 256-bit token, because that is what a 64-digit hex token usually corresponds to.
In the absence of a salt, an attacker trying to build a rainbow table has to deal with 256 bits (and with 64 extra bits if your salt is used). Only strings that have explicitly been put into the rainbow table can be broken, as an exact reverse match is required. An attacker therefore has to put essentially all potential tokens, mapped to their hashes, into the rainbow table. There are 2^256 such tokens. They are not compressible (because they are random), so each takes up 256 bits = 32 bytes. The storage requirement is therefore 2^256 * 32 bytes = 3.7 * 10^78 bytes. This is roughly the estimated number of atoms in the observable universe, and there is no conceivable way of attempting to create such a rainbow table. (This is just storage; obviously, runtime would be prohibitive, too.) If you add a 64-bit salt, it just means that the rainbow table would be 2^64 = 1.8 * 10^19 times larger, so you could use that many extra universes for your rainbow table project, and it would still fail.
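The storage estimate can be double-checked with a few lines of Python. This is only a back-of-the-envelope sketch: it counts 32 bytes per stored token and ignores the space the hashes would take, and the constant and function names are invented for illustration.

```python
TOKEN_BITS = 256
SALT_BITS = 64
ENTRY_BYTES = TOKEN_BITS // 8  # 32 bytes per stored token, hashes not counted


def table_bytes(token_bits: int, entry_bytes: int) -> int:
    """Bytes needed to list every possible token of the given size once."""
    return (2 ** token_bits) * entry_bytes


unsalted = table_bytes(TOKEN_BITS, ENTRY_BYTES)
salt_factor = 2 ** SALT_BITS  # how many times larger the table gets with a salt

print(f"{unsalted:.1e}")     # 3.7e+78
print(f"{salt_factor:.1e}")  # 1.8e+19
```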

The bottom line is that rainbow tables are not an attack vector for strong tokens, and therefore the salt does not mitigate anything; it is a no-op from a security standpoint. It further gives the user the false impression of having 256 bits of entropy when there are only 224.

If someone were to set AUTH_TOKEN_CHARACTER_LENGTH to a lower value, such as 32 hex digits (128 bits), then only 96 bits of entropy would remain after stripping the lookup prefix. This would significantly degrade the quality of an otherwise more or less acceptable 128-bit token.
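The effect of the plaintext prefix on the remaining secret bits is simple arithmetic, sketched here. The 8-digit prefix length is the value mentioned in this comment; the function name is made up for illustration.

```python
BITS_PER_HEX_DIGIT = 4
PREFIX_HEX_DIGITS = 8  # plaintext lookup prefix length mentioned in this thread


def secret_bits(token_hex_digits: int, prefix_hex_digits: int = PREFIX_HEX_DIGITS) -> int:
    """Bits of the token that stay secret once the prefix is stored in plaintext."""
    if prefix_hex_digits > token_hex_digits:
        raise ValueError("prefix cannot be longer than the token")
    return BITS_PER_HEX_DIGIT * (token_hex_digits - prefix_hex_digits)


for length in (64, 32):
    print(length, secret_bits(length))
# 64 224
# 32 96
```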

The salt does, however, come at the cost of considerable extra complexity:

Two extra model fields are needed, for the salt itself, and for the prefix lookup.
New settings (like prefix and salt length) have to be invented, and good (?) values need to be determined for them.
Extra string slicing has to happen in multiple locations. String manipulation tends to be error-prone (not saying it's buggy, but it is a development/maintenance burden).
Token lookups are more complicated at the database level, as you have to retrieve a queryset that can contain any number of matches, then filter it, and so on.
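The difference in lookup complexity can be sketched without Django at all. The following is a simplified, in-memory illustration and not Knox's actual code: the record layout, the helper names, and the way the salt is combined with the token are all assumptions.

```python
import hashlib
import secrets
from dataclasses import dataclass

PREFIX_LEN = 8  # hex digits stored in plaintext (value taken from this thread)


def digest(token: str, salt: str = "") -> str:
    # Hypothetical helper: the salt is simply appended before hashing.
    return hashlib.sha512((token + salt).encode()).hexdigest()


@dataclass
class SaltedRecord:
    prefix: str  # extra field 1: plaintext lookup prefix
    salt: str    # extra field 2: per-token salt
    digest: str


def store_salted(rows: list, token: str) -> None:
    salt = secrets.token_hex(8)  # 16 hex digits = 64 bits
    rows.append(SaltedRecord(token[:PREFIX_LEN], salt, digest(token, salt)))


def verify_salted(rows: list, token: str) -> bool:
    # The prefix may match several rows, so every candidate has to be re-hashed.
    candidates = [r for r in rows if r.prefix == token[:PREFIX_LEN]]
    return any(
        secrets.compare_digest(digest(token, r.salt), r.digest) for r in candidates
    )


def store_unsalted(index: set, token: str) -> None:
    index.add(digest(token))


def verify_unsalted(index: set, token: str) -> bool:
    # One hash, one exact-match lookup on the digest.
    return digest(token) in index


token = secrets.token_hex(32)
rows, index = [], set()
store_salted(rows, token)
store_unsalted(index, token)
print(verify_salted(rows, token), verify_unsalted(index, token))  # True True
```

In a real database, the unsalted variant would correspond to a single indexed equality lookup on the digest column.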

I would like to stress that these points are meant purely from a factual standpoint, and not intended at all to criticize anyone's work! Knox is a great project, and it has a great community. I'd like to contribute my two cents to remove some unnecessary weaknesses and complexities that it currently has.

(In case this discussion leads to a code change, I would also like to pitch moving from SHA-512 to SHA-256. As there are only 256 bits of entropy in the token, approximately every other bit of the SHA-512 output is "non-entropic". The input currently just gets stretched, but the token does not get better. This change would a) improve runtime, b) save storage in the database, c) establish a better cryptographic correspondence between the token and its digest. There is no concern about SHA-256 being easier to break than SHA-512, as both belong to the SHA-2 family, i.e. they are built on the same design.)
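The storage point can be illustrated with `hashlib` from the standard library. This sketch only compares digest sizes for a 64-digit hex token; it says nothing about how Knox stores digests, and the helper name is made up.

```python
import hashlib
import secrets


def digest_sizes(name: str, data: bytes) -> tuple:
    """Return (digest size in bits, length of the hex digest) for a hash name."""
    h = hashlib.new(name, data)
    return h.digest_size * 8, len(h.hexdigest())


token = secrets.token_hex(32).encode()  # 64 hex digits, 256 bits of entropy

for name in ("sha512", "sha256"):
    print(name, *digest_sizes(name, token))
# sha512 512 128
# sha256 256 64
```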

johnraz commented 5 years ago

Thank you so much for the detailed explanation. This makes a lot more sense now. I agree with everything stated, to the best of my current knowledge, and I very much like the idea of simplifying the code base!

I’d like to have others' opinions before going forward, though, especially from those who made the design decision.

Thanks again 👍🏻

peterthomassen commented 5 years ago

One complication is certainly that dropping the salt is not straightforward for existing tokens, as the hash cannot be verified once the salt is gone. So, either everybody would have to be logged out, or there would have to be some sort of transition period (something around the max. token lifetime, but that may mean dragging along the technical debt for a long time).
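One way such a transition period could look is sketched below. This is purely hypothetical and not something Knox does: the data structures, function names, and hashing scheme are assumptions. The idea is to check the new unsalted digest first, fall back to legacy salted records, and upgrade a legacy record the first time its token is presented.

```python
import hashlib
import secrets

PREFIX_LEN = 8  # hex digits of the legacy plaintext lookup prefix


def h(token: str, salt: str = "") -> str:
    # Hypothetical hashing scheme: salt appended to the token, then SHA-256.
    return hashlib.sha256((token + salt).encode()).hexdigest()


def verify_during_transition(token: str, new_digests: set, legacy: dict) -> bool:
    """legacy maps a prefix to a list of (salt, digest) pairs."""
    if h(token) in new_digests:
        return True
    prefix = token[:PREFIX_LEN]
    for salt, stored in legacy.get(prefix, []):
        if secrets.compare_digest(h(token, salt), stored):
            # Upgrade: the plaintext token is available now, so re-store it unsalted.
            legacy[prefix].remove((salt, stored))
            new_digests.add(h(token))
            return True
    return False
```

Tokens that are never presented during the transition window would still have to be invalidated at the end, which is the technical-debt concern raised above.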

johnraz commented 5 years ago

Sure, we need a breaking change and a major version bump, but that's nothing we can’t describe in the changelog.

James1345 commented 5 years ago

@peterthomassen Everything you say makes sense. To answer the question about why the salt was originally included, the answer is I honestly don't remember. I'm sure I had a good reason in the first version, but I can't think of one now.

On moving to SHA-256, my original plan was that the algorithm should be configurable. If we never implemented that then that's by oversight rather than design. I seem to remember it being possible at one time to run with MD5 as an option (which was simply included to increase the run speed of tests and not intended to be used as a serious option).

peterthomassen commented 5 years ago

@James1345 Regarding salt -- thanks for your feedback. I'm sure there was some reason at the time, but sometimes things change :-) I don't think any archaeology needs to be done.

Regarding algorithm -- it is possible to configure the algorithm. My suggestion was to change the default, as the current one has no advantage over SHA-256, but a few (smaller) disadvantages.

johnraz commented 5 years ago

Seems like we could go with a simplified, more efficient, salt-less token implementation then 🎉

@peterthomassen would you like to, and have time to, submit a PR implementing the salt-less version?

I suggest we split the "configurable algorithm" feature into a separate PR.

peterthomassen commented 5 years ago

Unfortunately, I currently don't have time to create a PR. (We have found another solution for our own purposes.)

johnraz commented 4 months ago

Closing, as the salt has now been removed.

jazzband / django-rest-knox

Why use a salt? #188