Consider changing the token from UUID4 to a secure random string

dxa4481 commented 9 years ago

Even though python's implementation of UUID4 uses a secure random number generator to create it's values, there is no gaurentee this will continue to be true. From the UUID spec

http://tools.ietf.org/html/rfc4122#section-6

" Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example. A predictable random number source will exacerbate the situation."

dxa4481 commented 9 years ago

It may be safe to close this given the migration to signed cookies.

thequbit commented 9 years ago

We use the same method of a hashlib.uuid.uuid4() for identifying the anonymous clients that are attached to the server.

What would you suggest we use for a 'really big, random enough, collision resistant' identifier?

@dxa4481 thank you very much for your insight here, this has been really enlightening.

dxa4481 commented 9 years ago

If you are worried about an attacker potentially forging other user's identifier 100 bits of entropy should do the trick. You can just use os.urandom() for this https://docs.python.org/3/library/os.html#os.urandom

If you don't have to worry about attackers forging the user's identifier (like for how you generated file names) the UUID4 is fine

thequbit commented 9 years ago

The UUID4 is simply used to prevent collisions right now, and I don't really see that changing. Is it worth concatenating two of them together to protect ourselves for the future?

dxa4481 commented 9 years ago

You mentioned it's being used to track anon users. What happens if a malicious user figures out how numbers are being assigned and forges an identifier for a different anon user? If this is not an issue things are fine the way they are. If this is an issue, use a secure random number generator.

I noticed they are also being used for generating salts. That's fine too, that doesn't need to be a secure random number.

I suggest just grepping through all the instances in the code where UUID4 is being used and ask yourself "What would happen if an attacker magically had a list of all these values?" If that's ever an issue, switch to secure random.

thequbit commented 9 years ago

I'd like to think we could publish the contents to the database each day, and that wouldn't be an issue ( we SHA256 hash the password on the client side before sending it to the server, so the contents of pash_hash is ...

pass_salt = str(uuid.uuid4()
pass_hash = hashlib.sha256(
    "%s%s" % ( hashlib.sha256(<raw_password>).hexdigest(), pass_salt ) 
).hexdigest()

... which, even with knowing the salt, in my understanding, is very secure.

With respect to the UUID's (we call them CUID in the database, standing for client unique identifier), even if an 'attacker' for this id, it would simply just 'pad the stats' of that particular anonymous client. Not so great for 'book keeping', but it doesn't matter since its all anonymous anyway. You don't care what anonymous client it came from, just that it is there.

There is a flow for 'verifying a client'. If this is the case, then I think this does change things. A client can create a username/password (email and password), and connect their CUID to a login. Then they are 'verified when they post' to the server.

This hasn't been implemented yet, so I haven't put too much time into figuring out what the flow is, but this brings up some interesting points. The end points on the server will need to take something that informs the server that the client is who it says it is. Perhaps we're back to the token? Unsure if we can (or would want to) save encrypted cookies on the android/ios device.

I think I'll need to look into this more before we come up with a path forward.

Thoughts?

dxa4481 commented 9 years ago

So first don't publish the database everyday. The salting prevents rainbow tables but it does not prevent brute force and dictionary attacks. Because you are only using 1 round of sha2 the time it would take to brute force or dictionary attack a password is very low. This is why I recommended switching to pbkdf2. This increases the number of rounds of sha2 from 1 to several thousand. While it does not prevent dictionary attacks, it makes them very expensive. Imagine a dictionary attack taking 10000x the time to perform. That is basically the advantage of using pbkdf2.

The signed cookies should be okay for authenticating users, they should be okay to store on android and ios devices. Make sure they have an expiration and it's perfectly fine.

It's pretty easy to switch from UUID to os.urandom so I'd suggest doing that for the case of book keeping on anon clients (because why not right?) but aside from that with the migration to signed cookies, in respect to this issue (the token that was being generated from a UUID4 value) you're in the clear.

thequbit commented 9 years ago

:+1:

dxa4481++

@dxa4481 this advice has been really great, thank you very much!!

thequbit commented 9 years ago

Oh, and for the record I'm wasn't planning to publish the database, I simply was just saying that if we did I wouldn't feel terrible about it (short of maybe the users table). The point of bringing that up was that there really isn't anything in there that is worth keeping secrete. All of the users data will be licensed under the CC BY-SA 4.0, so there is no reason not to allow it to be shown.

Thanks again!

dxa4481 commented 9 years ago

I would disagree. As many as 5% of your users will chose passwords that can be found in a dictionary. There is nothing you can do about that but I would argue you do have an obligation to those users to keep their hashed password under lock and key

thequbit commented 9 years ago

More great insights. Thanks @dxa4481 .

thequbit commented 9 years ago

I believe this can be closed since we are now using signed cookies, and the token is never seen by the client.

We do support passing the token via a GET parameter. We can revisit if this becomes a more widely used method of authentication within the yellr system.

Thanks @dxa4481 for all the great insights here!

hhroc / yellr-server

Consider changing the token from UUID4 to a secure random string #131