Idea: Disallow Common Passwords

pawl commented 9 years ago

I saw this in django's source code: https://github.com/django/django/blob/09f2cdbe1a43e79e31f5ea509b59d4c87db29832/django/contrib/auth/password_validation.py#L148

    Validate whether the password is a common password.
    The password is rejected if it occurs in a provided list, which may be gzipped.
    The list Django ships with contains 1000 common passwords, created by Mark Burnett:
    https://xato.net/passwords/more-top-worst-passwords/

This seems like a great idea. Is this something we should add to flask-security?
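For reference, the check described in that docstring can be sketched in a few lines. This is not Django's code: the function names `load_common_passwords` and `is_common_password` are made up for illustration, and detecting gzip by its magic bytes is an assumption of this sketch.

```python
import gzip
from pathlib import Path


def load_common_passwords(path):
    """Read one password per line from a plain-text or gzipped file."""
    path = Path(path)
    # Gzip files start with the two magic bytes 0x1f 0x8b.
    with path.open("rb") as raw:
        is_gzipped = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzipped else open
    with opener(path, "rt", encoding="utf-8") as handle:
        # Normalize entries so the lookup below can be case-insensitive.
        return {line.strip().lower() for line in handle if line.strip()}


def is_common_password(password, common_passwords):
    """Return True if the password appears in the loaded list."""
    return password.strip().lower() in common_passwords
```

Loading the list into a set once at startup keeps each lookup cheap, whatever the size of the list.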

jonafato commented 9 years ago

I like this concept and the one proposed in #414 in theory, but I'm skeptical about these specific implementations becoming part of Flask-Security. While the goals are clearly good for security, it looks like these kinds of tickets would lead down quite the rabbit hole (e.g. Why 1000 passwords¹? Should we force at least one of upper, lower, and numeric instead of just disallowing all numeric? Should we check for dictionary words and disallow those as well?).

An easy first step would be to add a hook that allows a user-defined function that validates a given password against a list of rules and raises an error when one or more do not pass (additional extensions could handle sane defaults for this list). Unfortunately, however, this seems to me like it reduces to enumerating badness, an inherently flawed approach.
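A minimal sketch of what such a hook could look like follows. Nothing here is an existing Flask-Security API: `PasswordValidationError`, `validate_password`, and the two example rules are hypothetical names, and the convention that a rule returns a message string on failure (or `None` on success) is an assumption of this sketch.

```python
class PasswordValidationError(ValueError):
    """Raised when a password fails one or more rules (hypothetical name)."""

    def __init__(self, messages):
        super().__init__("; ".join(messages))
        self.messages = list(messages)


def min_length(length):
    """Build a rule that rejects passwords shorter than `length`."""
    def rule(password):
        if len(password) < length:
            return f"Password must be at least {length} characters long."
        return None
    return rule


def not_all_numeric(password):
    """Rule that rejects passwords made up only of digits."""
    if password.isdigit():
        return "Password must not be entirely numeric."
    return None


def validate_password(password, rules):
    """Run every rule, then raise once with all failure messages collected."""
    messages = [msg for msg in (rule(password) for rule in rules) if msg]
    if messages:
        raise PasswordValidationError(messages)
```

An application would pass its own list, for example `validate_password(candidate, [min_length(8), not_all_numeric])`, and show `exc.messages` to the user when the call raises. Collecting every failure before raising lets the user fix all problems in one attempt.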

The obviously best solution is to simply convince everyone to use strong passwords. Because that is even more unlikely than it is obviously good, a compromise may be to estimate the entropy of a password and reject anything below some configurable value. I'm not sure how to go about doing this at the moment. Naive attempts would include things like checking for numbers and adding 10 to the total number of characters in the keyspace, checking for lowercase letters and adding 26, etc., and then calculating entropy based on the keyspace and length. This approach is pretty trivially made useless by changing 'password' to 'P@ssword', though, which is certainly not a meaningful increase in password strength. Although I have no specific knowledge or examples to point to, I'm reasonably confident that work has been done on this problem, and we should probably consult existing literature and implementations before deciding on any implementation, lest we create a false sense of security by adding checks that don't actually address the problem enough to matter.
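To make the weakness concrete, here is a rough sketch of the naive estimator described above. The function name `naive_entropy_bits` and the choice of character classes (ASCII digits, lowercase, uppercase, and the 32 ASCII punctuation symbols) are assumptions for illustration.

```python
import math
import string


def naive_entropy_bits(password):
    """Estimate entropy as length * log2(keyspace), one keyspace bump per character class."""
    keyspace = 0
    if any(c in string.digits for c in password):
        keyspace += 10
    if any(c in string.ascii_lowercase for c in password):
        keyspace += 26
    if any(c in string.ascii_uppercase for c in password):
        keyspace += 26
    if any(c in string.punctuation for c in password):
        keyspace += len(string.punctuation)  # 32 ASCII symbols
    if keyspace == 0:
        return 0.0
    return len(password) * math.log2(keyspace)


print(round(naive_entropy_bits("password"), 1))  # 37.6
print(round(naive_entropy_bits("P@ssword"), 1))  # 51.1
```

The estimator credits 'P@ssword' with roughly 13.5 more bits than 'password', even though an attacker's word list would try both almost immediately.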

My gut feeling is that the real solution is probably a hybrid approach of enforcing some sort of naively² estimated entropy as well as enumerating badness to attempt to thwart the easy ways that the entropy estimator could be fooled by a user being "clever". Dropbox has a Python port of its zxcvbn library that seems like it would be a really good option.
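A crude illustration of that hybrid idea, using only the standard library, might look like the sketch below. It is a stand-in for the concept, not zxcvbn and not a recommendation: the four-entry `COMMON_PASSWORDS` set, the `LEET_MAP` substitutions, the 50-bit default threshold, and the function names are all hypothetical.

```python
import math
import string

# Hypothetical, deliberately tiny list; a real deployment would load a much larger one.
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein"}

# Undo a few common "clever" substitutions before checking the list.
LEET_MAP = str.maketrans(
    {"@": "a", "4": "a", "0": "o", "1": "l", "3": "e", "$": "s", "5": "s", "!": "i"}
)


def naive_entropy_bits(password):
    """Length * log2(keyspace), where the keyspace sums the character classes present."""
    classes = (string.digits, string.ascii_lowercase,
               string.ascii_uppercase, string.punctuation)
    keyspace = sum(len(chars) for chars in classes
                   if any(c in chars for c in password))
    return len(password) * math.log2(keyspace) if keyspace else 0.0


def password_problems(password, min_bits=50.0):
    """Return a list of human-readable reasons the password was rejected."""
    problems = []
    lowered = password.lower()
    normalized = lowered.translate(LEET_MAP)
    if lowered in COMMON_PASSWORDS or normalized in COMMON_PASSWORDS:
        problems.append("This is a commonly used password, or a simple variation of one.")
    if naive_entropy_bits(password) < min_bits:
        problems.append(f"Estimated entropy is below {min_bits:g} bits; try a longer passphrase.")
    return problems
```

Here 'P@ssword' clears the entropy threshold but is still caught by the normalized list lookup, which is the kind of gap the list is meant to cover. Returning reasons instead of a bare boolean leaves room for explaining the rejection to the end user.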

I like the core concept of this a lot, and I think it's critical that it's done correctly and includes explanations to the end user about why a password is judged to be weak and how to generate strong ones (complete with links to password managers and possibly even XKCD as an informational resource). Additionally, this is (in a sort of roundabout way) related to #161 and #348: this kind of approach would make it harder for brute-forceable passwords to get into a system, and the methods described in those issues would make brute-forcing against a live server³ more expensive once a password is stored.

A caveat on all of the above: I'm not a security professional, just a developer responding to an issue that I think is much more complex than it seems on the surface. I'd like to hear some input from / find literature on authoritative and data-backed sources on this problem, as I'm sure it's one many companies have at least attempted to address. If anyone has thoughts on this, especially @pawl, I'd like to see them shared here⁴.

1. The link given in the Django comment does not resolve for me at the moment. Is it still current?

2. I call this naive because the real entropy of a password is based on the method by which it is generated, and there's no way for code running on a webserver to know how a user generates a password.

3. Does this actually happen often enough to worry about, or is most brute-forcing done offline against a database dump?

4. Sorry for hijacking the issue and for the long rambling. I started typing a while ago, and things just kind of got away from me.

pawl commented 9 years ago

Great point about "enumerating badness". Here's the best I've found for existing literature so far: https://www.owasp.org/index.php/Authentication_Cheat_Sheet#Implement_Proper_Password_Strength_Controls

The talk mentioned in the warning is especially interesting: https://www.youtube.com/watch?v=zUM7i8fsf0g

He says "obviously that's horrible" when talking about the "blacklisting common passwords" approach: https://youtu.be/zUM7i8fsf0g?t=3m24s

pallets-eco / flask-security-3.0

Idea: Disallow Common Passwords #413