Closed bobwaycott closed 6 years ago
Hi @bobwaycott , I'm torn on this issue. Your code is beautifully compact. I, however, am not fluent in Regex expressions, and from my experience, not many developers are. I have a very hard time even reading your expressions, let alone having to write one. When I look at the current Python code, I imagine that any Python developer can read, understand, and modify. I lean towards valuing readability over compactness in this case.
As a compromise, I will add your code to the code base as a comment. Thanks!
I must admit I'm pretty surprised Flask-User
would ignore a huge performance win—not to mention implementing something as common as a username & password validator using the right tool for the job (regex). Sure, regex can take a bit of getting used to when you don't deal with it constantly, but I disagree that many developers are wholly unfamiliar with regex. That hasn't been my experience over the last 10+ years.
I'm going to share a quick timeit
readout on Flask-User
's original validation functions, vs the regex-based versions. I've even made some quick & easy modifications to give the original code a speed boost in the _optimized
variants below (there are some simple things Flask-User
should be doing that it is not).
You'll find the original versions here as the _noregex
functions. They've only been modified to accept username
& password
args directly (instead of accessing them via forms), and I also changed them to return True/False
instead of raising exceptions (the goal being to let the functions exit as quickly as they possibly can without incurring more overhead). Additionally, to give each function a relatively level playing field, I've set 8-char & 16-char versions of both a username & password arguments as constants so they wouldn't be created each run when timing.
Here is the code being used:
from re import compile
VALID_CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._'
PASSWORD_REGEX = compile(r'\A(?=\S*?\d)(?=\S*?[A-Z])(?=\S*?[a-z])\S{6,}\Z')
USERNAME_REGEX = compile(r'\A[\w\-\.]{3,}\Z')
USER8 = 'username'
USER16 = 'usernameusername'
PASSWORD8 = 'password'
PASSWORD16 = 'passwordpassword'
def password_regex(password):
return PASSWORD_REGEX.match(password) is not None
def username_regex(username):
return USERNAME_REGEX.match(username) is not None
def password_noregex(password):
""" Password must be 6+ chars and have 1 lowercase letter, 1 uppercase letter and 1 digit. """
# Convert string to list of characters
password = list(password)
password_length = len(password)
# Count lowercase, uppercase and numbers
lowers = uppers = digits = 0
for ch in password:
if ch.islower(): lowers+=1
if ch.isupper(): uppers+=1
if ch.isdigit(): digits+=1
return password_length >= 6 and lowers and uppers and digits
def username_noregex(username):
""" Username must be at least 3 alphanumeric characters long & may contain . - _ chars."""
if len(username) < 3:
return False
valid = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._'
chars = list(username)
for char in chars:
if char not in valid:
return False
return True
def password_noregex_optimized(password):
""" Password must be
- 6 chars or more
- 1 lowercase letter
- 1 uppercase letter
- 1 digit
"""
# Bail now if password is < 6 chars
if len(password) < 6:
return False
# Count lowercase, uppercase and numbers
lowers = uppers = digits = 0
for ch in password:
if ch.islower(): lowers+=1
if ch.isupper(): uppers+=1
if ch.isdigit(): digits+=1
return lowers and uppers and digits
def username_noregex_optimized(username):
""" Username must cont at least 3 alphanumeric characters long"""
if len(username) < 3:
return False
for char in username:
if char not in VALID_CHARS:
return False
return True
Here are the results of timing the functions with as level a playing field as is possible (stated & shown in code above).
Method | Length | Original (µs) | Optimized (µs) | Regex (µs) |
---|---|---|---|---|
username_validator | 8 | 2070 | 1720 | 713 |
username_validator | 16 | 3160 | 2870 | 810 |
password_validator | 8 | 5670 | 4930 | 895 |
password_validator | 16 | 9950 | 9050 | 1160 |
Regex gives you a substantial performance improvement. And I haven't even optimized the regex pattern (one easy win would be to check the string length first to reject too-short usernames/passwords before checking if characters are valid).
More importantly: Regex-based validation time does not increase linearly with the length of the string, whereas per-character checking in a Python loop does.
>>> %timeit for x in xrange(0, 1000): username_noregex(USER8)
100 loops, best of 3: 2070µs per loop
>>> %timeit for x in xrange(0, 1000): username_noregex(USER16)
100 loops, best of 3: 3160µs ms per loop
>>> %timeit for x in xrange(0, 1000): password_noregex('asdf')
100 loops, best of 3: 3520µs per loop
>>> %timeit for x in xrange(0, 1000): password_noregex(PASSWORD8)
100 loops, best of 3: 5670µs ms per loop
>>> %timeit for x in xrange(0, 1000): password_noregex(PASSWORD16)
100 loops, best of 3: 9950µs ms per loop
Just trying to bail on a too-short password takes 3.5ms!
>>> %timeit for x in xrange(0, 1000): username_noregex_optimized(USER8)
1000 loops, best of 3: 1720µs per loop
>>> %timeit for x in xrange(0, 1000): username_noregex_optimized(USER16)
100 loops, best of 3: 2870µs per loop
>>> %timeit for x in xrange(0, 1000): password_noregex_optimized(PASSWORD8)
100 loops, best of 3: 4930µs per loop
>>> %timeit for x in xrange(0, 1000): password_noregex_optimized(PASSWORD16)
100 loops, best of 3: 9050µs per loop
re.compile
d regex patterns>>> %timeit for x in xrange(0, 1000): username_regex(USER8)
1000 loops, best of 3: 713 µs per loop
>>> %timeit for x in xrange(0, 1000): username_regex(USER16)
1000 loops, best of 3: 810 µs per loop
>>> %timeit for x in xrange(0, 1000): password_regex(PASSWORD8)
1000 loops, best of 3: 895µs per loop
>>> %timeit for x in xrange(0, 1000): password_regex(PASSWORD16)
1000 loops, best of 3: 1160µs per loop
Flask-User
is instantiating a string of valid characters every time a username
is validated. This could be a constant so it only gets created once (and could potentially be exposed as an overridable setting to those who use Flask-User
).
The current username/password validators insist on turning supplied usernames & passwords into a list
. However, str
and unicode
objects are inherently iterable in Python, meaning Flask-User
is actually hurting its performance by coercing usernames/passwords to a list
. Here's a quick example that shows the results of iterating over a string vs iterating over a string that is first converted to a list
. Notice we're incurring (in this timed test) a 40µs overhead with the list
call.
>>> %timeit for x in range(100): [char for char in 'username']
10000 loops, best of 3: 139 µs per loop
>>> %timeit for x in range(100): [char for char in list('username')]
10000 loops, best of 3: 179 µs per loop
If Flask-User
used regex-based username/password validation, it would be trivial & a pretty nice win to expose the USERNAME_REGEX
and PASSWORD_REGEX
patterns as config options—allowing anyone who uses Flask-User
to easily implement their own validation rules and do so in a way that is industry standard, and performance-minded.
Validating via regex is the right approach.
So, I was randomly poking around in
Flask-User
today, curious about how passwords were verified & checked. I was really surprised when I found the code. What is currently being done to validate passwords and usernames is very inefficient & unnecessary. Regex is a perfect tool for validating passwords, usernames, emails, and a number of other string- and pattern-based rules.I believe
Flask-User
is in the midst of active0.9
development, and thought I'd just drop in a quick note that this could be drastically improved with regex.For example, password/username validation can be simplified to something like this and still match the same rules:
Just thought I'd bring it up for consideration. This really is the perfect use case for regex, and it's pretty odd to see passwords and usernames being validated by iterating over the entire string in Python code, counting how many times each character appears, and then checking those counts.