Open aurbjo opened 8 years ago
(Caveat: I'm not a dev with LTB.) Are you talking about a simple blacklist? It sounds like you mean substring matches, meaning !password123!
should fail, since either password
or (at least) password123
should not be a substring.
On a cursory search, there are API-based subscription services (such as https://www.passwordrbl.com). I think this is different than what you are suggesting (and considerably different to implement).
I assume you mean a "static" blacklist ("static" in that its updates are either manual (by the sysadmin) or manual/versioned (by the maintainers of LTB-SSP). There are several known lists (such as http://web.archive.org/web/20130605082350/http://www.isdpodcast.com:80/resources/62k-common-passwords/, among many), so finding a reliable and reasonable source should just take a little time to research. I don't have much experience looking for them, so that link is just an example; I can't speak to it.
Some discussion questions:
1337
-speak transliteration?321wrodssap
)aaPASSWORD123aa
and &13as!&8sd@@PASSWORD123@@f7y*sd
)Obviously the combinations of all this needs to be considered for performance: a great password-matching blacklist does nothing if it takes too long to process and the page times out.
I suggest a first cut on this would be literal string matching, with later extensions to include substrings (with padding limits) and reversals.
What do you think, @coudot, do you find utility in having a blacklist? (I am confident that a lot of my authenticating-users will curse at me for enabling this :-)
@aurbjo, if he likes the idea (and doesn't have a starting point), it might help if you found an acceptable starting point for the list. Think: freely distributed, possibly updated (not antiquated), small enough, and easily parsed (perhaps just a literal text file, one-per-line).
Yes, we could indeed use a simple text file as a blacklist, this file could then be updated by any means outside SSP. We can provide a simple blacklist at installation.
As a thought, would it be a bit robust to use something like cracklib for checking more than just a dictionary? It supports more than just blacklists, though I admit I don't know how to configure its various checks.
In my case i am just thinking about a simple list like @coudot is suggesting, this would deny users setting passwords like "qwerty12345", "password123" and so on. I also see users that have their company name and the current year in their password and it's not a good practice.
Cracklib usage can be another feature.
By the way, the support of cracklib in PHP seems experimental: http://php.net/manual/en/ref.crack.php
(Essentially the same link I posted earlier.) Yes, and furthermore it was removed from base-php (around php-5.0) and pushed to be an extension, so that says that it is not something everyone will need but it has been in the ecosystem for many php versions.
I'll counter with this: I recommend against reinventing the wheel. Even if it has more features than originally suggested, php-crack has fast lookups and attempts to address the performance issues that large lists will bring. As a bonus, it can comment on password-strength (admittedly a contentious and subjective metric). The biggest detriment in my mind is that it isn't a flat text file (it needs to be compiled into the files, as commented on the link you provided), so in that vein it has a little more required investment on the part of the user.
If you think php-crack is too much (for now), I suggest finding another already-existing php package that permits fast lookups in a text file. There's no need to reinvent "lookups" (and therefore less work for coudot to test and maintain).
Indeed, sorry, I did not see the previous link.
If some of you want to propose some code, I'll be happy to review it.
@coudot, do you have a sample LDAP image (e.g., docker) for testing? It's much easier when testing on something other than a production environment.
Hehe, @coudot, I guess I didn't read all of the page I originally provided ... last "update" to php-crack was in 2005. Doesn't really inspire confidence ...
So perhaps the wheel can be reinvented a little here, simplistically.
Some thoughts, let me know if you have reservations:
$out = system("grep -q ...", $retval)
. My primary motivation is efficiency, since I know of no methods that will be able to do yes/no pattern "a match exists" as quickly as grep
and family. Using -q
means that it can short-circuit: as soon as a match is found, it returns with no other searching, perhaps a slight efficiency. (This also means we need to look at $retval
instead of $out
.)zgrep
and bzgrep
; this can be determined by filename and runtime.system()
, I'm thinking it may be safer to write the new password into a temp file and use grep -q -F -f ptnfile dictfile
(the -F
is to ensure it is used as a literal string so that regex chars aren't mistakenly taken as patterns).If the goal is to implement a blacklist of passwords (and not a blacklist of words) in efficient way I would suggest filtering out the lines not conforming to the password policy, save the list sorted in a cache file, then read the cache file and find if the password is present using a binary search.
For example banning all passwords of this list of "one million of the most common passwords" ~ 9 Mio uncompressed
1) load the whole file with file() (< 70 Mio RAM usage in php 7) 2) filter out all passwords not conforming to password policy 3) sort the array (can be already sorted) 4) serialize() (serialized string < 24 Mio RAM in php 7 if nothing is filtered out) 5) dump to file blacklistcache.txt
a) read and unserialize blacklistcache.txt (I'm not sure if file_get_contents + unserialize is better than a file() EDIT: no it's not, see benchmark below); b) perform the binary search with a function adapted/similar to this one
edit: renamed steps edit2: added "no it's not, see benchmark below"
@plewin, you suggest doing all of that for each password change? I haven't benchmarked it, but that seems a bit extraneous. Perhaps the first five steps could be avoided since the program is already checking if the new password meets the password policy (otherwise no dictionary lookup is required).
I understand the suggestion to use a binary search; assuming that the file is sorted (which, given your first five, would be assured), this would most certainly be faster. However, are you sure even this will be faster than using an external grep
?
Back to your first point: the OP did request blacklisting passwords; I wonder if it makes sense to be that literal, since defeating it is as simple as adding one character to the password (from password123
to password123!
might work).
Thanks! I'll see if I can muster up some benchmarking between external grep
and an in-memory binary search in PHP.
@r2evans Steps 1, 2, 3, 4, 5 are done only one time to compile the cache. It will require a script that can be launched manually or automatically if we detect that the cache is older than the dictionnary. Steps a) b) must be done each time there is a password change and only after the new password is valid according to the password policy.
With very basic tests on my side it looks like the binary search idea is not a such good idea after all. It's better only if the dictionnary is already in ram or the call of the file() to load the dictionnary in php is faster than spawning grep with exec.
Is it okay to add a dependency on a external command ? exec isn't portable.
IMHO it's okay if "password123!" passes the blacklist with "password123". An attacker would use the dictionnary, if "password123!" isn't in a dictionnary of the N most common passwords it should be ok to use it.
I understand your description of when things are done, and it's certainly feasible to "invalidate the cache" and regenerate it as needed. I hope it isn't a time-intensive thing, though this can be mitigated several ways. I'm not worried about this part of it.
Your question of external dependencies is appropriate, but I think having grep
available is not too much to ask. Even on "alpine", {,e,f}grep
is provided by busybox
, and that supports all of the options I was considering (-q
, -f
, and -F
). The only problem is on a windows server, in which case there needs to be a note that something like Git-for-Windows or GnuWin32 is needed to get this extension to work. (Note my emphasis: this is an extension to SSP, not a core function, though some may feel it is important enough to be considered "core".)
Your last comment, though correct, is based on the assumption that the attackers are using the same dictionary. This is great justification for something like cracklib, where the strength of the password is tested, including presence of dictionary words. This is a different tact than a password blacklist, and I think the two are very different mindsets (not to mention implementations). So for a "blacklist", you are right, it should be accepted or explicitly added to the blacklist. (Anybody volunteer to maintain php-crack, an extension that hasn't been touched in 11 years?) For this discussion, I'll lean towards "whole-line filtering", a literal blacklist instead of a quasi-pattern blacklist.
I made a little benchmark comparing several possibilities on my ubuntu linux, php 7, i5-2430M CPU @ 2.40GHz + ssd.
file + in_array is the fastest until around 100 000 passwords At 1 000 000 passwords, grep is faster than file + in_array in a factor of ~4 At ~3 700 000 grep is faster in a factor of ~ 10 ?
Traversing the file of 3 700 000 passwords took grep 37 ms (with no options), file + in_array 381 ms, stream_get_line 773 ms. Problem with in_array : its consumes at lot of memory, for the last file php peak memory usage was 276 Mio. stream_get_line is the best alternative to grep and file+in_array but there is a drawback : line endings must be fixed.
Not sure you mean, "line endings must be fixed"?
Nice benchmarking, thanks for that. The response times (for other than grep
) are better than I expected, I'm assuming that difference is due to a fork overhead (?). I agree that, though not likely to trigger an OOM error, 276 Mio is a bit onerous.
Thanks a lot for this work. Seems we can choose to use grep with escapeshell functions in order to have some security when invoking the command. And this feature will be disabled by default.
We will then add documentation to show how download a password list and use it in SSP.
@r2evans Sorry for the confusing words. stream_get_line requires that all line endings are the same and coherent in the file. You must be sure of lines endings (\n or \r\n) and give it as an argument of stream_get_line overwise trimming will be required and there will be a performance penalty like in the fgets vs fgets_rtrim benchmark.
@plewin, thanks, that's what I suspected.
@coudot: absolutely, disabled by default. Any feedback on a testing image/environment?
@r2evans there is no docker image but it should be easy to build one. Anyway you can use any virtualization tool and install SSP with packages. You need a simple LDAP directory with a user account to test the password change.
@r2evans if you need a ldap directory for development you can use the embedded ldap server in Apache Directory Studio like in this tutorial http://www.stefan-seelmann.de/blog/setting-up-an-ldap-server-for-your-development-environment
Both are good ideas. I also found rroemhild/docker-test-openldap which provides half of what I was looking for, and I think I can use one of the docker hub php images (partly because it comes prepopulated with data, that much simpler). Between your suggestions and this, I think I can get something working.
Someone told me about passwqc, and here is an PHP implementation: https://github.com/helver/PHP_passwdqc_check/blob/master/PHP_passwdqc_check/PasswordStrengthTest.inc
They provide a 4k words list and use PHP array methods to check it. Not sure it's the best way but it is interesting to read their code.
You can get a list of about a half billion known compromised passwords from here: https://haveibeenpwned.com/Passwords or you could just hook into his API, personally I don't trust anyone so doing it locally seems more secure even if you are only sending a hash.
I like this repository of passwords, it is from OWASP
https://github.com/danielmiessler/SecLists/tree/master/Passwords
Note for my 2.0 proposal: I have implemented the old password strength checker + a dictionary checker with grep + Zxcvbn. Multiple can be used in the same time and one can add a class to implement a new one. https://github.com/plewin/self-service-password/tree/575c884d8bcd0022ed0f304da2307c509ea03c7a/src/PasswordStrengthChecker
I second @SyBernot's mention of Have I Been Pwned, both as an API and as an optional local-copy. Though I don't think I would download it (at least not initially), I can definitely see utility in supporting both mechanisms. NextCloud just adopted its "range" API function, in which the caller sends only the first 5 chars of the SHA1 hash, and the API sends back all matching hashes and their respective counts. I'm not entirely certain how much security it adds, but the frequency of use and bandwidth overhead don't seem to be too high.
It would be great to be able to specify a list of illegal passwords in the config file, or even illegal words in password.