keepassxreboot / keepassxc

KeePassXC is a cross-platform community-driven port of the Windows application “KeePass Password Safe”.
https://keepassxc.org/

Entropy estimation of user-entered passwords: Not a best practice (a small rant) #2061

gwillen closed this issue 5 years ago

gwillen commented 6 years ago

Expected Behavior

When KeePassXC generates a password or passphrase, it should always provide the EXACT entropy, which it knows from the method of generation. When a user enters or edits a password or passphrase, IF KeePassXC provides an entropy estimate, it should WARN the user that this is only an estimate and is not a reliable means of determining password entropy (and possibly that, in general, generated passwords are a better choice than user-selected ones.)
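
To be concrete about what I mean by "exact entropy" (a rough sketch in my own notation, not KeePassXC's actual code): if the generator draws n symbols independently and uniformly from a set of S symbols, or k words uniformly from a list of N words, the entropy of that process is known exactly, no cracker required.

```python
import math

def generated_password_entropy(length, charset_size):
    """Entropy (bits) of generating `length` symbols uniformly and
    independently from a set of `charset_size` symbols."""
    return length * math.log2(charset_size)

def generated_passphrase_entropy(words, wordlist_size):
    """Entropy (bits) of a diceware-style passphrase of `words` words
    drawn uniformly from a list of `wordlist_size` words."""
    return words * math.log2(wordlist_size)

print(generated_password_entropy(16, 62))     # 16 chars over [a-zA-Z0-9]: ~95.3 bits
print(generated_passphrase_entropy(7, 7776))  # 7 diceware words: ~90.5 bits
```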

Current Behavior

When the user generates a passphrase, the correct entropy is given for the diceware passphrase generation process. (But if the user then edits the passphrase, the entropy estimate is unchanged, a clear bug: #867)

When the user generates a password, using the built-in password generator, the entropy is computed using a password cracker. This is not necessary or desirable: KeePassXC knows the actual entropy, which is a function of the generation process it just performed. It should supply that number instead, in the same way it does for passphrases.

When the user manually enters or edits a password, the entropy is computed using a password cracker. This is the best possible approach under the circumstances, but it should be highlighted in red, with a tooltip explaining that entropy estimation is art rather than science, is only an estimate, and will give excessively high estimates for execrably bad passwords in many cases.

Example: If I enter "Glenn Willen, May 27 19xx, Gemini", which is my full name, obfuscated birthdate, and star sign, the cracker estimates that this string has 129.63 bits of entropy. In reality, of course, it has very little; supposing very generously that there are 1000 equally-obvious public facts I might use in this way, and 100 equally-obvious ways to format each, and 3 ways to arrange 3 of them, I still don't even make it up to 20 bits in reality. This is also the reason that the cracker gives unrealistically-high estimates when people manually paste diceware passphrases into the 'password' box -- it doesn't understand the space it's estimating over.
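
Spelling out that back-of-the-envelope bound (under my own generous assumptions above, treating the password as one pick out of roughly 1000 * 100 * 3 possibilities):

```python
import math

facts = 1000        # equally-obvious public facts about me
formats = 100       # equally-obvious ways to format each
arrangements = 3    # ways to arrange them

print(math.log2(facts * formats * arrangements))  # ~18.2 bits, vs. zxcvbn's 129.63
```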

Why is this example important? I think that it's obvious to many people on this bugtracker what the problem with my example is. But I think it is equally not obvious at all to most typical non-expert users. If a piece of security software is telling them, as they add characters to their password, that it's getting more secure, they may believe the software. I claim this is a significant potential danger and a footgun to naive users who don't know why the software does what it does.

Possible Solution

I alluded to one above: When generating passwords, estimate the entropy correctly, using the entropy that was put in by the generator. When the user enters or edits a password, provide an estimate if you wish, but somehow clearly highlight (such as with red text and a tooltip) that it is a bad idea to rely on it.

Steps to Reproduce (for bugs)

Try generating and entering passwords and passphrases in the password generation dialog. Observe the entropy estimator widget.

Context

I am a software engineer in the Bitcoin space. I would not describe myself as a computer security expert (except possibly by osmosis from the people I work with), but I aspire to "security researcher". I am in the process of writing up some best practices for personal computer security for Bitcoin users, which is why I finally got around to investigating KeePassXC as a KeePassX replacement.

Thank you for reading my rant! Obviously I intend it only as a jumping-off point for discussion. It's your app. :-)

Debug Info

KeePassXC - Version 2.3.3
Revision: 0a155d8

Libraries:

Operating system: macOS Sierra (10.12)
CPU architecture: x86_64
Kernel: darwin 16.7.0

Enabled extensions:

TheZ3ro commented 6 years ago

This issue falls pretty much under #867. I appreciate your rant and I generally agree with your proposed solution (practically the same as in the above issue :stuck_out_tongue: ). I think it would be fine as long as it doesn't scare users as much as the Legacy Keyfile warning (#2037, #1549).

phoerious commented 6 years ago

I don't see any advantage in what you are proposing. This entropy figure is nothing but a rough measure of how good the password is and is in no way accurate. So there is really no difference between 15 and 17 bits of "entropy" here. There is a difference between 15 and 120 bits, but only in the sense that one is "much higher" than the other. We introduced the colour coding to convey this fact. If you ask me, we could actually remove the numbers and only go with the colours.

Your statement about the exact entropy being a function of our generation process is also not correct. When I have [a-zA-Z0-9] as my generation alphabet and I generate the password "abcdeFGH", then its entropy would be 8 * -log2(1/64) according to your statement, although it really is just 8 * -log2(1/8). Moreover, the fact that it's a sequence of characters and therefore predictable no matter its (hypothetical) entropy is probably a lot more important, and that's exactly what zxcvbn takes into account. All in all, entropy on its own is pretty useless for measuring password quality, because it only takes into account character probabilities and NOT the generation process or any prior knowledge associated with it. "zxcvbn" itself is the best example: measured by entropy, it scores okay, but since it's the name of a well-known password strength estimator, it will never make a good password, even if it had 1000 bits of entropy.
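
To spell out the two figures in that example (a sketch only; I use 64 as above, plus 62, the literal size of [a-zA-Z0-9], for comparison):

```python
import math

# Entropy of the generation source, per the reasoning I am disputing:
print(8 * -math.log2(1 / 64))   # 48.0 bits (8 * -log2(1/62) ≈ 47.6 for the literal set)

# Entropy of the realized string "abcdeFGH" itself: 8 distinct characters,
# each occurring with frequency 1/8 within the password.
print(8 * -math.log2(1 / 8))    # 24.0 bits
```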

chris544 commented 6 years ago

There are two ways of looking at measuring password quality:

The first argues: I generate my password using some particular method and I trust that this method produces a particular entropy that could be called "generated entropy". I trust the math, the algorithm and its implementation, and I am not afraid that it could produce a password that somebody considers crackable.

The second argues: I do not really care how I generated my password. I will attempt to crack it using some common methods and estimate entropy based on those cracking attempts. This could be called "cracking entropy", and as I understand it, this is basically what zxcvbn does.

For human-generated (or edited) passwords it only makes sense to display "cracking entropy", and there is no disagreement about that here. However, I agree with OP that this estimate should be displayed with more caution than the one for randomly generated passwords. For a decent example: the password "onetwothreefourfive" gets a poor (32-bit) quality estimate (which already looks too high), but the same "onetwothreefourfive" in my native language gets an excellent (>100-bit) quality estimate (ridiculous). Maybe zxcvbn will one day add my language to its dictionaries and this will become less ridiculous. Still, I think no manually entered or edited password should get an "excellent" rating with no other warnings.
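
For illustration, this comparison can be roughly reproduced with the Python port of zxcvbn (KeePassXC does not use this port, so the exact numbers will differ; the second string is only a hypothetical stand-in for the same phrase in a language zxcvbn has no dictionary for):

```python
# pip install zxcvbn
import math
from zxcvbn import zxcvbn

for pw in ["onetwothreefourfive",       # English words zxcvbn recognises
           "unodostrescuatrocinco"]:    # hypothetical non-English equivalent
    result = zxcvbn(pw)
    bits = math.log2(result["guesses"])
    print(f"{pw!r}: ~{bits:.1f} bits of guessing work, score {result['score']}/4")
```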

For automatically generated passwords (like those from "Tools -> Password Generator") both estimates could be used, and the fundamental question is which one to display.

Currently, KeePassXC supports only basic password generation methods (passwords from standard character sets and passphrases based on English words), for which zxcvbn gives decent "cracking entropy" estimates, so this problem is not very important right now (as already mentioned). However, if you wanted to start supporting more esoteric generation methods, such as custom dictionaries, this would again become problematic, and you would probably want to switch to "generated entropy" for more realistic password quality estimates.

For an automatically generated password, the "cracking entropy" estimate is still useful and can be displayed alongside the "generated entropy" estimate, but I agree with OP that displaying "generated entropy" as the main figure makes more sense.

One could also consider adding some additional fields to each password entry to support a better password quality overview (one can already store this in the notes).

SebiderSushi commented 4 years ago

Hello @phoerious, this issue was laid to rest a long time ago, but I'd still like to know how the entropy of your example would be 8 * -log2(1/8). To my understanding, this describes a password of 8 letters, each occurring with a probability of 1/8. Given the set [a-zA-Z0-9] and a password of 8 letters, every single letter would be assigned to a given position in the password with a probability of 1/62, as the set contains 62 letters and a password generator should choose randomly from the set, giving every letter the same probability of being chosen. To my understanding, this would actually give an entropy of 8 * -log2(1/62).

What have I misunderstood about entropy?

phoerious commented 4 years ago

That was my main point. Yes, the generation source has an entropy of 8 * -log2(1/64). However, the final password has only 8 different characters which appear with a probability of 1/8 each. So even if your source is very rich, you can still generate a weak password that by some chance only uses a very predictable subset of characters. "abcdeFGH" is one such weak password and so are "aaaaaaaa" and "password". The chances of generating them are low, but if you get unlucky and do generate one of them, it is easy to crack. In the end, there is no distinction between a bad password chosen wilfully and a bad password chosen by astronomically unlucky chance. That's why we measure the strength of the generated password and not its generation source.
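
One way to see this in practice is a rough sketch with the Python port of zxcvbn (not what KeePassXC runs; numbers are only illustrative): draw passwords from an 8-character [a-zA-Z0-9] source worth about 47.6 bits and look at how each realized password is rated, rather than the source.

```python
# pip install zxcvbn
import math
import secrets
import string
from zxcvbn import zxcvbn

ALPHABET = string.ascii_letters + string.digits    # [a-zA-Z0-9], 62 symbols
SOURCE_BITS = 8 * math.log2(len(ALPHABET))         # ~47.6 bits for the source

estimates = []
for _ in range(1000):
    pw = "".join(secrets.choice(ALPHABET) for _ in range(8))
    estimates.append(math.log2(zxcvbn(pw)["guesses"]))

estimates.sort()
print(f"source entropy:              {SOURCE_BITS:.1f} bits")
print(f"realized passwords (zxcvbn): min {estimates[0]:.1f}, "
      f"median {estimates[len(estimates) // 2]:.1f} bits")
```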

SebiderSushi commented 4 years ago

Then I think this is a problem of naming, or of my understanding of entropy? (I'm only just starting to grasp the concept of a cracking entropy. As for your example, I currently think that the probability that a certain symbol combination is used as a password could be used to calculate cracking entropy.)

When I hear entropy, I think of it as generation source entropy: a measure of the expected number of attempts an attacker blindly brute-forcing your passcode would need to guess it. As far as I understand, anything else is far more complicated to determine and also very speculative. Who is guessing my password? What do they know? What are they thinking? Also, bad passwords only exist because many people choose non-random passwords; if every password were generated randomly, then "abcdefgh" would be just as unlikely to be guessed as "ndoczsod". What makes "abcdefgh" a bad password (being one of the first an attacker is likely to try) is out of scope of the definition of entropy as far as I understand it. I think it's confusing to call it entropy instead of "estimated password strength" or similar, as I've seen entropy defined as the generation source entropy.

I now understand that my previous points were mainly about me not interpreting the entropy in KeePassXC as cracking entropy. BUT KeePassXC uses the generation source entropy for passphrases, even if the passphrase happens to be a common nursery rhyme. For me, this inconsistency adds to the confusion about why "entropy" should mean something else for passwords.

phoerious commented 4 years ago

There is no such thing as "cracking entropy". Your understanding of entropy is largely correct, but not quite applicable, because pure entropy is actually a pretty bad measure for password quality. Our password generation source has a specific entropy, yes, but the password that it generates bears no information about this source. On the other hand, you can enter any password of your own into the password field or edit a generated password to your liking and we wouldn't have any information about the entropy of that source either. All we see is the password and we can only make assumptions about what random source (if any) may have generated it. That assumed source is the one we calculate the entropy of. Of course, the concept of a bad password has nothing to do with entropy per se (except that low entropy causes bad passwords), but we use such information to "correct" the measurement. In the end, we do not try to give a mathematical description of your generation source (which is entirely irrelevant), but try to find a lower bound for how strong the actual password is. What we measure is a lot more conservative and more in line with (although not quite the same as) min-entropy as compared to the classic Shannon entropy.
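
To illustrate the difference between those two quantities with a toy source (my own example, not anything KeePassXC actually computes): imagine users pick "password" half the time and otherwise pick one of 2^20 random strings uniformly.

```python
import math

p_common = 0.5          # probability of the single most likely password
p_rest = 0.5 / 2**20    # probability of each of the 2**20 random alternatives

# Classic Shannon entropy of the source
shannon = -(p_common * math.log2(p_common)
            + 2**20 * p_rest * math.log2(p_rest))

# Min-entropy: determined entirely by the attacker's single best guess
min_entropy = -math.log2(max(p_common, p_rest))

print(f"Shannon entropy: {shannon:.1f} bits")      # 11.0 bits
print(f"min-entropy:     {min_entropy:.1f} bits")  # 1.0 bit
```

The Shannon figure looks respectable, yet an attacker's first guess succeeds half the time, which is why the more conservative, min-entropy-like view is the one worth reporting.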

SebiderSushi commented 4 years ago

Now I'm even more confused as to why you don't want to give it another name or make these clarifications visible to the user.

phoerious commented 4 years ago

It is effectively still entropy, just not the plain Shannon entropy of an imaginary "perfect" source. We assume a different source instead, one that works more like a language model, thus we take into account a dictionary and character dependencies to find a better estimate of the actual information content than a naive calculation under the assumption of stochastic independence could (an assumption that almost never holds in the real world BTW). Call it corrected entropy if you will, but in the end the name really doesn't matter.

SebiderSushi commented 4 years ago

OK, going to let this issue rest for good then. My final remarks: I think the name and the details of the scoring matter for users who want to know what to make of it. Since I know now, I'll move along, but I don't think the current situation is optimal for any other user who doesn't know but wants to. I think your explanations of "What is displayed here?" could be boiled down to a few sentences and put into a tooltip or similar in the UI where the entropy score is displayed. Other than that, thank you very much for taking the time to explain everything here for me, I appreciate it!