lakiw / pcfg_cracker

Probabilistic Context Free Grammar (PCFG) password guess generator

How to calculate probabilities #8

Closed iohehe closed 5 years ago

iohehe commented 6 years ago

How do I calculate probabilities and use them as a password meter? Thank you.

m33x commented 6 years ago

It's described in:

fuzzyPSM: A New Password Strength Meter Using Fuzzy Probabilistic Context-Free Grammars Wang et al., DSN '16: http://wangdingg.weebly.com/uploads/2/0/3/6/20366987/dsn16v9.pdf

and in

Building Better Passwords using Probabilistic Techniques Houshmand and Aggarwal, ACSAC '12: http://sci-hub.hk/https://dl.acm.org/citation.cfm?id=2420966

But, you might also want to read

Design and Evaluation of a Data-Driven Password Meter Ur et al., CHI '17: https://www.ece.cmu.edu/~lbauer/papers/2017/chi2017-pwd-meter-official.pdf

lakiw commented 6 years ago

Thanks m33x for posting those links!

Iohehe, I appreciate the interest in using PCFGs for creating better password meters. Unfortunately, the code in this GitHub repo doesn't directly support that functionality, and I'm not aware of any other public implementation of PCFG password meters either. I'd recommend contacting Wang, Aggarwal, and/or Houshmand directly to see if you could obtain a copy of their code.

The code here will allow you to train a PCFG on existing password lists, as well as generate password guesses in probability order. The problem is that it currently doesn't support inputting a user's password and then returning the probability of that password according to the previously trained grammar. There are a lot of gotchas inherent in that (multiple ways to derive the same password, dealing with passwords that aren't directly mapped by the grammar, etc.) that are tackled in the Houshmand paper above.
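To make those two gotchas concrete, here is a minimal sketch in Python. It is not code from this repo and does not use the grammar format that pcfg_cracker stores on disk: the `BASE_STRUCTURES` and `TERMINALS` tables, their probabilities, and the function names are all made up for illustration, and the ambiguity between `D2` and `D1 D1` is deliberately contrived. It only shows the general idea that a password's probability is the base-structure probability times the probability of each terminal, that several derivations may produce the same string, and that a password the grammar can't map ends up with probability 0.

```python
# Hypothetical toy grammar for illustration only (NOT pcfg_cracker's format).
# A<n> = run of n letters, D<n> = run of n digits.
BASE_STRUCTURES = {
    ("A4", "D2"): 0.6,
    ("A6",): 0.3,
    ("A4", "D1", "D1"): 0.1,  # contrived second way to derive "pass12"
}
TERMINALS = {
    "A4": {"pass": 0.5, "love": 0.5},
    "A6": {"monkey": 1.0},
    "D2": {"12": 0.25, "99": 0.75},
    "D1": {"1": 0.5, "2": 0.5},
}


def derivation_probability(password, structure):
    """Probability of one derivation, or 0.0 if it cannot produce the password."""
    lengths = [int(label[1:]) for label in structure]
    if sum(lengths) != len(password):
        return 0.0
    prob = BASE_STRUCTURES[structure]
    pos = 0
    for label, length in zip(structure, lengths):
        chunk = password[pos:pos + length]
        # An unseen terminal makes this derivation impossible (probability 0).
        prob *= TERMINALS[label].get(chunk, 0.0)
        pos += length
    return prob


def score(password):
    """Return (sum over all derivations, probability of the best derivation)."""
    probs = [derivation_probability(password, s) for s in BASE_STRUCTURES]
    return sum(probs), max(probs)


total, best = score("pass12")
print(f"pass12: total={total:.4f} best={best:.4f}")
print("hunter2:", score("hunter2"))  # not covered by the toy grammar
```

Whether a meter should report the sum over derivations or only the most likely one is a design choice, and a real meter would also need some fallback for passwords that score 0 instead of treating them all as equally strong.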

If you do decide to code that functionality yourself, feel free to use any of the code here. I'm also more than willing to answer any questions to the best of my ability.

I'll leave this issue open for now to support this conversation, so if anyone else has similar questions/requests/comments please let me know. I'll admit adding this functionality myself is low on my priority list. I'd rather focus on improving the grammar itself, but if there is a lot of interest I can start to look into incorporating the Houshmand paper into this repo.

iohehe commented 6 years ago

Thank you 👍

lakiw commented 5 years ago

I know it only took literally a year to get back to you on this, but in the new Version 4.0 rewrite there will be a file called password_scorer.py. You can currently preview it in the branch https://github.com/lakiw/pcfg_cracker/tree/Version4

It takes a list of potential passwords as input and returns scoring information for them. This includes whether it thinks each one is a 'password', 'website', 'e-mail address', or 'other', along with the OMEN level required to generate the password and the PCFG probability if it can construct a parse tree for the password. It's still missing a few features to get to "This is how many guesses are expected to be required to crack this password", but it's certainly getting closer to something that could be used in a password scorer.
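As a rough illustration of the gap between "PCFG probability" and "number of guesses", here is a small Python sketch. It is not part of password_scorer.py: the function name and the toy probabilities are hypothetical. It assumes a guesser that emits candidates in strictly descending probability order and that every candidate's probability can be listed, which is not practical for a real grammar; it is only meant to show the definition of a guess number as a rank.

```python
def guess_number(target_prob, candidate_probs):
    """Rank of a password among candidates tried in descending probability order.

    Ties are counted optimistically, so the result is a lower bound.
    """
    if target_prob <= 0.0:
        return None  # the grammar cannot generate this password at all
    return 1 + sum(1 for p in candidate_probs if p > target_prob)


# Hypothetical probabilities of every guess a toy grammar could produce.
toy_probs = [0.30, 0.075, 0.0125, 0.20]
print(guess_number(0.075, toy_probs))
```

For a trained grammar the candidate list is far too large to enumerate like this, so the rank would have to be estimated some other way.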

Note: This feature was actually created to help identify plaintext password dumps and try to estimate whether they looked legit or not. It can also be used to clean up input wordlists, or estimate the max effectiveness of a PCFG cracking session. Aka it's a multi-use tool, but one that would still require a lot of research, development, and testing before I'd be comfortable using it in a production environment as part of a password creation policy.

lakiw commented 5 years ago

With the release of v4.0 and password_scorer.py being included in the master branch, I'm going to consider this issue closed for now. There is still a lot of work that needs to be done to turn that into an effective password meter. Not sure if that should be started up as a different issue though. If you think differently, feel free to re-open this issue or create a new one with a more targeted list of requirements.