lakiw / pcfg_cracker

Probabilistic Context Free Grammar (PCFG) password guess generator

CPU or GPU? What License? #5

Closed bernardoaraujor closed 6 years ago

bernardoaraujor commented 7 years ago

Hello. My name is Bernardo Rodrigues, I am a Brazilian MSc student (bernardoaraujor@gmail.com). You can check a bit of my work at https://goo.gl/CJcNBt and https://goo.gl/jSyLTS. I want to use PCFG as a reference for my MSc Thesis.

I'm somewhat lost in the documentation though (there seems to be nothing at python_pcfg_cracker/Documents). My main questions: does the guess generator run on the CPU or on the GPU, and what license is the code released under?

Regards, Bernardo.

m33x commented 7 years ago

Hi Bernardo,

In its current version, the PCFG cracker is meant to run on CPUs only.

Regarding the license, I can't help you; you'll need to wait for Matt's response.

Be aware that there are multiple implementations of PCFG-based password cracking:

https://github.com/lakiw/pcfg_cracker
https://sites.google.com/site/reusablesec/Home/password-cracking-tools/probablistic_cracker
https://github.com/cupslab/guess-calculator-framework

Regarding your research, you should check out

Cheers, Maximilian

lakiw commented 7 years ago

Hi Bernardo,

Thanks for your interest! The license of the PCFG implementation in this repo is GPL2. The same goes for the earlier version of this program at https://sites.google.com/site/reusablesec/Home/password-cracking-tools/probablistic_cracker. There are other PCFG implementations by other groups, and they are under their own licenses. For example, the CMU team wrote their own version from the ground up.

Things get a bit tricky with commercial usage of PCFGs for password policy enforcement (for example, zxcvbn-style tools that help users pick stronger passwords), because Florida State University (not me) owns a lot of the original research. In that case I recommend contacting Dr. Sudhir Aggarwal at sudhir@cs.fsu.edu.

As Maximilian already pointed out, this code currently only supports generating guesses on CPUs (I use a ton of memory and don't distribute the work much). That being said, depending on the hash type, the back-end cracker can absolutely run on GPUs; you can pipe the results of pcfg_manager.py into GPU hashcat. In most cases the PCFG generates guesses far too slowly to make full use of the back-end GPU, but in some situations, such as when attacking a large number of salted hashes or a really slow file-encryption format, it works quite well.
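For example, the pipe can look something like the Python sketch below (the same plumbing you would get from a shell pipe). The hash file name, the -m 1000 (NTLM) mode, and the bare pcfg_manager.py invocation are just placeholder values; in practice pcfg_manager.py takes its own arguments for the trained ruleset.

```python
# Untested sketch of piping PCFG guesses into hashcat's stdin ("pipe") mode.
# hashes.txt and -m 1000 are example values, not anything shipped with this repo.
import subprocess

generator = subprocess.Popen(
    ["python3", "pcfg_manager.py"],            # guess generator writes to stdout
    stdout=subprocess.PIPE,
)
cracker = subprocess.Popen(
    ["hashcat", "-m", "1000", "hashes.txt"],   # no wordlist given, so hashcat reads candidates from stdin
    stdin=generator.stdout,
)
generator.stdout.close()  # let the generator see a broken pipe once hashcat exits
cracker.wait()
```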

Also, the basic idea behind PCFGs should be applicable to GPUs if you are willing to use a threshold-based approach to generating guesses instead of a probability-order approach. It's similar to how Hashcat's Markov mode, which can run on GPUs, is drastically different from CPU Markov implementations like OMEN and JtR's Incremental mode (though Maximilian can correct me on the OMEN comment ;p). I've yet to get beyond the drawing-on-bar-napkins stage of development for that, though.
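To make that distinction concrete, here is a toy Python sketch of the two strategies. The two-slot grammar is invented purely for illustration; it is not the repo's real data structure, and the successor rule is a simplified stand-in for the actual "next" algorithm.

```python
# Toy sketch: threshold-based generation (order-free, parallelizes well, so
# GPU-friendly) vs. probability-order generation (priority-queue driven, CPU-style).
import heapq
from itertools import product

# Each slot holds (terminal, probability) choices, already sorted by probability.
grammar = [
    [("password", 0.6), ("letmein", 0.4)],   # base word slot
    [("123", 0.7), ("!", 0.3)],              # suffix slot
]

def threshold_guesses(threshold):
    """Emit every guess whose probability clears the threshold, in no particular
    order -- each combination is checked independently, so this parallelizes well."""
    for combo in product(*grammar):
        prob = 1.0
        for _, p in combo:
            prob *= p
        if prob >= threshold:
            yield "".join(term for term, _ in combo), prob

def probability_order_guesses():
    """Emit guesses strictly highest-probability-first using a priority queue;
    this is inherently sequential, which is why it suits CPUs better."""
    start = tuple(0 for _ in grammar)
    start_prob = 1.0
    for slot in grammar:
        start_prob *= slot[0][1]
    heap = [(-start_prob, start)]
    seen = {start}
    while heap:
        neg_prob, idx = heapq.heappop(heap)
        yield "".join(grammar[s][i][0] for s, i in enumerate(idx)), -neg_prob
        # Push the "next" candidates by bumping one slot's index at a time
        # (a simplified successor rule; the real cracker's algorithm is more involved).
        for slot in range(len(grammar)):
            nxt = list(idx)
            nxt[slot] += 1
            nxt = tuple(nxt)
            if nxt[slot] < len(grammar[slot]) and nxt not in seen:
                seen.add(nxt)
                prob = 1.0
                for s, i in enumerate(nxt):
                    prob *= grammar[s][i][1]
                heapq.heappush(heap, (-prob, nxt))

print(sorted(threshold_guesses(0.25)))    # any order is fine, sorted only for display
print(list(probability_order_guesses()))  # already comes out in probability order
```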

I apologize for the lack of documentation. That's also on my to-do list, but in all honesty I doubt I'll be getting around to it anytime soon. If you have any more questions, though, feel free to ask!

Also on my to-do list is merging the "Development and Testing" branch back into "Master". There are a ton of improvements in it that so far seem to make it significantly better. If you're feeling adventurous, I'd recommend working off that branch instead. If you run into any problems, please let me know, since that will really help me as well.

Thanks again and good luck!

bernardoaraujor commented 6 years ago

Thanks for the response guys!

@lakiw, isn't it possible to generate the guesses into a file (sorted by probability) and then use it as a wordlist with hashcat? I mean, the guesses don't really need to be generated on the fly during the cracking, do they?

lakiw commented 6 years ago

pcfg_manager.py outputs all guesses to stdout (side note: it also outputs status/error messages to stderr so that they aren't passed to the back-end cracker). Therefore you can pipe those guesses to a file once to create an input dictionary. I don't have an "only generate X number of guesses" option, though, so it's up to you to limit the number of guesses that actually get saved. (I could add that as a feature if you want, but the guess generation is already way too slow, so I'm hesitant to add extra checks there.)
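One way to do that limiting yourself is a tiny stdin filter between the generator and the output file, something like the hypothetical cap_guesses.py below. The script name, the output file name, and the 10-million cap are all just example values, not part of this repo.

```python
# cap_guesses.py -- hypothetical helper, not part of this repo.
# Reads guesses from stdin and saves only the first MAX_GUESSES of them.
import sys
from itertools import islice

MAX_GUESSES = 10_000_000           # example cap
OUTPUT_FILE = "pcfg_wordlist.txt"  # example output name

with open(OUTPUT_FILE, "w") as wordlist:
    for guess in islice(sys.stdin, MAX_GUESSES):
        wordlist.write(guess)
```

You would run it as the right-hand side of the pipe instead of hashcat; once the cap is hit the script exits, the generator gets a broken pipe on its next write, and everything stops, which is the behavior you want.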

lakiw commented 6 years ago

Just to clarify: by default, pcfg_manager.py currently outputs all guesses in probability order to stdout. I may add a faster option in the future to generate guesses out of order using a threshold, much like JtR's Markov mode currently does, but that's likely a long way off. I'm also tempted to use some of the techniques OMEN uses, now that I'm studying up on it. On yet another side tangent, the older C++ version of the pcfg_manager in the archived_work folder actually had some support for saving sorted pre-terminals to a file. Depending on the grammar, that could drastically reduce the size of the dictionary file while still saving all the work of precomputing the probability order.
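To give a rough sense of why saving pre-terminals is so much smaller, here is a made-up illustration; the structure notation and "file format" are invented for the example and are not the archived C++ tool's real format.

```python
# Rough illustration of the space savings from storing sorted pre-terminals
# instead of fully expanded guesses.
words  = ["password", "iloveyou"]    # example L8 terminals from the input dictionary
digits = ["123", "007", "777"]       # example D3 terminals

# One saved pre-terminal line: structure plus its precomputed probability.
saved_line = "L8 D3 0.042"
print(saved_line, "expands to:")

# At cracking time, the pre-terminal is re-expanded into concrete guesses.
for w in words:
    for d in digits:
        print(w + d)

# One stored line stands in for len(words) * len(digits) = 6 guesses here;
# with real dictionaries the ratio is orders of magnitude larger.
```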

lakiw commented 6 years ago

Closing this issue now, since it seems like the original question about the license has been answered. If you have any other questions, feel free to ask them by opening another issue.