openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Splitting brute force keyspace #4576

Open sectroyer opened 3 years ago

sectroyer commented 3 years ago

I have a small feature request (unless it's already implemented?). I use both john and hashcat in a custom solution for distributing password cracking. I get pretty good results (all nodes finish with a similar ETA) by splitting the dictionary by number of lines (as a percentage of each node's computational power). However, I have an issue when splitting the keyspace in a brute-force attack. I tried using simple letters (for example one node a-k and the other l-z) but it's pretty hard to implement and often unreliable. I noticed hashcat has some options for splitting the keyspace itself. It would be nice to have something like that in john. Relevant link: https://hashcat.net/forum/thread-3386.html

sectroyer commented 3 years ago

@sectroyer I don't understand what you mean by this:

For your example 100000 is the limit, you can't handle 10x nodes or anything like that.

Please provide an example of what you'd like to do but you think you cannot. I continue to think you misunderstood me.

Okay, instead of using 1000000 as the big number let's say our "big number" is 10 :) Easier to explain that way. I can divide 10 between 5 nodes or 10 nodes, but can't do much if there are 20 nodes. Also it's hard to decide on a good "starting number" if you don't have something like --keyspace. Also I think the same params are used for dictionary-based attacks. I will try to summarise it once again: there are edge cases which you can handle with direct control over the distributed data (for example with something like --skip/--limit) but which you can't handle with --node. However, even the things that you CAN do with both approaches (maybe it's even 99% of use cases) are much harder to handle for the distributed system when it loses control over the data. If I want to simply schedule a task among a static set of nodes then I would prefer --node, especially since it can handle character increments, different modes etc. However, when I am doing longer tasks in an environment where nodes come and go, it's MUCH better for the distributed system to have direct control over the data.

No, regardless of what we implement you will never be able to define a work piece such that it will be the actual same bunch of candidates between john and hashcat, that's for sure. We do things differently on every level. So hashtopus would definitely need to define a task as either john-only or hashcat-only. It still can be done by splitting by hashes, or, as I do in my system, by the value of a single character.

Now once that fact is established, perhaps your "need" for skip/limit fades significantly? They'll be john-specific tasks anyway, so you could simply use our existing node syntax for now and be done with it.

Well yes and no. We still lack something like --keyspace. Also, do I understand correctly that --node=x-y/z means split the keyspace into z parts and give the machine parts x through y? And do --node=1-10/100 and --node=1-20/200 schedule the same keyspace to the machine? In other words, can I "control" what keyspace a machine gets scheduled by controlling x, y, z? Is there a formula that would allow calculating the "skip/limit" values used internally, so I could use x, y, z to set those internal values indirectly? Also, I think hashtopolis uses --skip/--limit for splitting wordlist attacks, but I didn't do many tests with that yet. I still think it's best to simply split the dictionary itself, though it's possible hashtopolis (or maybe even PHP?) wouldn't be able to handle splitting many gigs anyway, so it's best to stick with what they are already using.

However, some time in the future (definitely post next release) I think I'll experiment with implementing keyspace/skip/limit options just for the sake of not doing any round robin.

It would definitely make integration much easier.

You sort of can do that by splitting the mask by a character, e.g. have hashcat test passwords starting with [a-i] and john [j-z]. Then it doesn't matter that the tools test the rest of the mask in a different order and in different blocks.

Yeah, exactly what I think is the only logical way of dividing those. Definitely something worth doing in the future; however, I think it would require major changes in hashtopolis itself, as it doesn't have any concept of characters. It simply blindly passes masks as command arguments to the agents.

I think the fastest way to have "something working" is to use what hashtopolis already has - cpu-only tasks/agents. From my experience there is no point in using hashcat for hashing on a CPU. It has no concept of AVX and it's usually 2-4 times slower than john, especially on multi-core systems. Therefore I think I would simply fork hashtopolis and rename "cpu only" to "john only" 😄 The agent itself will require more work, but it still shouldn't take long to get john working with hashtopolis 😄

magnumripper commented 3 years ago

Well yes and no. We still lack something like --keyspace

For what really? Just split the keyspace in whatever number of slices you feel like, and serve them. OK maybe you want a job to run for some fairly controlled and/or short amount of time? That should be doable anyway. Launch a tiny job and watch how fast it completes. If it was 5x too fast, launch a 5x larger one next time.
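For illustration only, here is a minimal sketch of that feedback loop; the run_job helper and all constants are hypothetical, not part of john or hashtopolis:

    # Illustrative sketch: grow or shrink the next slice so each job runs
    # close to a target duration, without knowing the keyspace size upfront.
    TARGET_SECONDS = 600          # desired chunk duration (assumption)

    def next_chunk_size(chunk, elapsed):
        """Scale the next slice (in virtual nodes) by how far off the last one was."""
        if elapsed <= 0:
            return chunk * 2      # finished too fast to measure; just grow
        return max(1, int(chunk * TARGET_SECONDS / elapsed))

    # chunk = 1                       # start tiny
    # elapsed = run_job(chunk)        # hypothetical: launch a slice, time it
    # chunk = next_chunk_size(chunk, elapsed)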

Also, do I understand correctly that --node=x-y/z means split the keyspace into z parts and give the machine parts x through y? And do --node=1-10/100 and --node=1-20/200 schedule the same keyspace to the machine? In other words, can I "control" what keyspace a machine gets scheduled by controlling x, y, z? Is there a formula that would allow calculating the "skip/limit" values used internally, so I could use x, y, z to set those internal values indirectly?

I'm pretty sure that if you started by giving away --node=1-10/100 you can later decide to do --node=21-39/200 for the next batch and there would be no loss or discrepancy. But don't quote me on that just yet 😆
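For intuition, a tiny sketch of why those two ranges can line up, under the assumption that --node=x-y/z hands out the keyspace fraction from (x-1)/z to y/z (the exact candidate boundaries are john's internal business):

    def node_fraction(x, y, z):
        """Keyspace fraction covered by --node=x-y/z, assuming z equal parts."""
        return ((x - 1) / z, y / z)

    print(node_fraction(1, 10, 100))    # (0.0, 0.1)
    print(node_fraction(21, 39, 200))   # (0.1, 0.195) -- starts exactly where the first slice ended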

sectroyer commented 3 years ago

I'm pretty sure that if you started by giving away --node=1-10/100 you can later decide to do --node=21-39/200 for the next batch and there would be no loss or discrepancy. But don't quote me on that just yet 😆

If that's the case then I guess there should be some x,y,z to --skip/--limit formula that I could implement in an agent. Or maybe john itself could implement such a formula?

Well yes and no. We still lack something like --keyspace

For what really? Just split the keyspace in whatever number of slices you feel like, and serve them.

When I was researching this topic I stumbled on a two-hour YouTube video about hashtopolis. It was kind of a Q&A with the authors, and there he was explaining (I am "quoting" from memory, so not 100% sure about the exact wording) that they could "implement this" or "implement that", but there are some fundamental assumptions they made in the past that they are now stuck with. For example "we are stuck with PHP now" or something like that. Anyway, paraphrasing: if we don't want to rewrite big chunks of hashtopolis itself, we are kind of stuck with how it handles keyspaces etc. From what I already checked, the whole --skip/--limit concept is used very heavily everywhere, and I suspect there are even more places that I didn't look at 😄

OK maybe you want a job to run for some fairly controlled and/or short amount of time? That should be doable anyway. Launch a tiny job and watch how fast it completes. If it was 5x too fast, launch a 5x larger one next time.

Also, as I explained previously, this approach has the downside of not being able to treat the data as data. A distributed system (with dynamic nodes etc.) can't "think" in terms of the data batches themselves but rather has to translate them into --node values. It adds another layer of abstraction which makes it harder for such systems. However, as we discussed previously, it's much easier to use in "static" systems, where you simply want to split the job. Also, ONLY --node can be used with stuff like your --increment mode or keyspaces with dynamic length (increasing number of characters). In hashtopolis (as it's a distributed system using the "second approach") you can kinda do that: you have to create a task for each length, then run them. You can try joining them in a supertask or something like that, but those ways are "hacky". This is a situation where --node suits best: you simply set the node's power and schedule a task, not worrying about the data itself (which, because of the different number of characters, constantly changes). This also works the other way around: it's not that you can't use --node for a lot of the stuff we are discussing here, but those approaches are also kinda "hacky" 😄 In the end I think it will be much easier and more consistent to "express" node values (internally by john, or in the agent) as options that behave like --skip/--limit.

For what really? Just split the keyspace in whatever number of slices you feel like, and serve them.

Also, it's not so simple when nodes come and go. I split for ten nodes and a user connects three more. I have to either recompute the values for the new node count or do what @solardiz suggests and use a huge number of nodes. Which brings us to the previous issue of what a "good big number of nodes" is. Also, how can I compare two keyspaces if I want to compute an ETA for a set of tasks (hashtopolis calls them supertasks) that has a lot of tasks inside? Run a "benchmark" for each of those? Right now I can simply get a keyspace for each of the tasks and within seconds add them together to get the size of the keyspace for the whole set of tasks (the supertask). The more I think about it, the more issues come to my mind, and probably even more will arise if we try to implement it. All of those issues (and more to come) are present because someday it was decided (intentionally or not) that hashtopolis would be a distributed system using the "second approach", ergo centred around the data itself...

solardiz commented 3 years ago

Okay, instead of using 1000000 as the big number let's say our "big number" is 10 :) Easier to explain that way. I can divide 10 between 5 nodes or 10 nodes, but can't do much if there are 20 nodes.

@sectroyer Your logic failed to apply to what I had suggested right where you chose 10 as the "big number". You really need to choose a number big enough that you'd never have more physical nodes (or processes) than that.

In my suggestion, you wouldn't be dividing the big number space between nodes. You would treat those virtual node numbers as pieces of work, to be handed out to whatever nodes are currently online.

sectroyer commented 3 years ago

@sectroyer Your logic failed to apply to what I had suggested right where you chose 10 as the "big number". You really need to choose a number big enough that you'd never have more physical nodes (or processes) than that.

I used 10 only as an exaggeration to explain what I mean 😄 Still, the question remains what a "number big enough" is if we don't have --keyspace. We could use a hardcoded keyspace size (for example 10^9) and it might kinda work, but it would break sets of tasks (supertasks), as each task would have the same "keyspace". Also, I remember you warning me about not using too big values for --node, so what is the "upper limit" that we can safely use?

In my suggestion, you wouldn't be dividing the big number space between nodes. You would treat those virtual node numbers as pieces of work, to be handed out to whatever nodes are currently online.

Yes, I understand, but we still don't have data about the size of the keyspace itself, and it would have to be implemented on the agent side, as hashtopolis depends too much on the whole concept of keyspace size. One more thing: can I use --node=start-stop/number_of_lines_in_dict for dictionary attacks? I also remember reading something about hashtopolis splitting rules for dictionary attacks, but I didn't analyse that part of the source code yet. Anyway, as explained, it's not that your idea can't work at all, but it simply (IMHO) doesn't fit what hashtopolis is already using, and we will encounter more and more issues with it.

If we decide not to add new options to john itself, we have to create some layer of abstraction that translates the --skip/--limit values into --node values at the agent level. Such an approach might be doable, but we still have to consider all the corner cases. Though I still think we need at least --keyspace to decide on a big enough node number for each task. Without that we can't even compare tasks themselves.

sectroyer commented 3 years ago

I was looking for a good place to print the keyspace size (as a temporary solution) and trying to implement something based on what is already in john. I noticed you update mask_tot_cand multiple times, but is do_mask_crack a good place to print this value, here:

            mask_tot_cand = cand * mask_int_cand.num_int_cand;

            /* Update internal masks if needed. */

I also noticed this comment:

/*
 * When iterating over lengths, The progress shows percent cracked of all
 * lengths up to and including the current one, while the ETA shows the
 * estimated time for current length to finish.
 */
static double get_progress(void)
{
    double total;

So does this mean that in the ETA calculation mask_tot_cand is more "up-to-date"?

solardiz commented 3 years ago

@sectroyer Just what problem are you trying to solve? Yes, we did discuss that too large total virtual node numbers are problematic for some modes, but why don't you just give the approach I had suggested a try for mask mode? I think this is the first thing to do, and there might not be any code changes needed to get it to meet your needs. Just use, say, 1 million virtual node numbers for your keyspace portions.

sectroyer commented 3 years ago

Well, the main issue is this piece of code:

    def measure_keyspace(self, task, chunk):
        if 'usePrince' in task.get_task() and task.get_task()['usePrince']:
            return self.prince_keyspace(task.get_task(), chunk)
        elif 'usePreprocessor' in task.get_task() and task.get_task()['usePreprocessor']:
            return self.preprocessor_keyspace(task, chunk)
        task = task.get_task()  # TODO: refactor this to be better code
        full_cmd = self.callPath + " --keyspace --quiet " + update_files(task['attackcmd']).replace(task['hashlistAlias'] + " ", "") + ' ' + task['cmdpars']

From the file hashcat_cracker.py.

I can hardcode it to always return "1 million" but that won't solve the issue :) My point is not to show that it can be done; I already analysed enough of the hashtopolis code to see that it IS doable. I just want to do it in a generic way. Even if I switch to using --node I still have to have a way to "compare" tasks etc. So getting the keyspace size is the bare minimum :) Without that, both ?l?l and ?a?a?a?a?a?a?a will have the same "keyspace size", which doesn't make any sense :) If I provide hashtopolis with a "correct" (at least in terms of scale) number, I can then translate the --skip/--limit values it passes to the agent into corresponding --node values. At least that is my current idea 😄

solardiz commented 3 years ago

Why don't you hard-code the virtual total keyspace size as 1 million? As long as you run a distributed attack on just one mask at a time, it shouldn't matter that this 1 million corresponds to very different actual keyspace sizes varying between masks.

sectroyer commented 3 years ago

It breaks many things; for example, I lose the ability to (even remotely) correctly calculate the ETA/percentage of a set of tasks. I cannot compare tasks anymore, as all tasks look the same. Also, what if hashtopolis divides the task even more (to 2 million), as @magnumripper said it can be done? At that point the (previous) node values stop working and I need to adjust them on the basis of the new --limit/--skip values I will receive. I think you are not taking into consideration that even if I switch to --node at the agent level, hashtopolis will still communicate using --skip/--limit values, and they have to follow some logic or everything will start breaking. Those are ONLY the issues I know exist; how many are left, no idea. Instead of fighting windmills I want to at least stick to the already defined protocol.

solardiz commented 3 years ago

I lose the ability to (even remotely) correctly calculate the ETA/percentage of a set of tasks.

Surely you can calculate the percentage easily: it's the number of virtual nodes (actually sub-tasks) already completed vs. the pre-specified total (say, 1 million). You can calculate the expected total duration by dividing the time already spent by the ratio of work already done (e.g., 1 hour / (1000 / 1000000) gives you 1000 hours total, 999 hours left).
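The same arithmetic as a short sketch; the function and variable names are just illustrative:

    def progress_and_eta(done_nodes, total_nodes, elapsed_hours):
        """Percent done and estimated hours left, from virtual sub-tasks completed."""
        fraction = done_nodes / total_nodes
        total_estimate = elapsed_hours / fraction   # e.g. 1 / (1000 / 1000000) = 1000
        return fraction * 100, total_estimate - elapsed_hours

    print(progress_and_eta(1000, 1000000, 1.0))     # (0.1, 999.0)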

Also, what if hashtopolis divides the task even more (to 2 million)

I don't know about that, but either you choose a virtual node count that's always high enough to satisfy hashtopolis, or you let it choose one and use that as the virtual node count. Perhaps a slight code change, if necessary, would prevent it from choosing numbers that are way too high (more than millions).

I think you are not taking into consideration that even if I switch to --node at the agent level, hashtopolis will still communicate using --skip/--limit values, and they have to follow some logic or everything will start breaking.

You just put those values into the start-end fields of --node.
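As a rough sketch of what that could look like on the agent side, assuming the agent reports a fixed virtual keyspace of 1,000,000 to hashtopolis and receives --skip/--limit in those same virtual units (the helper name and constant are made up):

    VIRTUAL_KEYSPACE = 1000000   # what the agent would report as the "keyspace" (assumption)

    def skip_limit_to_node(skip, limit, total=VIRTUAL_KEYSPACE):
        """Map a skip/limit chunk onto john's --node syntax: the chunk
        [skip, skip + limit) becomes virtual nodes skip+1 .. skip+limit of total."""
        return "--node={}-{}/{}".format(skip + 1, skip + limit, total)

    print(skip_limit_to_node(0, 1000))      # --node=1-1000/1000000
    print(skip_limit_to_node(1000, 500))    # --node=1001-1500/1000000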

sectroyer commented 3 years ago

Surely you can calculate the percentage easily: it's the number of virtual nodes (actually sub-tasks) already completed vs. the pre-specified total (say, 1 million). You can calculate the expected total duration by dividing the time already spent by the ratio of work already done (e.g., 1 hour / (1000 / 1000000) gives you 1000 hours total, 999 hours left).

Not between tasks with different keyspace sizes (for example ?l?l and a mask of eight ?a). I cannot calculate the ETA/percentage using your formula.

You just put those values into the start-end fields of --node.

I wanted to scale it, but it looks like even this won't work :(

I don't know about that, but either you choose a virtual node count that's always high enough to satisfy hashtopolis, or you let it choose one and use that as the virtual node count. Perhaps a slight code change, if necessary, would prevent it from choosing numbers that are way too high (more than millions).

I found a great article, Fitcrack, that compares hashtopolis and Fitcrack (I didn't know it even existed). It's a system that uses the same concept to achieve better results. It's all based on dynamically changing the chunk size for the keyspace.

Anyway, this paper convinced me that --node won't work at all. Let me explain why. Let's say we have two nodes: A and B. On node A there is a fast GPU, whereas on node B there is a CPU or a slow GPU. Anyway, the difference in power between them is 100x. Hashtopolis does something like this: A->getKeyspaceSize(), B->getKeyspaceSize() (possibly it does it only on one of them, not sure about that, but the point stands). A and B both return 1 million (let's call it 1000 for easy explanation).

Next, hashtopolis (and Fitcrack) use a fixed chunk size expressed in seconds. They run a benchmark on both A & B and split the keyspace (our 1000) in such a way that each chunk represents the percentage that will be computed in 600 seconds (sometimes this can even be 3600, from what I read). For node A this might be 10 chunks, whereas for node B even 1 might be too slow. And here is the question: with keyspaces of such big ranges and sizes, what virtual node count would be a good value for heterogeneous nodes of such diversity? 10 million, 100 million? Can we even have such granularity, let's say --node=1-100/10mln for B and --node=10k-20k/10mln for A? If we want (and we should) to support a big diversity of nodes, the node number must be big; I'm not sure if 10 million or 100 million would be enough... What limits us (apart from john's internal --node limitations) is the power of a single node (its --node value can't be lower than 1), the computing power of the fastest node, and the number of nodes. Even in my small setup a 100x computing power difference isn't anything special. However, if I compare the slowest node in my current setup with the multi-GPU node I wanted to book for some tests, I think the difference was 1k or even 10k (not sure about the exact numbers). The more I try to "design" it, the more issues arise... What would the final "max difference" that we allow be? For sure there are slower and probably even faster nodes than the ones I tested...
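To make the granularity concern concrete, a rough sketch with made-up numbers (the benchmarked speeds, chunk time, and keyspace below are assumptions, not measurements):

    def chunk_in_virtual_nodes(speed_cps, chunk_seconds, real_keyspace, virtual_total):
        """How many virtual nodes give a chunk of roughly chunk_seconds.
        One virtual node covers real_keyspace / virtual_total candidates,
        so a result below 1 means the granularity is too coarse for that node."""
        per_virtual_node = real_keyspace / virtual_total
        return speed_cps * chunk_seconds / per_virtual_node

    real_keyspace = 95 ** 8   # e.g. an 8-character ?a mask
    print(chunk_in_virtual_nodes(1000000000, 600, real_keyspace, 1000000))  # fast node A: ~90 virtual nodes
    print(chunk_in_virtual_nodes(1000000, 600, real_keyspace, 1000000))     # slow node B: ~0.09, below 1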

solardiz commented 3 years ago

Can we even have such granularity, let's say --node=1-100/10mln for B and --node=10k-20k/10mln

Yes, we probably can. Please just try.

For this specific example, though, why not just scale both down by a factor of 100? For the slowest physical node, allocate tasks that are of 1 virtual node in size. For the 100x faster physical node, allocate them in size of 100 virtual nodes. Yes, this won't let you specify 1.5 virtual nodes for a node that's 1.5x the speed of the slowest, but there should be no need to target a single chunk size so precisely - it doesn't really matter if it's, say, 600 or 400 seconds. The controller would simply end up allocating 1.5x more tasks to the node completing them in 400 seconds. It would do so without any advance calculation - this would happen on its own if each time it allocates the next task to whatever node becomes idle.
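A toy simulation of that self-balancing behaviour, under the assumption that every chunk is the same size and each idle node simply pulls the next one (all numbers and names illustrative):

    import heapq

    def simulate(chunks, node_speeds, slowest_chunk_seconds=600):
        """Hand identical chunks to whichever node becomes idle first.
        Speeds are relative to the slowest node (1.0), so a 100x node finishes
        a chunk in 1/100th of the time and naturally pulls ~100x as many."""
        idle = [(0.0, i, 0) for i in range(len(node_speeds))]  # (idle time, node, chunks done)
        heapq.heapify(idle)
        for _ in range(chunks):
            t, i, done = heapq.heappop(idle)
            heapq.heappush(idle, (t + slowest_chunk_seconds / node_speeds[i], i, done + 1))
        return sorted((i, done) for _, i, done in idle)

    print(simulate(300, [1.0, 100.0]))   # [(0, 3), (1, 297)] -- no advance calculation needed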

sectroyer commented 3 years ago

Can we even have such granularity, let's say --node=1-100/10mln for B and --node=10k-20k/10mln

Yes, we probably can. Please just try.

Good. For now I'm just "designing" :) But slowly the pieces are coming together.

For this specific example, though, why not just scale both down by a factor of 100? For the slowest physical node, allocate tasks that are of 1 virtual node in size. For the 100x faster physical node, allocate them in size of 100 virtual nodes. Yes, this won't let you specify 1.5 virtual nodes for a node that's 1.5x the speed of the slowest, but there should be no need to target a single chunk size so precisely - it doesn't really matter if it's, say, 600 or 400 seconds. The controller would simply end up allocating 1.5x more tasks to the node completing them in 400 seconds. It would do so without any advance calculation - this would happen on its own if each time it allocates the next task to whatever node becomes idle.

The keyspace size is decided at the agent level. Also, A and B don't know anything about each other. We need to use a node number that would allow for both the slowest possible and the fastest possible node. When a node is asked for the keyspace size, it has no concept of being slow or fast. That's why something generic (for example the actual keyspace size) has to be used.