openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Splitting brute force keyspace #4576

Open sectroyer opened 3 years ago

sectroyer commented 3 years ago

I have a small feature request (unless it's already implemented?). I use both john and hashcat in a custom solution for distributing password cracking. I have pretty good results (all nodes finish with a similar ETA) by splitting the dictionary by number of lines (as a percentage of each node's computational power). However I have an issue when splitting the keyspace in a brute attack. I tried using simple letter ranges (for example one node a-k and the other l-z) but it's pretty hard to implement and often unreliable. I noticed hashcat has some options for splitting the keyspace itself. It would be nice to have something like that in john. Relevant link: https://hashcat.net/forum/thread-3386.html

solardiz commented 3 years ago

@sectroyer We have the --node option, usable with any cracking mode (although under the hood it's implemented with separate code per cracking mode). If your nodes aren't the same speed, you can use a higher virtual node count and node number ranges. Please refer to doc/OPTIONS. Does this satisfy your needs?
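For example (a sketch with a made-up hash file, giving the faster of two machines roughly two thirds of the work):

./john --format=descrypt --incremental --node=1-33/100 hashes.txt
./john --format=descrypt --incremental --node=34-100/100 hashes.txt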

solardiz commented 3 years ago

Anyhow, there isn't anything in what you wrote that we'd currently want to track as an issue. So I'll close this. Please feel free to add further comments anyway, or to move this to the john-users mailing list (for a discussion on how to use the existing functionality best).

sectroyer commented 3 years ago

I checked the --node option and it is usable for what I want. It's even easier to use than manually calculating keyspace numbers like in hashcat. For now I was calculating each node's "percentage power", which (if I understand correctly) I could use with the node count set to 100, treating the percentages I have calculated as the ranges you are describing. This would solve my issue with john nodes, however for some of the computation I want to use hashcat nodes and from what I can see they use a different approach. Unless there is a way to "translate" the john node option to hashcat values? Then I could use john to split the work even with those nodes. I understand you have your ways, and to be honest I prefer this node option over the hashcat ones, as I don't have to calculate a number, just give john "the percentage" and voila. However I also noticed you try to have at least some level of compatibility with hashcat (like enabling hashcat mode for rules), so making small changes to enable such compatibility might help :)

Just as a side note, I know some people would also appreciate similar options for dictionary based attacks. In my setup I personally wouldn't use them anyway, as I split the work by splitting the dict into chunks with line counts corresponding to the computational power of the node. However, for example, I heard that one of the reasons hashtopolis isn't adding support for john nodes is the lack of these options:

-s, --skip | Num | Skip X words from the start | -s 1000000
-l, --limit | Num | Limit X words from the start + skipped words | -l 1000000

I suspect implementing similar options (or, in the case of brute, outputting "hashcat compatible values") would help many people, especially since (if I understand correctly) you already support part of those things (like skipping dict lines via the restore file, or the node option for brute) :)
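For illustration, chunking a wordlist attack with those hashcat options might look like this (hash mode, file names and chunk size are made up):

./hashcat.bin -m 1500 -a 0 hashes.txt wordlist.txt --skip 0 --limit 1000000
./hashcat.bin -m 1500 -a 0 hashes.txt wordlist.txt --skip 1000000 --limit 1000000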

magnumripper commented 3 years ago

It should be silly simple to implement JtR in Hashtopus (much simpler than hashcat), they are just too blinded with hashcat to see what course to take. I actually started to implement an alternative --node option for hashcat some time ago, because I despise their current options, but it never ended up in a PR. It should be fairly easy to implement though.

magnumripper commented 3 years ago

Then again, our --node option has its limits, it's really not meant for fine-grained stuff like --node=1355/1000000. It depends on what mode is used though.

sectroyer commented 3 years ago

Well I only hope to use it as --node=1-13/100, --node=14-20/100, etc., as "percentage power", so I don't think there will be an issue with that. Also, as I explained, I have very good results by splitting the dictionary in a similar manner, however I still have issues with brute :( If I could use john to do --node=83-100/100 and use --stdout (or some other flag) to output "hashcat numbers" I could support all nodes using "john semantics". I understand you prefer your way of splitting tasks and I can agree with your decision, as doing keyspace calculations for every task is pointless. Of course I am doing a similar thing in the dictionary based attack, but at least I don't have to send the whole (many gig) dictionary to every node (if I understand correctly that's what hashtopolis does, and it uses -s and -l to specify lines within the whole dictionary). Splitting tasks by power between nodes is much more "user friendly", and now I think in my system I will switch to node, though I will still have to split the keyspace "by letters" for john nodes (and use percentage power) and split the second part of the keyspace using hashcat "magical numbers". Better than I was having till now but if you don't want to support "hashcat options" (which I can understand) it would be nice to "spit out" those numbers for specific use cases. You have a way of printing generated dicts by rules and brute force which can be used by any node anyway so if it's possible(?) to add "--spit-out-hashcat-numbers" it would make my life easier 😄 Unless their keyspace numbers are NOT compatible with how you currently handle splitting by node parameter?

solardiz commented 3 years ago

Unless their keyspace numbers are NOT compatible with how you currently handle splitting by node parameter?

They almost certainly are not.

BTW, you mention "brute force" in multiple places, but to me this is something undefined, a non-term I refrain from using. We have several cracking modes. hashcat has several attack modes. In a sense, all of those (even wordlist-based modes) are "brute force": they test candidate passwords. hashcat has a mode called "brute force", which is actually mask mode in our terms. hashcat's variation of this mode tries to be smart by default by using Markov chains, and hashcat has an option to dumb it down (but you shouldn't) to what its help message calls "classic brute-force" (but I wouldn't).

magnumripper commented 3 years ago

Better than I was having till now but if you don't want to support "hashcat options" (which I can understand) it would be nice to "spit out" those numbers for specific use cases. You have a way of printing generated dicts by rules and brute force which can be used by any node anyway so if it's possible(?) to add "--spit-out-hashcat-numbers" it would make my life easier 😄 Unless their keyspace numbers are NOT compatible with how you currently handle splitting by node parameter?

AFAIK, and if things haven't changed, hashcat keyspace numbers are often not counted in "passwords" but in "blocks" of unknown and varying size. You can get a keyspace figure from hashcat of e.g. 363,839 for an actual keyspace of 1,117,714,499 because the keyspace was counted in 3K blocks on that very occasion. Running the very same keyspace calculation on some other rig (e.g. with other GPUs) could give you some other keyspace figure because the block size is larger or smaller (and I think it could depend on autotune or other things). This all is quite natural, and without it being this way you couldn't even handle really huge keyspaces, but as a result there's no way an external tool can spit out hashcat keyspace numbers. Or did I misunderstand what you asked for? Also, again please note that I may be wrong and/or things might have changed in hashcat.

sectroyer commented 3 years ago

@solardiz yes, I meant what (in john terms) is called mask mode. I just got used to calling it "a brute" for short ;) @magnumripper yes, that is exactly what I was asking for. Though it's a shame there is no "standard" for it. I played around on a few machines that are part of my test environment and I definitely see benefits of mixing node types. In my test environment around half (depending on the algo) of the hashing power comes from CPUs, and john performs significantly better than hc on those. I also was surprised that for dictionary based attacks (usually with a rule) john outperforms hc 2-3x, however for mask based attacks I still get better results (especially with a multi-GPU setup) with hc. Such results encourage me to try to "combine" all of them into a single "system". I want to think about it from basics and not have any wrong assumptions. Also, for me, the more heterogeneous the system is the more interesting it gets 😄

sectroyer commented 3 years ago

Okay, I was able to implement what we discussed and it looks to be working. However on the fastest john node the ETA and percentage progress disappear after the first few chars (I am doing a mask attack of lengths 1-8). Am I doing something wrong or is it a bug? Here is how I split (first by first letter, then by --node):

Total power percentage of john nodes: 41.11%
Total john keyspace: w-9

john1 power: 7.57% node: 1-757/4111
john2 power: 7.15% node: 758-1472/4111
john3 power: 26.39% node: 1473-4111/4111

Log looks like this:
0g 0:00:00:00 0.00% (5) (ETA: 10:26:59) 0g/s 13107Kp/s 65536Kc/s 65536KC/s zjxX..0syWh
0g 0:00:00:00 0.00% (6) (ETA: 10:26:59) 0g/s 24296Kp/s 121481Kc/s 121481KC/s xsmG7..1u5Ewh
0g 0:00:00:05 (6) 0g/s 21479Kp/s 107395Kc/s 107395KC/s 7m7K2j..zy0Q2j
0g 0:00:00:15 (7) 0g/s 23288Kp/s 116443Kc/s 116443KC/s 1jvW77..4erFiwh
0g 0:00:00:43 (7) 0g/s 24041Kp/s 120205Kc/s 120205KC/s 896Uf2u..041Zf2u
0g 0:00:01:21 (7) 0g/s 24170Kp/s 120869Kc/s 120869KC/s 5emGsbk..xneAtbk
0g 0:00:01:59 (7) 0g/s 24257Kp/s 121288Kc/s 121288KC/s zbgMoip..4gmKoip
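For reference, the three invocations look roughly like this (the hash file, format and exact mask are simplified here):

./john --format=descrypt --mask='[w-z0-9]?l?l?l?l?l?l?l' --node=1-757/4111 hashes.txt
./john --format=descrypt --mask='[w-z0-9]?l?l?l?l?l?l?l' --node=758-1472/4111 hashes.txt
./john --format=descrypt --mask='[w-z0-9]?l?l?l?l?l?l?l' --node=1473-4111/4111 hashes.txt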

solardiz commented 3 years ago

@sectroyer The disappearing progress percentage and ETA could be a bug or at least a shortcoming. If you can share a reduced and reproducible test case for this, then please open a separate issue for it. No real password hashes in the test case, please (you can generate similar fake ones). Thanks!

sectroyer commented 3 years ago

@solardiz I have submitted a bug report here: https://github.com/openwall/john/issues/4579 with a "test" pw ;) Still not sure if my "fix" doesn't break anything else (in my current setup); will do more tests and report.

BTW I would like to ask something about mask mode. Let's assume (for the sake of simplification ;)) that I want to crack a four-character pw which has one upper letter, one digit, one symbol and one lower letter. A simple mask might look like this: ?u?d?s?l However that's only one possible configuration. In reality it will be (if I calculated correctly) 16 different configurations. I would like to specify that I am looking for a pw with that characteristic. Or for example an ?l?d 8 characters long which (at most) has 2-3 upper letters. I think the only way to do that now is to prepare "16" (like in the simplified example) different masks and perform "16" different attacks. This makes it not only hard to "maintain" but also hard to do any ETA and such. Is there any way to simplify that?

solardiz commented 3 years ago

@sectroyer Thank you for creating that other issue.

There isn't currently a good way to express those masks as one attack.

BTW, for attacks on a small number of descrypt hashes you'll likely find --fork significantly faster than OpenMP. Unfortunately, we currently require the number of forked processes to exactly match the number of nodes in the range. So you can't have e.g. --fork=32 --node=1473-4111/4111. You can have e.g. --fork=32 --node=200-231/333. This is a shortcoming we might want to address, but currently don't plan to.

sectroyer commented 3 years ago

@solardiz Okay, I will test the fork option, though with OpenMP I get one status for the whole session, which is pretty useful. Will see how big an impact it has on my specific calculations and report back :) Though since it has such a big correlation with --node I don't know if I will be able to adopt it efficiently. Node "john3" has 32 cores so it would require 32 nodes, whereas "john2" has 8 cores but a cracking speed of ~7.15% of the total. If I translated it to the same number of nodes it would get 6.59% (26.39/4); with "john1" the difference becomes even more significant. Also some nodes (with same number of cores) have avx, some avx2, so I need to adjust the number of nodes to their cracking speed and not to the number of cores themselves... Well, will have to check if the advantages outweigh the disadvantages ;) Anyway, just to explain: I am MUCH MORE interested in adjusting the system itself and evenly distributing tasks (as that's main topic of my research) than cracking specific hashes 😄

sectroyer commented 3 years ago

There isn't currently a good way to express those masks as one attack.

I think providing some way of expressing such "constraints" would be a good idea. Especially since for a computer "configuring" such an attack isn't really a complex task, but for humans/scripts it's really hard to grasp. In a real world scenario with an up to, let's say, 8 characters long pw it's pretty common to have 1-3 numbers, 1-3 special chars, and the rest we can assume would be lowercase letters and maybe 1-2 uppercase letters. At least those are the results I have after analysing some IoT hashes I played around with. I am not a mathematician but such keyspace (at first glance) should be significantly smaller than doing full ?l?d?u?s mask attack. But even if my "first glance estimation" is incorrect there are still use cases where it might be useful and much faster than specifying the "full mask". Now just to check if one char of 8 characters long password is an uppercase letter I have to run 8 separate sessions :)

solardiz commented 3 years ago

@sectroyer Now we're deep into topics that would be much better discussed on john-users than here.

I am MUCH MORE interested in adjusting the system itself and evenly distributing tasks (as that's main topic of my research)

If so, you could want to experiment with very different hash types, and see different strategies work best accordingly.

some nodes (with same number of cores) have avx, some avx2

For AVX vs. AVX2, you could double the number of forks on the AVX boxes (overbooking them, which would hurt performance really bad with OpenMP but is tolerable with fork). Or halve them on the AVX2 ones, considering those "cores" aren't real (you probably have 2 hardware threads per core). You can try it both ways.
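As a rough sketch (hash file name invented; one AVX2 box with 16 cores used as-is and one AVX box with 16 cores overbooked 2x, giving 16 + 32 = 48 nodes total):

AVX2 box: ./john --format=descrypt --fork=16 --node=1-16/48 hashes.txt
AVX box:  ./john --format=descrypt --fork=32 --node=17-48/48 hashes.txt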

I am not a mathematician but such keyspace (at first glance) should be significantly smaller

See https://openwall.info/wiki/john/policy

just to check if one char of 8 characters long password is an uppercase letter I have to run 8 separate sessions

You can use a revision of [List.External:Policy] instead, but only for much slower hashes or/and higher different salt count.
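A sketch of the kind of invocation that means (hash file made up; the Policy external mode is defined in john.conf, so it's edited there rather than by recompiling john):

./john --format=md5crypt --incremental --external=Policy hashes.txt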

solardiz commented 3 years ago

At least those are the results I have after analysing some IoT hashes I played around with.

It's a fallacy to focus on password patterns you see the most without considering the size of keyspaces for those patterns vs. simpler patterns that maybe don't occur so frequently. You do need to be a little bit of a mathematician there to calculate and sort the ratios.

sectroyer commented 3 years ago

@solardiz I'm considering such a "based on patterns" attack since I have already done all the dicts I could find with the best64 rule (btw is there any other rule also worth checking?), the full 8 chars of ?l?d and some other stuff. There are still some things I wanna try, but such a "pattern based" attack is the last thing that won't take "ages" :) I just noticed that some of the "harder" to crack hashes on IoT that I could crack were pretty "random" so those "patterns" are the only thing I could come up with :)

solardiz commented 3 years ago

@sectroyer More john-users material here. Please use the mailing list next time you bring something like this up.

best64 rule (btw is there any other rule also worth checking?)

Of course. --rules=jumbo is a must to run, at least with a tiny wordlist (such as RockYou top 10k to 1M). With a tiny enough wordlist or/and a fast hash, you'd also run --rules=all.
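For example (file names are hypothetical):

./john --wordlist=rockyou-top10k.txt --rules=jumbo hashes.txt
./john --wordlist=rockyou-top10k.txt --rules=all hashes.txt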

hashes on IoT that I could crack were pretty "random" so those "patterns" are the only thing I could come up with :)

You can instead use "incremental" mode and let it do the magic for you. It can take forever to complete, but that's actually an advantage: you don't have to start new attacks, you can stop any time you like. You'll likely find that it cracks more hashes than mask mode over the same amount of time. Mask mode is generally for cases when you know a part of a partially forgotten password or need to exclude certain patterns from further consideration. You can also use it along with other modes e.g. to split the workload by first letter - e.g., --incremental --mask='a?w' will prepend whatever incremental mode generates by letter "a" (hurting its efficiency somewhat, though, since the .chr files encode frequencies that assume character positions stay intact).
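For example, two nodes could split the work by first letter like this (hash file hypothetical):

./john --incremental --mask='a?w' hashes.txt
./john --incremental --mask='b?w' hashes.txt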

sectroyer commented 3 years ago

@sectroyer Now we're deep into topics that would be much better discussed on john-users than here.

Well, for now I have too little data to really "discuss" it. I wanted to mention it just to explain my attitude towards this matter :)

If so, you could want to experiment with very different hash types, and see different strategies work best accordingly.

Yes, I already noticed that and am still playing around. For example on one node I have a poor ("just for display") nvidia graphics card which is completely useless for des hashes (using it for cracking reduces john's speed more than I could gain), however it doubles the node's speed when used with md5. I just started with des as it was the fastest way to gather some data.

some nodes (with same number of cores) have avx, some avx2

For AVX vs. AVX2, you could double the number of forks on the AVX boxes (overbooking them, which would hurt performance really bad with OpenMP but is tolerable with fork). Or halve them on the AVX2 ones, considering those "cores" aren't real (you probably have 2 hardware threads per core). You can try it both ways.

Yeah will try different settings and report back once I get some more results :)

I am not a mathematician but such keyspace (at first glance) should be significantly smaller

See https://openwall.info/wiki/john/policy

Great info!! If you don't mind I would like to use it as a reference if I come up with something worth publishing.

just to check if one char of 8 characters long password is an uppercase letter I have to run 8 separate sessions

You can use a revision of [List.External:Policy] instead, but only for much slower hashes or/and higher different salt count.

Sorry for the dumb question but just to clarify: I set up a mask attack and john uses Policy to filter out only "matching pws"? Or is it something you put in the john code and recompile?

sectroyer commented 3 years ago

AFAIK, and if things haven't changed, hashcat keyspace numbers are often not counted in "passwords" but in "blocks" of unknown and varying size. You can get a keyspace figure from hashcat of e.g. 363,839 for an actual keyspace of 1,117,714,499 because the keyspace was counted in 3K blocks on that very occasion. Running the very same keyspace calculation on some other rig (e.g. with other GPUs) could give you some other keyspace figure because the block size is larger or smaller (and I think it could depend on autotune or other things). This all is quite natural, and without it being this way you couldn't even handle really huge keyspaces, but as a result there's no way an external tool can spit out hashcat keyspace numbers. Or did I misunderstand what you asked for? Also, again please note that I may be wrong and/or things might have changed in hashcat.

I did check on two completely different systems (both heterogeneous with two GPUs) and the result of the command was the same:

./hashcat.bin -m 1500 -a3 -1 ?l?d -2 'abcdefghijklmn' '?u?1?1?1?1?1?1?u' --keyspace                                                
43670016

Just wanted to leave it here in case it's needed for reference or something.

magnumripper commented 3 years ago

That proves my point, since the actual keyspace you defined in that mask is 1,471,504,859,136. The figure you got as an answer is - for whatever reason - divided by (or "in blocks of") 33,696. So where does that figure come from? I have no idea; it's not an even number even in binary terms.

magnumripper commented 3 years ago

OK, so 33,696 is 26x36x36. I bet with other masks you may get other block sizes.
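Spelling out the arithmetic (taking ?1 as the 36-character ?l?d set from the command above):

26 x 36^6 x 26 = 1,471,504,859,136 (full keyspace of '?u?1?1?1?1?1?1?u')
1,471,504,859,136 / 43,670,016 = 33,696 = 26 x 36 x 36 (the implied block size)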

sectroyer commented 3 years ago

The only thing I need is a consistent keyspace size for a single mask to split the task, so that's doable now. However I will keep your info in mind and not rely on it (I will try to compute it on each run). Though I was surprised that on two completely different systems this number was in fact consistent; they are even different brands, so there has to be "some logic" behind it :) I wonder if this will stay consistent and I will keep getting the same values on different systems...

sectroyer commented 3 years ago

I noticed I get a different keyspace size for a different algorithm (the same command for md5crypt gives 56596340736). However this is no longer the main issue. Now I had to 🤦 after seeing this error message: Use of --skip/--limit is not supported with --increment or mask files.

After thinking about it for more than 5 seconds it's understandable, but this doesn't change the fact that at this point hc lacks ANY (serious) distributed system support. If this --skip/--limit is really what hashtopolis uses internally to divide tasks between nodes, it's plain stupid. Computing this for every single password length and creating a separate session... @magnumripper now I really want your "hashcat node patches" 😄 Will leave it for the next day and try to figure out if there is "anything else" that I am missing, but I think I am starting to understand why hashtopolis doesn't show ETA for nodes, doesn't support john etc. This is a really big mess and probably tons of "hc specific hacks". There should be a way to simply express (like you do with --node) a node's power for a single session. I understand I might have to set something up between different sessions, however not for such a simple use case... If I don't find anything I will definitely report an issue.

magnumripper commented 3 years ago

@magnumripper now I really want your "hashcat node patches" 😄

Unfortunately if I ever push such patches, they'll [likely] only add the alternative syntax of --node=m[-n]/total and would merely translate that into whatever --skip and --limit would be correct at the time. As long as there are restrictions against --increment and other things, it will still be a problem.
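As a sketch of that translation (with K standing for whatever hashcat reports via --keyspace for the given mask), --node=14-20/100 would become roughly:

--skip  = K * 13 / 100
--limit = K * 7 / 100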

magnumripper commented 3 years ago

I noticed I get a different keyspace size for a different algorithm (the same command for md5crypt gives 56596340736)

That makes sense as I believe what we're actually seeing here is sort-of their version of our "internal mask multiplier", which does vary a lot in JtR too, from one to tens of thousands varying with format and/or GPU power - BUT we never expose it in terms of node numbers or keyspace. I'm not saying they did anything wrong, just that our syntax is a whole lot easier to work with IMHO.

magnumripper commented 3 years ago

Also, in the hashcat world the --increment limitation is not by far as bad as it would be for JtR: Our incremental mode will, for example, try weaker candidates of length 9 earlier than less weak ones of length 4. Forcing the user to do one length at a time would be a seriously bad thing here.

sectroyer commented 3 years ago

Well I just noticed I made a "typo" in the mask. I didn't put ?2 (which I use to split between hc and john) anywhere in the mask and the keyspace size was the same. It looks like even when I do this: ./hashcat.bin -m 1500 -a3 -1 ?l?d -2 'abcdefghijklmn' '?u?2?2?1?1?1?1?u' --keyspace I also get 43670016. Not to mention that slower gpu node displays (with 10% already done) this eta: Time.Estimated...: Next Big Bang, (50198 years, 198 days) 🤦 🤣

Forcing the user to do one length at a time would be a seriously bad thing here.

If I understand correctly (and am not missing anything) this is exactly what you have to do to split hc mask based tasks and I suspect this is what hashtopolis does... I made a quick fix that temporarily "resolved" this issue for me but ATM I can't think of any approach that would be usable in a long run.

Unfortunately if I ever push such patches, they'll [likely] only add the alternative syntax of --node=m[-n]/total and would merely translate that into whatever --skip and --limit would be correct at the time. As long as there are restrictions against --increment and other things, it will still be a problem.

If they would "translate" those numbers for each pw len to the correct --skip/--limit values it would be a hack, but still something that can be used. We can argue that one approach is better than the other, and in such a discussion I would be on john's side, however even such a hack is better than not having anything (serious) and this is how it looks for me right now...

magnumripper commented 3 years ago

Well I just noticed I made a "typo" in the mask. I didn't put ?2 (which I use to split between hc and john) anywhere in the mask and the keyspace size was the same. It looks like even when I do this: ./hashcat.bin -m 1500 -a3 -1 ?l?d -2 'abcdefghijklmn' '?u?2?2?1?1?1?1?u' --keyspace I also get 43670016.

This is exactly how JtR internal mask works and I expect hashcat to be similar: We decide on a "target gpu-side mask multiplier" and then try to get close to it using the factors at hand (in your case 26x for ?u, 36x for ?1 and 14x for ?2) but we may pick them anywhere in the mask if needed. Sometimes even that doesn't end up very well at least in JtR, but it's what we do.

Not to mention that slower gpu node displays (with 10% already done) this eta: Time.Estimated...: Next Big Bang, (50198 years, 198 days) 🤦 🤣

I predict the next big bang some time later than that, but WW3 will predate it for sure and there might not be much of a distinction between them for things like us 🙄

Forcing the user to do one length at a time would be a seriously bad thing here. If I understand correctly (and am not missing anything) this is exactly what you have to do to split hc mask based tasks and I suspect this is what hashtopolis does... I made a quick fix that temporarily "resolved" this issue for me but ATM I can't think of any approach that would be usable in a long run.

Unfortunately if I ever push such patches, they'll [likely] only add the alternative syntax of --node=m[-n]/total and would merely translate that into whatever --skip and --limit would be correct at the time. As long as there are restrictions against --increment and other things, it will still be a problem.

If they would "translate" those numbers for each pw len to the correct --skip/--limit values it would be a hack, but still something that can be used. We can argue that one approach is better than the other, and in such a discussion I would be on john's side, however even such a hack is better than not having anything (serious) and this is how it looks for me right now...

Perhaps I'll have a look at it again. After all, if we said --node=1-23/100 (for doing the first 23% of keyspace) maybe there's nothing really stopping us from re-doing the calculations for each length - but (IIRC) last time I checked it wasn't trivial to get a new keyspace figure at will. The thing that speaks against this happening any time soon is the fact I find hashcat superior very infrequently. Actually, nowadays I think it's solely the case of "single NT hash" and those are so very fast anyway it doesn't really matter what height within the exponential wall you're at - because it's, well, vertical. Believe it or not, there's ~not much~ virtually no difference between 100G p/s and 200G or even 800G p/s. Sure, that's 8x faster but you will NOT crack 8x more passwords. More like 0.8% more or so.

sectroyer commented 3 years ago

Perhaps I'll have a look at it again. After all, if we said --node=1-23/100 (for doing the first 23% of keyspace) maybe there's nothing really stopping us from re-doing the calculations for each length - but (IIRC) last time I checked it wasn't trivial to get a new keyspace figure at will.

I guess by "us" you mean hashcat? Since this is how I use it in john and it looks to be working okay across different lengths.

The thing that speaks against this happening any time soon is the fact I find hashcat superior very infrequently. Actually, nowadays I think it's solely the case of "single NT hash" and those are so very fast anyway it doesn't really matter what height within the exponential wall you're at - because it's, well, vertical. Believe it or not, there's ~not much~ virtually no difference between 100G p/s and 200G or even 800G p/s. Sure, that's 8x faster but you will NOT crack 8x more passwords. More like 0.8% more or so.

I get what you mean, but for my small test cases it's a matter of having a session that will finish within 1 hour or 8 hours 😄 That's why I switched to johncl for dictionary based attacks. For both descrypt and md5crypt (due to speed I mostly tested on those two types of hashes) johncl was significantly faster for dictionary based attacks.

magnumripper commented 3 years ago

maybe there's nothing really stopping us from re-doing the calculations for each length

I guess by "us" you mean hashcat?

Yes I meant I (or anyone) could add code for it.

sectroyer commented 3 years ago

Ok, back to this issue. After some more digging I have (partially) changed my mind about it. I still love the --node option and I still believe hashcat SHOULD have something like that. Currently ONLY john has any concept of splitting a task between multiple nodes and there is no reason any other cracker wouldn't have something similar. If you want to simply split a task between the nodes you have, there should be a way to do it.

HOWEVER, recently I was playing around with hashtopolis and something clicked which made me realise why they do what they do. Firstly, I think it's obvious to anybody reading this, but just to explain: I am a "beginner" at best in terms of hashing, cracking passwords, keyspaces etc. For me it's just the beginning of this research and there is still a lot to learn. Nevertheless I have a "few years" of experience with distributed heterogeneous systems. About a month ago I started researching different ways of distributing cracking tasks, especially in systems consisting of pure CPU, multi-GPU, GPU+CPU nodes etc. I have written a "small tool" which allows me to distribute such tasks in a test environment, and this is where I started using --node for splitting the keyspace between john nodes; it worked like a charm. Recently I started to look for a way to add this support to hashtopolis. I started implementing some stuff and, as I said, "it clicked".

Basically, let's say we can split distributed systems into two groups. In the first are those which behave similarly to mine (MPI, I believe, does the same?), where we take a single task, build a "node list", split the task between the nodes and wait for it to finish. Nevertheless there is a second "kind" where nothing like this is performed. At first it looks "similar" but in principle it's something completely different. There are probably tons of systems that use this "second approach", but I first stumbled on it while playing around with the (basically now dead for years) Xgrid system. The idea is that we split the "job" into "tasks", or simply each task is split into batches of data. Each batch/task is independent from the others and can even be joined/split multiple times. Basically the whole computation is "stateless" in the sense that each batch is independent from the others. Nodes (often called agents) can go away and new ones can be added while the "job"/task is computed.

And that's why john should have options like --skip and --limit. I am still not sure if we should implement the same "algorithm" (for example to stay consistent between tools) like hashcat does, but definitely such options should exist. Without them it's possible to distribute john mask tasks, but it's not possible to use john in any "agent based" distributed system. Hope I was able to explain these two concepts :)

solardiz commented 3 years ago

@sectroyer This is more of a john-users topic. Anyway:

What you describe as the "second approach" is something I've been successfully using --node for, with very high virtual node counts. Like a million. In 2012-2014 for use in password cracking contests, I had a set of Perl scripts communicating over SSH forced commands, where the agents would request next batches of work (john command lines including the next virtual --node to pretend to be). I didn't even use --fork with that - every agent process was treated separately. The assigned jobs were kept track of in files on the controller, and if a job wasn't completed for a long time (such as because a machine failed), it would be automatically handed out to another agent. The same scripts also collected the agents' pot files on the controller, and they would automatically transfer to the agents any changed input files from the controller.

No need to have --skip and --limit to implement a setup like that. Rather, we need to improve some specific modes to be friendlier to very high virtual node counts. My setup above worked great with some modes (mostly wordlist with rules was used), where I could allocate tiny jobs - lasting a few minutes each, so with a few hundred agents the controller was assigning a new job to some agent once per second or so. With incremental mode, I had to reduce the virtual node count from millions to hundreds or a thousand, and accept that the portions were taking longer (in fact, progressively even longer after days of running - but that was irrelevant in contests, which lasted ~2 days only anyway). There's room for improvement in that mode's implementation of --node.

There was also a third component - a proxy - to keep behind-NAT agents' connections active across controller restarts.

sectroyer commented 3 years ago

@solardiz I've decided to put it here because this issue was about adding such options to john and I wanted to add something to this discussion. First of all, I dug around a little more and I do NOT believe --skip/--limit were implemented in hashcat to be used in distributed systems. If I understand correctly they exist because hashcat stores the full command line it was started with in the restore file, and reads the skip/limit values and passes them to itself internally on resume. I wouldn't be surprised if they were even "exposed unintentionally".

But getting back to the approach you described. Yes, you can use huge node numbers and achieve something that looks similar, but it's completely different. I still believe the --node option is best when you want to split a task between a specific set of machines. Also you don't need any fancy system for distributing them; as you explained, sending commands via ssh is more than enough (I use a similar approach). However, in the "second approach" I was describing, the distributed system itself takes care of the data. The higher the node value you specify, the higher the "flexibility", but switching between them can become the dominant task. Also you are taking away from the distributed system the ability to freely handle data. Once you split a task using --node it's done, nothing can be changed, nothing altered. When the distributed system manages the data it can do it dynamically. At any point new nodes can be added or removed and the system decides how much data is assigned to different machines. Also, the scalability of --node ends at a single node value; the distributed system cannot divide it any further. Just to clarify, it's not that --node can't be used for many scenarios we are discussing, but it will always have edge cases where it is inferior. At the same time it takes away a big part of the distributed system's task scheduling by deciding upfront how the data can be divided and therefore limiting its options. Also, I think internally --node does set some "start/stop" values for mask attacks? I don't think it would matter if you added two options that could be used in a similar manner. I believe I can select any part of a task by specifying "correct" values for --node=x-y/z, just at the moment I don't know how to calculate them 😄

magnumripper commented 3 years ago

Once you split a task using --node it's done, nothing can be changed, nothing altered. When the distributed system manages the data it can do it dynamically. At any point new nodes can be added or removed and the system decides how much data is assigned to different machines. Also, the scalability of --node ends at a single node value; the distributed system cannot divide it any further.

The thing is most of our modes will try to get the best candidates early (as opposed to dumb-force). So once you split it in say 100,000 pieces, you'd always hand out the lowest node (piece) number(s) not yet completed. If some node stopped reporting results, you can re-issue that work piece just like hashtopus would.

At the same time I think internally --node does set some "start/stop" values for mask attacks?

No! And that's why it's not that simple. If your keyspace is a quadrillion and you said --node=3/3, most or all of our modes would NOT start node 3 at two thirds of a quadrillion into that keyspace - because that would only make sense for dumb-force. What they do is hand out candidate 3, 6, 9 and so on to node 3. Sometimes with less granularity than that (for performance reasons), but you get the idea.

magnumripper commented 3 years ago

So once you split it in say 100,000 pieces, you'd always hand out the lowest node (piece) number(s) not yet completed

No, wait. The second paragraph of what I just wrote is true. And this makes the first paragraph not make any sense.

magnumripper commented 3 years ago

Somewhere in my own confusion above, I can see the point of what you are asking for. Since we hopefully serve the very best candidates early, splitting it in 100,000 pieces by round-robin would be a bad thing in case it takes ages before the 90,000th piece gets served to any node. So for applications like hashtopus it might be a good thing to mimic the skip/limit options.

However, several of our cracking modes are not quite aware of the point in keyspace so easily. If I wanted to skip 123,456,789 candidates into some particular mask keyspace, it's probably trivial. But doing the same with Incremental mode is not easy. It has been discussed (in slightly different contexts I think) long ago - we could implement it but the code isn't currently there.

magnumripper commented 3 years ago

I think this is interesting enough to re-open, at least while assessing the problem and possible solutions.

solardiz commented 3 years ago

@magnumripper I think @sectroyer wasn't even talking about testing the candidate passwords in an optimal order yet.

@sectroyer Your latest comment tells me that you misunderstood my previous one. Let me illustrate what I had meant. Suppose we split the task into 100000 virtual node numbers. We can then hand out jobs like --node=1/100000, then --node=2/100000 and so on. If we only have, say, 100 agents, we'd initially only issue jobs up to --node=100/100000. Once the first of these completes, we'd replace it with --node=101/100000, and so on. Under this model, you would not have agents cover the whole range of virtual node numbers at once - but rather they would do that (mostly) sequentially (with only as much parallelism as there is in the number of agents, and with some dynamic re-allocations if machines fail, etc.)
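A rough approximation of that hand-out loop using GNU parallel (host names and file names invented; john is assumed to be installed at the same path on each agent, and the real scripts additionally tracked failures and collected pot files, which this one-liner does not):

seq 1 100000 | parallel -j1 -S agent1,agent2,agent3 \
    './john --wordlist=w.txt --rules --node={}/100000 --session=node{} hashes.txt'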

Once you split a task using --node it's done, nothing can be changed, nothing altered. When the distributed system manages the data it can do it dynamically.

There's actually no contradiction here. Just use --node to split the task into small enough pieces that you can manage them dynamically. My scripts did just that.

solardiz commented 3 years ago

Oh, by the way, my scripts also allowed for changing the number of agents on the fly - new ones connect, some die off, etc. It's this flexible.

magnumripper commented 3 years ago

@magnumripper I think @sectroyer wasn't even talking about testing the candidate passwords in an optimal order yet.

Maybe not, but I didn't quite recognize this problem until now. I think we might actually want to add skip/limit functionality for that very reason - unless the implementation of it is so "hard" we decide it's not worth it.

solardiz commented 3 years ago

I think we might actually want to add skip/limit functionality for that very reason

I'm not seeing whatever reason you're seeing.

Let's first see what we have with --node now. Like I mention, it worked quite well for me as a primitive for building dynamic distributed processing upon. As I recall, things were also reasonable in terms of candidate ordering.

magnumripper commented 3 years ago

I think we might actually want to add skip/limit functionality for that very reason

I'm not seeing whatever reason you're seeing.

We (at least I) have been saying for a week or two that you can just split the job into a very big number, like 100,000 nodes, and then use them as "work items" as opposed to actual nodes. This is highly suboptimal considering we try to split by round robin, is it not?

solardiz commented 3 years ago

This is highly suboptimal considering we try to split by round robin, is it not?

Oh, right. Now, I think I only cared about that during those contests for incremental mode, running it as a never-ending mode, and I worked around that somehow (actually set the total node count to beyond the number of "charset order" entries, letting the node numbers simply become the entry numbers?)

sectroyer commented 3 years ago

@sectroyer Your latest comment tells me that you misunderstood my previous one.

@solardiz I understood what you mean, but probably I didn't explain clearly what I meant. For your example 100000 is the limit you can't handle 10x nodes or anything like that. But as I explained you CAN do a lot with --node as it's pretty powerful, but still a lot of this should be done by distributed systems. Also I didn't mean to say that --skip/--limit should replace --node. I think of those two "approaches" as completely different use cases. John's --node can do a lot of what --skip/--limit can do, but it's not the same and the flexibility is lower. It also works the other way around, as I am using the --skip/--limit options for hashcat nodes and it "kinda works", but it's a hacky solution and I would prefer for hashcat to simply have a --node equivalent.

Anyway @solardiz @magnumripper thanks for patience to understand my convoluted explanation 👍

At the beginning I was thinking about combining hashcat and john in a single mask task but it can't be done like that. As @magnumripper pointed out, hashcat keyspace calculation is pretty messy. It looks to be consistent for GPUs, but for example for CPU nodes I think it differs. I have added a CPU-only agent to hashtopolis and it's not used at all 😄 I think that's why they have a concept of "CPU only tasks". I think using a similar approach ("john only tasks") it shouldn't be much work to add support for john-based agents to hashtopolis. Of course there is a need for a special john agent, but it's definitely doable once you add --skip/--limit 👍

solardiz commented 3 years ago

@sectroyer I don't understand what you mean by this:

For your example 100000 is the limit you can't handle 10x nodes or anything like that.

Please provide an example of what you'd like to do but you think you cannot. I continue to think you misunderstood me.

magnumripper commented 3 years ago

At the beginning I was thinking about combining hashcat and john in a single mask task but it can't be done like that.

No, regardless of what we implement, you will never be able to define a work piece such that it will be the actual same bunch of candidates between john and hashcat, that's for sure. We do things differently on every level. So hashtopus would definitely need to define a task as either john-only or hashcat-only. Now once that fact is established, perhaps your "need" for skip/limit fades significantly? They'll be john-specific tasks anyway, so they could simply use our existing node syntax for now and be done with it.

However, some time in the future (definitely post next release) I think I'll experiment with implementing keyspace/skip/limit options just for the sake of not doing any round robin.

solardiz commented 3 years ago

At the beginning I was thinking about combining hashcat and john in a single mask task

You sort of can do that by splitting the mask by a character, e.g. have hashcat test passwords starting with [a-i] and john [j-z]. Then it doesn't matter that the tools test the rest of the mask in different order and different blocks.
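For instance, for an 8-character lowercase mask (hash type and file name picked arbitrarily):

./hashcat.bin -m 1500 -a 3 -1 abcdefghi hashes.txt '?1?l?l?l?l?l?l?l'
./john --format=descrypt --mask='[j-z]?l?l?l?l?l?l?l' hashes.txt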

I think I'll experiment with implementing keyspace/skip/limit options just for the sake of not doing any round robin.

Maybe it should be a modifier option changing the behavior of --node instead.

In a sense, this is analogous to "words first" vs. "rules first", which is also something we could have a modifier option for (do we already have that functionality through abuse of --rules-stack, though?)

magnumripper commented 3 years ago

At the beginning I was thinking about combining hashcat and john in a single mask task

You sort of can do that by splitting the mask by a character, e.g. have hashcat test passwords starting with [a-i] and john [j-z]. Then it doesn't matter that the tools test the rest of the mask in different order and different blocks.

Good point.

I think I'll experiment with implementing keyspace/skip/limit options just for the sake of not doing any round robin.

Maybe it should be a modifier option changing the behavior of --node instead.

In a sense, this is analogous to "words first" vs. "rules first", which is also something we could have a modifier option for (do we already have that functionality through abuse of --rules-stack, though?)

True. While the example is only relevant for the wordlist+rules case, yes, I too had the idea this might be an option at some other level.