hashcat / hashcat

World's fastest and most advanced password recovery utility
https://hashcat.net/hashcat/

FR: Add a way to flush partial dictionary buffer in 'stdin' mode #2801

Open willcrozi opened 3 years ago

willcrozi commented 3 years ago

The proposal is for a new command-line option, --stdin-timeout-flush=x, which, when set to a non-zero value, causes partial input buffers to be submitted for cracking once no new input has been seen for x seconds. The default value of 0 keeps the existing behaviour: partial buffer contents are submitted only when the stdin handle is closed, causing an EOF.

Normally, in stdin mode, hashcat will wait until it has enough dictionary data to send a batch of words to be cracked. Currently it seems the only way for a partial buffer to be submitted for cracking is if all handles to hashcat's stdin are dropped (i.e. pipes to hashcat are closed), causing hashcat to receive EOF, finalise then exit.

This feature request is for a way to avoid having to close the handle to hashcat's stdin in order to force further password cracking even though the input dictionary buffer is only partially full. It would be useful to be able to keep a hashcat instance ready to receive more input but also having attempted to crack every word it has currently been sent.
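The idea can be sketched roughly as follows. This is a minimal illustration, not the code from the linked branch: `wait_readable`, `pump_words`, `submit_batch` and `batch_stats_t` are hypothetical names, "submitting" a batch is reduced to updating counters, and input is read with plain `read()` on a POSIX system.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical helper: wait until fd is readable.
// timeout_secs == 0 blocks indefinitely (mirrors "0 = old behaviour").
// Returns 1 if data or EOF is available, 0 on timeout, -1 on error.
static int wait_readable (int fd, int timeout_secs)
{
  fd_set rfds;

  FD_ZERO (&rfds);
  FD_SET (fd, &rfds);

  struct timeval tv;

  tv.tv_sec  = timeout_secs;
  tv.tv_usec = 0;

  const int rc = select (fd + 1, &rfds, NULL, NULL, (timeout_secs > 0) ? &tv : NULL);

  if (rc < 0) return -1;

  return (rc > 0) ? 1 : 0;
}

typedef struct
{
  size_t batches; // number of batches submitted so far
  size_t words;   // total number of words submitted so far

} batch_stats_t;

// Stand-in for handing a batch to the cracking backend (assumption).
static void submit_batch (batch_stats_t *st, size_t *pending)
{
  if (*pending == 0) return;

  st->batches += 1;
  st->words   += *pending;

  *pending = 0;
}

// Counts newline-terminated words from fd and submits them when the batch
// is full, when the input has been idle for flush_secs, or at EOF.
static int pump_words (int fd, int flush_secs, size_t batch_max, batch_stats_t *st)
{
  size_t pending = 0;
  char   buf[4096];

  for (;;)
  {
    const int ready = wait_readable (fd, flush_secs);

    if (ready < 0) return -1;

    if (ready == 0) { submit_batch (st, &pending); continue; } // idle: flush partial batch

    const ssize_t n = read (fd, buf, sizeof (buf));

    if (n < 0) return -1;

    if (n == 0) { submit_batch (st, &pending); return 0; } // EOF: flush and finish

    for (ssize_t i = 0; i < n; i++)
    {
      if (buf[i] != '\n') continue;

      if (++pending == batch_max) submit_batch (st, &pending);
    }
  }
}
```

With `flush_secs` set to 0 the loop only submits a partial batch at EOF, which corresponds to the current behaviour described above.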

Using my own use case as an example, I have a custom hashcat 'cluster' solution where a 'boss' process/machine generates custom dictionaries for cracking and 'worker' nodes running hashcat connect to and receive 'chunks' of work from the boss process. Chunks that are sent to workers may need to be retried on another node in case of disconnections or timeouts. In some circumstances (e.g. the tail end of a 'job') such a setup could require hashcat worker instances to sit with only partially full buffers needing to report back all progress whilst still being ready to receive more work in case dictionary chunks need to be retried on another worker node.

I've made an attempt to implement this here.

It seems to work well and the changes seem pretty straightforward. If there's enough interest I can submit a pull request.

jsteube commented 3 years ago

Looks good, we can merge this if you send a PR. There are two issues that should be fixed first.


 event_log_advice (hashcat_ctx, "Flushing partial batch of password candidates (count: %lu)...", *(u64 *) buf);

This will fail on Windows. Please use the PRIu64 macro instead.
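For context: on 64-bit Windows `unsigned long` is 32 bits wide, so `%lu` does not match a 64-bit argument there, whereas `PRIu64` from `<inttypes.h>` expands to the correct conversion on each platform. A minimal sketch using plain `snprintf` instead of hashcat's `event_log_advice` (the helper name `format_flush_msg` is made up for illustration):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: formats the flush message portably.
// PRIu64 is a string literal that is concatenated into the format string.
static int format_flush_msg (char *out, size_t out_sz, uint64_t count)
{
  return snprintf (out, out_sz, "Flushing partial batch of password candidates (count: %" PRIu64 ")...", count);
}
```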


There's a logic error with abort trigger. After flush occurs the abort trigger timeout will never trigger anymore. I can foresee this is kind of the logic you want to have in your environment with the boss/worker structure, but it's breaking the feature. Instead it should be fixed and you should set the abort timer to 0 (and allow this case in parameter check) in your environment to disable it.

Here's how to reproduce:

$ ./hashcat example400.hash -m 400 --stdin-timeout-flush 1 --stdin-timeout-abort 2 
Starting attack in stdin mode...

Then user input:

a
Flushing partial batch of password candidates (count: 1)...

After a while:

Time.Started.....: Sat May 29 11:33:21 2021 (10 secs)

The 10 seconds shown should have been enough to trigger the abort timer, which was set to 2 on the command line, but it did not fire.

willcrozi commented 3 years ago

Thanks for the feedback!

I have the fixes for the formatting macro and parameter check ready for a pull request.


> There's a logic error with abort trigger. After flush occurs the abort trigger timeout will never trigger anymore. I can foresee this is kind of the logic you want to have in your environment with the boss/worker structure, but it's breaking the feature.

I'm struggling to determine the correct way to handle the interaction with --stdin-timeout-abort here.

Specifically, in monitor.c we have a check on the progress that cancels the abort if progress is greater than zero. This was introduced in d412333. Wouldn't we need to revert the progress check in this commit to achieve the desired behaviour above?

jsteube commented 3 years ago

I think we can leave it as it is and update the --help message description. Feel free to send in the PR.

willcrozi commented 3 years ago

I've run into a problem with this feature, at least on my Linux system (Ubuntu 20.04 x86_64).

It seems that libc's buffering policy for pipes that are not connected to a TTY causes some passwords to sit in stdin with select() only returning non-zero when more input is received or the pipe is closed. Here's a decent writeup of the issue.

So far the only way to prevent this seems to be to disable buffering with setvbuf(stdin, NULL, _IONBF, 0). How feasible would it be to disable buffering for stdin input in hashcat?

Currently I'm investigating how this might affect performance, and so far I can't see any difference when running:

time cat example.dict |./hashcat -a 0 example.dict --stdout > /dev/null

I get the impression that this doesn't particularly test the raw IO performance as I think the candidates are sent to the GPU. Is there any other test where I could get a better picture of the overhead?

jsteube commented 3 years ago

> So far the only way to prevent this seems to be to disable buffering with setvbuf(stdin, NULL, _IONBF, 0). How feasible would it be to disable buffering for stdin input in hashcat?

That seems doable for Linux, but the userbase actually is like 1/3 Linux, 1/3 Windows and 1/3 MacOS. If you want to do this change you need to find a solution for the other OS, too. Alternatively disable this feature for other OS, but that's something like the very last thing to consider and we should avoid.

> Currently I'm investigating how this might affect performance, and so far I can't see any difference when running:

I think performance is not really the problem here, because from how I understand from your use-case you are not providing a lot of password candidates anyway.

> I get the impression that this doesn't particularly test the raw IO performance as I think the candidates are sent to the GPU. Is there any other test where I could get a better picture of the overhead?

Correct. Depending on your target hash-mode your enemy is the PCI-e bottleneck. A better way to test stdin performance is by using maskprocessor with a large mask like ?a?a?a?a?a?a?a

willcrozi commented 3 years ago

> That seems doable for Linux, but the userbase actually is like 1/3 Linux, 1/3 Windows and 1/3 MacOS. If you want to do this change you need to find a solution for the other OS, too. Alternatively disable this feature for other OS, but that's something like the very last thing to consider and we should avoid.

setvbuf is available on all these platforms via stdio.h, so I'll do some testing to determine whether Windows and MacOS require the same treatment. It may take a while to get this done.

> Correct. Depending on your target hash-mode your enemy is the PCI-e bottleneck. A better way to test stdin performance is by using maskprocessor with a large mask like ?a?a?a?a?a?a?a

For now I'm using a quick and dirty test consisting of a modified version of hashcat with the call to add_pw() disabled. My first impression is that just naively adding setvbuf(stdin, NULL, _IONBF, 0) at least doubles the time spent reading from stdin.

I have played around with replacing fgets() with fread() and performing the buffering manually and the results are promising. It appears this approach improves stdin performance significantly over the current buffered/fgets() based approach that is in master. It's starting to look like a win-win scenario is achievable here!
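A rough sketch of what "fread() plus manual buffering" could look like is given below. It is not the author's implementation: `splitter_t`, `splitter_feed` and `count_words` are hypothetical names, the line length cap of 256 is arbitrary, and completed words are only counted rather than passed on. The key point is that a word split across two chunks has to be carried over in the splitter state.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct
{
  char   line[256]; // bytes of the word currently being assembled
  size_t len;       // number of valid bytes in line
  size_t words;     // number of completed words seen so far

} splitter_t;

// Feeds one raw chunk into the splitter. A word is complete when '\n' is
// seen; a trailing partial word stays in s->line until the next chunk.
static void splitter_feed (splitter_t *s, const char *data, size_t n)
{
  for (size_t i = 0; i < n; i++)
  {
    const char c = data[i];

    if (c == '\n')
    {
      // s->line[0 .. s->len) holds one candidate here
      s->words += 1;
      s->len    = 0;
    }
    else if (c != '\r' && s->len < sizeof (s->line))
    {
      s->line[s->len++] = c; // overlong words are truncated in this sketch
    }
  }
}

// Reads fp in large chunks instead of calling fgets() once per line.
static size_t count_words (FILE *fp)
{
  static char chunk[65536];

  splitter_t s = { { 0 }, 0, 0 };

  size_t n;

  while ((n = fread (chunk, 1, sizeof (chunk), fp)) > 0) splitter_feed (&s, chunk, n);

  if (s.len > 0) s.words += 1; // final word without trailing newline

  return s.words;
}
```

Note that `fread()` keeps reading until the requested count is satisfied or EOF is reached, so chunked reading alone does not make a partial batch visible any earlier.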

I'll work on this further this week (when I can) and check the results on all platforms. These IO changes might be best as a separate PR as they seem a useful change in their own right (improving efficiency as well as 'fixing' the buffering behaviour). I can then base the --stdin-timeout-flush feature on top with a separate PR.

jsteube commented 3 years ago

> I have played around with replacing fgets() with fread() and performing the buffering manually and the results are promising. It appears this approach improves stdin performance significantly over the current buffered/fgets() based approach that is in master. It's starting to look like a win-win scenario is achievable here!

Any performance improvements are very welcome

willcrozi commented 3 years ago

Looking into this further, I think I'll need to perform non-blocking IO on stdin after all.

This can be achieved with a call to fcntl() (a POSIX API), which is available on Linux and MacOS. I'm also thinking it should work fine on all the Windows targets listed in src/Makefile (i.e. WSL, MSYS2, and Cygwin).
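For reference, a minimal sketch of switching a descriptor to non-blocking mode with `fcntl()` and reading whatever is currently available. It assumes a POSIX environment; whether the same approach carries over to a native Windows build is exactly the open question in this thread. The helper names and the -2 return convention are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Adds O_NONBLOCK to the descriptor's status flags. Returns 0 on success.
static int set_nonblocking (int fd)
{
  const int flags = fcntl (fd, F_GETFL, 0);

  if (flags == -1) return -1;

  return (fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

// Returns the number of bytes read, 0 on EOF, -1 on error,
// or -2 if no data is available right now.
static ssize_t read_available (int fd, char *buf, size_t size)
{
  const ssize_t n = read (fd, buf, size);

  if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return -2;

  return n;
}
```

A caller would typically combine this with `select()` or a similar wait so that the "no data right now" case does not turn into a busy loop.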

The doubt I have is that there's some _WIN defines dotted around in some files that seem to use Win32 API calls but I can't see how/where they would be enabled (my C programming experience doesn't include much outside of Linux/POSIX environments I'm afraid).

Are the above Windows targets all those that need to be supported?

jsteube commented 3 years ago

I have the same problem and yeah, WSL, MSYS and Cygwin should be fine. Well and of course the native windows target.

willcrozi commented 3 years ago

> Well and of course the native windows target.

This is what I was getting at. I can't find any references to creating a native Windows build, only the BUILD.md files for WSL, MSYS, and Cygwin. Are you able to shed any light on how this would be done? I would've thought a .vsproj file or similar would be required.

jsteube commented 3 years ago

I see. There's no need for VS. We use mingw to cross compile the native windows binaries. If you follow the BUILD_WSL.md what you're actually building is a native windows binary that you can start either from cmd.exe or from inside WSL.