Zeokat opened this issue 7 years ago.
This new enhancement request is just fine. How about you also post to john-users (linking to this page), so that others can see what is requested.
NOTE: given the way JtR loads files, it will almost certainly be best not to have too many of these $HEX[xxxx] words. When they are needed, great, but taking a wordlist like rock-u (or worse yet, something tens of times bigger) and converting every line into a $HEX[] string would very likely NOT be optimal for memory usage. Things could be made to work, but (likely) at the expense of memory. However, take rock-u for instance: I think there are some binary items in there. To make sure they are handled correctly, they could be converted into $HEX[] items, while leaving all of the 7-bit ASCII data intact.
But I do like this idea. It might be easy, but it also might not be 'super' easy.
I also think we should look at something like this for .pot storage.
Also, we need a simple tool to convert into and out of $HEX[] format (an enhancement to the base64_convert program should do the trick).
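A minimal sketch of the "into" direction of such a converter, following the idea of hexing only the words that need it. It is written in C; `convert_word` and `keep_verbatim` are hypothetical names (this is not base64_convert's actual code), and treating only printable 7-bit ASCII as safe, plus hexing any word that itself starts with `$HEX[`, are assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical converter core (not existing JtR code).
 * A word is kept verbatim if it is printable 7-bit ASCII (0x20..0x7e) and
 * does not itself start with "$HEX[" (it would be misread on input). */
static int keep_verbatim(const unsigned char *s, size_t len)
{
    if (len >= 5 && memcmp(s, "$HEX[", 5) == 0)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] < 0x20 || s[i] > 0x7e)
            return 0;
    return 1;
}

/* Write `in` (len bytes) to `out`, verbatim or as $HEX[...] with lowercase
 * hex digits. `out` must hold 2 * len + 7 bytes: "$HEX[" + hex + "]" + NUL. */
static void convert_word(const unsigned char *in, size_t len,
                         char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";
    char *p = out;

    assert(out_size >= 2 * len + 7);
    if (keep_verbatim(in, len)) {
        memcpy(out, in, len);
        out[len] = '\0';
        return;
    }
    memcpy(p, "$HEX[", 5);
    p += 5;
    for (size_t i = 0; i < len; i++) {
        *p++ = digits[in[i] >> 4];      /* high nibble */
        *p++ = digits[in[i] & 0x0f];    /* low nibble */
    }
    *p++ = ']';
    *p = '\0';
}
```

With this, `password` passes through untouched, `café` in UTF-8 becomes `$HEX[636166c3a9]`, and a literal `$HEX[41]` word is itself escaped so it can't be mistaken for the single byte 0x41.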
@magnumripper can you think of other areas where $HEX[] should be dealt with if/when we add this option? If so, then edit the top post, adding a new task.
I didn't know HC supported this format for wordlists (I knew about the pot files). What if a user sets his password to (verbatim) $HEX[deadbabe]? Will that make it impossible for Hashcat to crack? I recently had a patch accepted for HC that stops outputting silly hex for perfectly fine UTF-8 strings, and nowadays that gets rid of just about ALL the hex. Virtually everything uses Unicode now.
@Zeokat please note that while HC can (I think) fully use input with CR, LF and even NULL characters, JtR cannot, and this HEX format can't change that fact. We'd need to change every little part of the JtR core for that to happen (using Pascal strings starting with a length byte). It would probably be faster though; it's a pity Solar didn't make it that way.
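To illustrate the C-string versus Pascal-string point, here is a minimal sketch; `struct pstring` and `c_view_length` are hypothetical names for illustration, not JtR code. A length-prefixed candidate keeps its true length, while anything that treats the same buffer as a C string stops at the first NUL:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustration only: a length-prefixed ("Pascal style") candidate carries
 * its true length, NUL bytes included. */
struct pstring {
    size_t len;
    unsigned char data[64];
};

/* What C-string code (strlen and friends) effectively sees: everything up
 * to the first NUL byte, or the whole buffer if there is none. */
static size_t c_view_length(const struct pstring *p)
{
    assert(p->len <= sizeof(p->data));
    const unsigned char *nul = memchr(p->data, 0, p->len);
    return nul ? (size_t)(nul - p->data) : p->len;
}
```

So even if $HEX[410042] were decoded correctly into 3 bytes, every C-string consumer downstream would see only "A".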
Because of all the above and other things I see no need for this, but feel free to implement it.
Needless to say, this must be handled in cold code though. I wouldn't accept this functionality causing a performance regression.
Which systems accept arbitrary hex values as passwords, so that a password cannot be cracked without supporting $HEX[...]?
I wouldn't want that additional complexity just to be able to "crack" md5(md5($p)), i.e. find the md5 hash that is the result of md5($p).
Once you add that functionality, you'll need to support it in future. So there must be a convincing reason to add it.
raw($HEX[deadbabe]) == $HEX[244845585b64656164626162655d]
Ah yes, of course. Still, I can't really see the point of using HEX as long as we don't support LF and NULL in passwords anyway.
I have seen cleartexts with backspace characters and other weird stuff, but I'm pretty sure it was always caused by cracking e.g. salted MD5 using raw MD5. So the weird characters were actually a salt. And btw, we do support anything, including NULL, in a salt (if we do it properly and use the salted format).
> raw($HEX[deadbabe]) == $HEX[244845585b64656164626162655d]
But when you crack it, since this is plain ASCII, HC will write it to the pot file as
4835410afad7b5730246ef827dd7a211:$HEX[deadbabe]
You won't know what actually happened. You will think the password is the four bytes of DE AD BA BE.
Another thought: If we implement this for pot files, we have to implement it for wordlists at the same time. Otherwise we'll break loopback mode.
> Ah yes of course. Still. I can't really see the point of using HEX as long as we don't support LF and NULL in passwords anyway.
We absolutely support CR and LF chars, BUT the wordlist reader strips them out. Thus, the only way to have them in a wordlist is some form of escaping, OR using something like an external mode, which slows down runtime a huge amount.
I know you don't see a need for $HEX[xxx]. I certainly do. No, we cannot get NULL-byte capability in any way. That is because we use C-style strings everywhere (wordlist, external, all formats, cracker, etc.). There is no way around that other than porting EVERYTHING to Pascal-style strings (a HUGE rewrite of john, and the ROI is not even close to worth it). But $HEX[] opens up the ability to make any string (other than ones with embedded NULs) work properly. Yes, there also ARE systems which accept any arbitrary byte (not just valid UTF-X characters). This was even one of the 'techniques' taught for creating hard-to-break passwords: holding down the ALT key and typing on the numeric keypad lets you enter any byte (on a Windows/DOS box). That mentality WAS used by some savvy system admins, and being able to specify EXACT byte sequences with no encoding mangling is beneficial.
I see your point about the raw($HEX[deadbabe]) == $HEX[244845585b64656164626162655d] .pot file catch-22. BUT the password is there. This would show up as a .pot file re-run failure in the test suite (if we added one of these). That would just be a limitation. I am pretty sure the same 'flaw' exists in HC: the item will crack, but will be somewhat hidden by the .pot file processing.
NOTE, we could handle this by putting $HEX[244845585b64656164626162655d] into the .pot file. Since a crack is an exceptional (rare) event, we can afford to check whether the password is itself of $HEX[] form, and if so, hexify it. We would also ALWAYS have to check for \n or \r. We could add john.conf options to also encode any strings that are invalid in the current encoding, or any containing chars < 0x20, or even ANYTHING < 0x20 or > 0x7f (at user option). We might also add a mode that encodes anything < 0x20, or anything > 0x7f that is not valid UTF-8. This would make code page cracks work (but I think we convert to UTF-8 in that case, correct?)
There is a lot of nuanced discussion and planning that would be needed for this. It should not be hard to do, BUT to do it properly, it should not just be 'coded until it works'. It should be thought through fully.
Right now, we 'can' crack passwords with \r and \n in them (using externals, masks, etc.), BUT the .pot file is broken. $HEX[] would correct that problem AND also allow input-file words to natively handle \r and \n without using externals/masks to do it. NOTE, that is a pretty big ROI, since there are passwords in the wild (actually more than just a few) which have \n in them.
I can see a point in having $HEX[cafe0809ca0d0afe] pot file entries. And whether I like it or not, that means wordlist has to support that format as input too. So we don't really have to discuss pros and cons more.
I think we should have a conf entry (only) for controlling the pot file behavior. It could be something like StoreHexNotation = WHEN. It's not just true or false; I think WHEN could be Always, Never or Invalid.
For reference, we have the following related conf variables already:
# Always store Unicode (UTF-16) passwords as UTF-8 in john.pot, regardless
# of input encoding. This prevents john.pot from being filled with mixed
# and eventually unknown encodings. This is recommended if you have your
# terminal set for UTF-8 and/or you want to run --loopback for LM->NT
# including non-ASCII.
UnicodeStoreUTF8 = Y
# Always report/store non-Unicode formats as UTF-8, regardless of input
# encoding. Note: The actual codepage that was used is not stored anywhere
# except in the log file.
# This is needed eg. for --loopback to crack LM->NT including non-ASCII.
CPstoreUTF8 = Y
This means some combinations would be ambiguous:
UnicodeStoreUTF8 = Y
CPstoreUTF8 = Y
StoreHexNotation = Always
To simplify things I think StoreHexNotation should simply (and silently) override the others (and this should be documented in the conf comment).
For screen output of cracks, we currently have this:
# Always report (to screen and log) cracked passwords as UTF-8, regardless of
# input encoding. This is recommended if you have your terminal set for UTF-8.
AlwaysReportUTF8 = Y
If we want screen output hexified too (like Hashcat), we could add ReportHexNotation = WHEN with the same Always/Never/Invalid values and a similar silent override of the older variable.
Oh, and I think both should default to "Invalid". Or perhaps "Never"; the latter would be the least change from current behavior.
So here's a catch. Let's say you have
StoreHexNotation = Always
ReportHexNotation = Invalid
The correct behavior should be this:
Running --show on that file (still with the same conf settings) should parse the hex and output it as "Düsseldorf" because of our settings. So not only wordlist.c needs to understand $HEX[...] input, but also loader.c.
We should make the default 'Never', and then there is zero change in john's behavior, other than any \n getting hex notation.
I also think that 'Never' should handle the recursive $HEX[] case listed above. Yes, that is an obscure issue, BUT there are often 'password policies' written specifically to subvert password crackers (such as using \n, obscure byte sequences, or, as a specific attack against john, the ':' character). Also, I think that in 'Never' mode we should use hex for these cases:
anything containing \n
anything containing :
any $HEX[xxx] input (full word only).
I think that is all cases.
Btw, I am not overly thrilled with the 'Always', but I could see it for some users, since it allows a single field type if they are using other tools to parse the output or .pot files.
Agreed, except we should not hex anything merely containing : (and definitely not in "Never" mode). There's just no need to do so. In the pot file, the first colon is a separator; any others are part of the plaintext. So there is no problem to fix.
Hashcat never uses HEX just because of a :, even though HC has a trickier situation with : also being the salt separator and so on.
BTW, I also don't think we should hex a plaintext that is itself of $HEX[...] form in "Never" mode, only in the others. I really think "Never" should only ever hex linefeed characters.
Ok, I can certainly live with that logic. It does work around the problems with the LF in the .pot files.
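A minimal C sketch of the store-side rules as agreed so far. `enum hex_mode`, `needs_hex()` and the helpers are hypothetical names, not existing JtR code. The exact meaning of "Invalid" here (control characters, invalid UTF-8, or a plaintext that itself looks like $HEX[...]) is an assumption, as is treating CR the same as LF in every mode. A colon alone never triggers hex:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modes mirroring the proposed StoreHexNotation values. */
enum hex_mode { HEX_NEVER, HEX_INVALID, HEX_ALWAYS };

/* Minimal UTF-8 validity check: rejects bad lead/continuation bytes,
 * truncated sequences, overlongs, surrogates and code points > U+10FFFF. */
static int valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t n;                       /* number of continuation bytes */

        if (c < 0x80) {
            i++;
            continue;
        } else if (c >= 0xc2 && c <= 0xdf) {
            n = 1; cp = c & 0x1f;
        } else if (c >= 0xe0 && c <= 0xef) {
            n = 2; cp = c & 0x0f;
        } else if (c >= 0xf0 && c <= 0xf4) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;                   /* stray continuation or invalid lead */
        }
        if (len - i <= n)
            return 0;                   /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xc0) != 0x80)
                return 0;
            cp = (cp << 6) | (s[i + k] & 0x3f);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
            return 0;                   /* overlong, out of range, surrogate */
        i += n + 1;
    }
    return 1;
}

/* Does the whole plaintext look like $HEX[...]? ("hex for hex" case) */
static int is_hex_shaped(const unsigned char *s, size_t len)
{
    return len >= 6 && memcmp(s, "$HEX[", 5) == 0 && s[len - 1] == ']';
}

/* Should this cracked plaintext be stored as $HEX[...] in the pot file? */
static int needs_hex(const unsigned char *pw, size_t len, enum hex_mode mode)
{
    assert(pw != NULL);
    /* Every mode must hex line breaks, or the pot file line is broken. */
    if (memchr(pw, '\n', len) || memchr(pw, '\r', len))
        return 1;
    switch (mode) {
    case HEX_NEVER:
        return 0;
    case HEX_ALWAYS:
        return 1;
    case HEX_INVALID:
    default:
        if (is_hex_shaped(pw, len))
            return 1;
        for (size_t i = 0; i < len; i++)
            if (pw[i] < 0x20 || pw[i] == 0x7f)   /* control characters */
                return 1;
        return !valid_utf8(pw, len);
    }
}
```

Under these assumptions, "Düsseldorf" (valid UTF-8) is stored as-is in "Invalid" mode, the raw bytes DE AD BA BE are hexed in "Invalid" but not in "Never", and `foo\nbar` is hexed in every mode.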
What about in the input files? Should $HEX[] simply be auto handled always? I would think so.
If so (always on input), then it is .pot and screen (crack found and -show) output which we allow configurable control over.
(I re-read your prior post, two above this one.)
Seriously? I would think that input (wordlists) simply always worked. That was what the original post was about. That we allow the end user to adjust how the output is, I think is a good (but a bit complex) thing. But not supporting this on input wordlist stuff, just does not make sense to me.
No, I think I agree we should always handle hex input. I meant the "Never" modes should not output/store "hex for hex".
Good. I initially read your post the way it is now, and then when it arrived in email, the word 'input' jumped out at me. Just needed clarification.
On a side note, Atom added hexifying : within pot plaintexts in Hashcat today.
Related, but different: #2437.
I think our highest priority, and the most obviously desirable, should be "add $HEX[] storing/handling to .pot files and screen output (logger.c, loader.c)". I agree it's a change in the logger. It's probably also a change in the loader, just to avoid code duplication for --loopback and --make-charset, although maybe the former should be taken care of as part of adding hex support to wordlist handling in general.
> On a side note, Atom added hexifying : within pot plaintexts in Hashcat today.
I think this is unnecessary (with just two fields, anything after the first colon is just the second field), but it reminds me that if we don't hexify : in pot, then we need to hexify it on --show.
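A minimal sketch of that --show rule, assuming a simplified "login:plaintext" output line (real --show output has more fields). `format_show_line` is a hypothetical name, not existing JtR code; it hexes the plaintext only when it contains a colon:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical --show formatter: writes "login:plaintext" into out, using
 * $HEX[...] for the plaintext if it contains ':' (which would otherwise
 * make the field boundaries ambiguous). Returns the length, or -1 if
 * out is too small. */
static int format_show_line(const char *login, const char *plain,
                            char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";
    size_t used;

    assert(out_size > 0);
    if (!strchr(plain, ':')) {
        int n = snprintf(out, out_size, "%s:%s", login, plain);
        return (n < 0 || (size_t)n >= out_size) ? -1 : n;
    }

    int n = snprintf(out, out_size, "%s:$HEX[", login);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    used = (size_t)n;
    for (const unsigned char *p = (const unsigned char *)plain; *p; p++) {
        if (used + 2 >= out_size)       /* two digits plus room for more */
            return -1;
        out[used++] = digits[*p >> 4];
        out[used++] = digits[*p & 0x0f];
    }
    if (used + 1 >= out_size)           /* ']' plus terminating NUL */
        return -1;
    out[used++] = ']';
    out[used] = '\0';
    return (int)used;
}
```

With this, a plaintext of `a:b` for user `bob` is shown as `bob:$HEX[613a62]`, while `alice:secret` is printed unchanged.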
Some relevant discussion in https://github.com/openwall/john/pull/4947#issuecomment-997022318
Hashcat supports $HEX[] wordlists to encode control chars. It would be nice to have this option available in JtR; that way we could use Hashcat wordlists in JtR.
So, a password of "foobar1[CRLF]" becomes: $HEX[666f6f626172310d0a]
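A minimal C sketch of how a wordlist reader might handle such a line. `load_word` and `hex_val` are hypothetical names (this is not JtR's wordlist.c). Rejecting a malformed `$HEX[]` word (odd digit count or non-hex characters) rather than using it literally is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* 1. Strip the trailing LF / CRLF the way a line reader would.
 * 2. If the whole word is $HEX[...], decode it in place.
 * 3. Otherwise leave the word as it is.
 * Returns the word's length in bytes, or (size_t)-1 for a malformed
 * $HEX[] word (the buffer may then be partially overwritten). */
static size_t load_word(char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);

    /* Raw CR/LF can't survive this step, hence the need for $HEX[]. */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';

    if (len < 6 || memcmp(line, "$HEX[", 5) != 0 || line[len - 1] != ']')
        return len;                         /* ordinary word */

    size_t ndigits = len - 6;               /* characters between [ and ] */
    if (ndigits % 2 != 0)
        return (size_t)-1;
    for (size_t i = 0; i < ndigits / 2; i++) {
        int hi = hex_val((unsigned char)line[5 + 2 * i]);
        int lo = hex_val((unsigned char)line[6 + 2 * i]);
        if (hi < 0 || lo < 0)
            return (size_t)-1;
        /* Safe in place: the write index i never reaches unread input. */
        line[i] = (char)(hi << 4 | lo);
    }
    line[ndigits / 2] = '\0';   /* an embedded 00 byte would still truncate C-string users */
    return ndigits / 2;
}
```

Fed the line `$HEX[666f6f626172310d0a]` plus its newline, this returns 9 and leaves exactly the bytes `foobar1\r\n` in the buffer.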
Is this format already supported in JtR? Do you plan to add it? Is there any way to handle control chars in wordlists?
Not sure if I should post this here or on the john-users list, since it contains ideas and also questions 😕.
Thanks in advance for your help.