Add option to suppress double word output?

magnumripper commented 9 years ago

How about an option that suppresses output words where the same element is used twice in a row, eg. "correctcorrect" or "correcthorsebatterybatterystaple"?

Not sure how to implement it but I reckon it can be useful (at times) for limiting the keyspace.

jsteube commented 9 years ago

Yeah, I think it's a good idea, as long as the option is off by default and the user has to turn it on explicitly.

Just as a reminder, prince is also about brute-force. The sorting of the keyspace creates the 'smooth transition' that leads into a pure brute-force attack in case the user uses one-letter words in the input wordlist. In brute-force we actually want the same element used twice.

About the implementation: I think we should do it the straightforward way. What I mean is we check at the chain level, against the *buf array. Take the current element position, look at positions -1 and +1, and check for the same element number. @Sc00bz any ideas here?
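A minimal sketch of that neighbour check, written in Python for brevity rather than in princeprocessor's C. It assumes a chain can be viewed as a list of element numbers, one per position; the names `repeats_neighbour` and `has_double` and that representation are hypothetical and not taken from the princeprocessor source.

```python
def repeats_neighbour(chain, pos):
    """Return True if the element at `pos` equals the one at pos-1 or pos+1.

    `chain` is a hypothetical list of element numbers, one per position.
    """
    current = chain[pos]
    if pos > 0 and chain[pos - 1] == current:
        return True
    if pos + 1 < len(chain) and chain[pos + 1] == current:
        return True
    return False


def has_double(chain):
    """Return True if any two adjacent positions hold the same element number."""
    return any(repeats_neighbour(chain, pos) for pos in range(len(chain)))


print(has_double([0, 0, 1]))  # True: positions 0 and 1 hold the same element
print(has_double([1, 0, 1]))  # False: the repeated element is not adjacent
```

If only one position changes between candidates, calling `repeats_neighbour` for just that position would be enough; how much that costs inside the real generation loop would have to be measured.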

Sc00bz commented 9 years ago

When I heard of PP I looked into no duplicate words because I was thinking of the "pick 4+ things near you right now" type of passwords, but even that isn't that much of a difference:

(100^4 - 100 * 99 * 98 * 97) / 100^4 = 5.8906% (no duplicate words)
(100^4 - 100 * 99 * 99 * 99) / 100^4 = 2.9701% (no double words)

The key space is only 3% smaller with a small number of words, 100, and four words of the same size next to each other. 6% smaller if you did no duplicate words. Yes if you look at really small words you might have less but I don't think the complexity of this and the new skip is worth it.
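The two percentages can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above for a single chain of equal-length words; the function name `removed_fraction` is made up for illustration.

```python
from math import perm


def removed_fraction(n_words, n_elems):
    """Fraction of a chain's key space removed by each rule.

    Returns (no_duplicate_words, no_double_words) as fractions of n_words ** n_elems.
    """
    total = n_words ** n_elems
    # No element used more than once anywhere: 100 * 99 * 98 * 97 for (100, 4).
    without_duplicates = perm(n_words, n_elems)
    # No element equal to its left neighbour: 100 * 99 * 99 * 99 for (100, 4).
    without_doubles = n_words * (n_words - 1) ** (n_elems - 1)
    return (total - without_duplicates) / total, (total - without_doubles) / total


dup, dbl = removed_fraction(100, 4)
print(f"{dup:.4%} {dbl:.4%}")  # 5.8906% 2.9701%
```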

The only real problem is that the first 1% of the key space will all be duplicate/double words. So really if we wanted to change something it would be the order passwords are outputted. The easiest is just start with offsets of 3,2,1,0 (maybe there's a better method like 0/4_N,1/4_N,2/4_N,3/4_N) then just "%N" and "overflow" at positions 3,2,1,0. That makes skip super simple and shouldn't slow this down much. With offsets 3,2,1,0, the second to last 1% of the key space are all duplicate/double words. So the first 98% of the key space has 2% of the key space's double words and the last 2% of the key space has 1% of the key space's.

Huh after writing that I think we should do that by default and have no other option.

Also when I say key space I'm talking about a chain's key space and not the whole key space.
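One possible reading of the 3,2,1,0 offset idea, sketched in Python. The assumptions here are mine: position 0 gets the largest offset and the last position gets offset 0, every position wraps with "% N", and the generator name `offset_order` is hypothetical. It is not how princeprocessor currently orders its output.

```python
def offset_order(words, n_elems):
    """Yield every candidate of one chain, starting each position at a different offset."""
    n = len(words)
    # Assumed offsets: n_elems-1 for position 0, down to 0 for the last position.
    offsets = [(n_elems - 1 - pos) % n for pos in range(n_elems)]
    for index in range(n ** n_elems):
        picks = []
        rest = index
        for pos in range(n_elems):
            # Base-n digit of the index for this position; position 0 changes fastest.
            rest, digit = divmod(rest, n)
            picks.append(words[(digit + offsets[pos]) % n])
        yield "".join(picks)


candidates = list(offset_order(["1", "2", "3", "4"], 4))
print(candidates[:3])       # ['4321', '1321', '2321']
print(len(set(candidates)))  # 256
```

Because each candidate is derived directly from its index, skipping to an arbitrary position only needs a different start value for `index`. With these four words, "1111" moves from the first output to index 57.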

jsteube commented 9 years ago

I'm confused. With princeprocessor, there should be no such case. Maybe I've misunderstood something. OK, we expect the input wordlist to contain unique words only. But if that's the case, then there is no duplicate word.

root@et:~/princeprocessor/src# cat words
1
2
3
4
root@et:~/princeprocessor/src# ./pp64.bin --elem-cnt-min 4 --elem-cnt-max 4 < words | wc -l
256
root@et:~/princeprocessor/src# ./pp64.bin --elem-cnt-min 4 --elem-cnt-max 4 < words | sort -u | wc -l
256

If this is not what you meant, can you please give a practical example that explains what you mean?

Sc00bz commented 9 years ago

Repeated words in a single password:

$ ./pp64-o.bin --elem-cnt-min 4 --elem-cnt-max 4 --limit 10 < words
1111
2111
3111
4111
1211
2211
3211
4211
1311
2311

Besides all of these with multiple 1's there's "2211" which has a double 2 and a double 1. @magnumripper is just talking about when they are next to each other. So "2121" would be fine. When I was saying no duplicates "2121" would not be fine because there are multiple 1's and 2's.
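To make the two rules concrete, here is a small Python sketch that classifies a few candidates. It assumes every element is a single character, as in the `words` file used above, so a candidate string can be split into its elements directly; the function names are hypothetical.

```python
def has_consecutive_repeat(elements):
    """'Double words': the same element appears twice in a row."""
    return any(a == b for a, b in zip(elements, elements[1:]))


def has_any_repeat(elements):
    """'Duplicate words': the same element appears more than once anywhere."""
    return len(set(elements)) < len(elements)


for candidate in ["2121", "2112", "2211", "1234"]:
    elements = list(candidate)  # one character per element (assumption)
    print(candidate, has_consecutive_repeat(elements), has_any_repeat(elements))
# 2121 False True
# 2112 True True
# 2211 True True
# 1234 False False
```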

magnumripper commented 9 years ago

My initial idea was only for two or more consecutive elements, so "2121" would be just fine (but "2112" would not) from the elements "1" and "2". I'm thinking of short wordlists producing sentences like "IloveSarah", and in that case there should almost never be two (or more) identical words (elements) in a row. In particular, we don't want even worse candidates like "III" or "IIIIIIII" within a huge number of output words just because that's a short element.

Anyway, this is only worthwhile if it can be done outside of the password generation loop or otherwise with no performance impact.

jsteube commented 9 years ago

So if it's really like that, then I understood it correctly from the beginning. In that case the best way to detect such a case would be to check the current element position against positions -1 and +1 for the same element number. At least I think so. But it will cost a bit of performance, hard to say how much.

magnumripper commented 9 years ago

I'll do some experiments when I get the time.

magnumripper commented 8 years ago

Re-found this issue after thinking about https://hashcat.net/forum/thread-5074.html. In that case, non-consecutive dupes would better be rejected too.

Perhaps both options (ie. consecutive or not) are useful, for different use cases. But after re-reading https://github.com/hashcat/princeprocessor/issues/28#issuecomment-70854516 above I think that's a better idea - we should try to get candidates consisting of dupe elements produced later instead of rejecting them.
