nil0x42 / duplicut

Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)
GNU General Public License v3.0
870 stars 90 forks

Feature request: Word length #8

Closed Pilly170 closed 9 months ago

Pilly170 commented 5 years ago

Hi. Using the same technique as for duplicates, is it possible to remove words that fall outside a certain word length (min/max word length)? Also, could the same method be used to split a wordlist into separate wordlists based on word length, so one large wordlist can be split into separate lists such as 8-char words, 9-char, 10-char, etc.?

Just a thought, as you've nailed the duplicates with this technique.

nil0x42 commented 5 years ago

I think implementing automatic splitting is not relevant, as this is already easy to do with basic command-line tools (sed/awk/bash).
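For example (a minimal sketch with an assumed filename wordlist.txt, not taken from the duplicut docs), a single awk pass can split a wordlist into one output file per line length:

```sh
# Write each line of wordlist.txt into a per-length file such as len8.txt,
# len9.txt, len10.txt, ..., keeping the original line order within each file.
awk '{ print > ("len" length($0) ".txt") }' wordlist.txt
```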

The value of duplicut is that it solves a problem that is not feasible with common tools: removing duplicates from a big wordlist quickly, without losing the original order, and without running out of memory.

Implementing a split is a very specific use case, and does not depend on duplicut's optimisations.


That said, I agree with you about implementing a --line-min-size option, as this additional check will have a very low impact on get_next_line() performance. (Yeah, get_next_line() is the most important function in duplicut, and every CPU cycle matters because it is called twice per line.)
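For reference (a sketch with arbitrary example values, not an official duplicut recipe), a minimum-length filter on its own is already a one-liner with the same standard tools; the appeal of a built-in --line-min-size is doing that check inside the pass duplicut already makes over every line, rather than adding an extra pass over a huge file:

```sh
# Standalone workaround: keep only lines of at least 8 characters, preserving order.
# The minimum length (8) and the filenames are arbitrary examples.
awk 'length($0) >= 8' wordlist.txt > wordlist-min8.txt
```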


Also, feel free to take a look at some scripts I personally use to optimize and build my wordlists; they may be useful for you: https://github.com/nil0x42/cracking-utils

Thank you for the feature request :)