Closed: Pilly170 closed this issue 9 months ago
I think implementing automatic splitting is not really needed, as it is already easy to do with basic command-line tools (sed/awk/bash).
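For instance, splitting by length is roughly a one-liner in awk (the `len-*.txt` file names here are just an illustration, not something duplicut produces):

```sh
# write each line of wordlist.txt into a file named after its length,
# e.g. len-8.txt, len-9.txt, len-10.txt, ...
awk '{ print > ("len-" length($0) ".txt") }' wordlist.txt
```

Depending on your awk flavour you may need to `close()` the output files if there are many distinct lengths, but gawk handles that for you.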
The point of duplicut is that it solves a problem that is not feasible with common tools: removing duplicates from a big wordlist quickly, without losing the original order, and without running out of memory.
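To illustrate the trade-off with the usual one-liners (just a sketch of the common approaches, not how duplicut works internally):

```sh
# preserves the original order, but keeps every unique line in memory,
# so it falls over on multi-gigabyte wordlists:
awk '!seen[$0]++' big-wordlist.txt > deduped.txt

# bounded memory (external sort), but the original order is lost:
sort -u big-wordlist.txt > deduped.txt
```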
Implementing a split is a very specific use case, and does not depend on duplicut's optimisations.
Therefore, I agree with you about implementing a --line-min-size option, as this additional check will have a very low impact on get_next_line() performance.
(Yeah, get_next_line() is the most important function of duplicut, and every CPU cycle matters because it is called twice per line.)
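In the meantime, a min/max length filter is easy to apply externally before running duplicut; a minimal awk sketch (the 8 and 16 bounds are arbitrary examples):

```sh
# keep only candidates whose length is between 8 and 16 characters
awk 'length($0) >= 8 && length($0) <= 16' wordlist.txt > filtered.txt
```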
Also, feel free to take a look at some scripts I personally use to optimize and build my wordlists, they may be useful for you: https://github.com/nil0x42/cracking-utils
Thank you for the feature request :)
Hi. Using the same technique as for duplicates, is it possible to remove words that are outside a certain word length (min/max word length)? Also, could the same method be used to split a wordlist into separate wordlists based on word length? That way one large wordlist could be split into separate lists such as 8-char words, 9-char, 10-char, etc.
Just a thought, as you've nailed the duplicates with this technique.