Duplicate filter feature ideas (-fd, -filter-duplicates)

Hello there,

I was surprised to see this feature implemented, because I already use something similar in my bug bounty hunting automation:

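# Probe every live subdomain and record status code, title, content length, word count, tech and IP
cat subdomains_live.txt | ~/go/bin/httpx -sc -title -cl -wc -td -ip > subdomains_httpx_live.txt
# Keep only the first URL for each unique combination of the bracketed fields 2-6
cat subdomains_httpx_live.txt | awk -F"[" '!seen[$2, $3, $4, $5, $6]++ {print $1}' > subdomains_deduped.txt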

Looking at how the filter is currently implemented, I have the feeling that a lot of potentially good targets are going to be missed. I believe there are many servers and applications out there that respond in exactly the same way, even though completely different content might be found on them.

So in my own automation I also look at the IP address. If the responses are exactly the same and they also come from the same server IP address, it is very likely that these are just different subdomains pointing to the same web application.
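A minimal sketch of that idea, run against made-up sample lines rather than real httpx output. The file names `sample_httpx.txt` and `sample_deduped.txt`, the hosts, and the order of the bracketed fields are assumptions for illustration only. To avoid depending on where the IP column ends up, the key here is built from every bracketed field instead of a fixed `$2` to `$6`.

```shell
# Hypothetical sample in a bracketed layout similar to what httpx prints.
# The field order is assumed; the IP is the last field in this sample.
cat > sample_httpx.txt <<'EOF'
https://a.example.com [200] [Welcome] [1024] [57] [nginx] [192.0.2.10]
https://b.example.com [200] [Welcome] [1024] [57] [nginx] [192.0.2.10]
https://c.example.com [200] [Welcome] [1024] [57] [nginx] [192.0.2.20]
https://d.example.com [404] [Not Found] [153] [9] [nginx] [192.0.2.10]
EOF

# Split on a literal "[" (written as the bracket expression [[] for portability).
# The key is the concatenation of fields 2..NF, so the IP is always part of it.
awk -F'[[]' '{
    key = ""
    for (i = 2; i <= NF; i++) key = key SUBSEP $i
    if (!seen[key]++) {
        sub(/[ \t]+$/, "", $1)   # strip the trailing space after the URL
        print $1
    }
}' sample_httpx.txt > sample_deduped.txt

cat sample_deduped.txt
```

In this sample, b.example.com is dropped because it matches a.example.com on every field including the IP, while c.example.com is kept because the same response comes from a different IP address.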

With this I went from 5000 subdomains to just 500, which improves my chances of actually finding something good, because less is more in this case. Further analysis or automated scanning takes much less time (about 10% of the original) and you are not overwhelmed by all the duplicate results.

It might be a good idea to include the IP address in the duplicate filter as well, or maybe others have even better ideas.

projectdiscovery / httpx, issue #2014