projectdiscovery / httpx

httpx is a fast and multi-purpose HTTP toolkit that allows running multiple probes using the retryablehttp library.
https://docs.projectdiscovery.io/tools/httpx
MIT License
7.81k stars 847 forks

Duplicate filter feature ideas (-fd, -filter-duplicates) #2014

Open Xitro01 opened 2 days ago

Xitro01 commented 2 days ago

Hello there,

I was surprised to see this feature implemented, because my bug bounty automation already does something similar:

cat subdomains_live.txt | ~/go/bin/httpx -sc -title -cl -wc -td -ip > subdomains_httpx_live.txt
cat subdomains_httpx_live.txt | awk -F"[" '!seen[$2, $3, $4, $5, $6]++ {print $1}' > subdomains_deduped.txt

Looking at the current implementation, I suspect that a lot of potentially good targets will be missed. Many servers and applications respond identically at first glance even though completely different content can be found on them.

That is why my own automation also looks at the IP address: when the responses are exactly the same and come from the same server IP address, it is very likely that the subdomains simply point to the same web application.
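A minimal sketch of that idea, not how httpx implements -fd: the helper name `dedupe_by_fingerprint` is hypothetical, and it assumes plain, uncolored input where each line is a URL followed by space-separated bracketed fields, with the IP among them (as produced when `-ip` is passed). Everything after the URL is treated as the fingerprint, so two hosts count as duplicates only when every probed field, including the IP, matches.

```shell
# Hypothetical helper: print the first URL seen for each distinct fingerprint.
# Assumed input format (one result per line): URL [field] [field] ... [ip]
dedupe_by_fingerprint() {
  awk '{
    url = $1                      # first whitespace-separated token is the URL
    key = $0
    sub(/^[^ ]+ /, "", key)       # drop the URL; the rest is the fingerprint
    if (!seen[key]++) print url   # emit only the first host per fingerprint
  }'
}
```

For example, with illustrative data, `dedupe_by_fingerprint < subdomains_httpx_live.txt > subdomains_deduped.txt` would collapse two hosts that return the same status, title and IP into one entry, while a third host with the same response on a different IP is kept.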

With this I went from 5000 subdomains to just 500, which improves my chances of actually finding something good, because less is more in this case. Further analysis or automated scanning takes roughly 10% of the time, and you're not overwhelmed by duplicate results.

It might be a good idea to include the IP address in the duplicate filter as well, or maybe others have even better ideas.

GeorginaReeder commented 2 days ago

Thanks so much for your feature request, @Xitro01! We'll look into this :)