projectdiscovery / httpx

httpx is a fast and multi-purpose HTTP toolkit that allows running multiple probes using the retryablehttp library.
https://docs.projectdiscovery.io/tools/httpx
MIT License

Parallelize threads on many hosts #904

Closed: Kiblyn11 closed this issue 1 year ago

Kiblyn11 commented 1 year ago

Please describe your feature request:

It seems that httpx does use multithreading, but only against one target host at a time.

Describe the use case of this feature:

It would be nice to parallelize threads on multiple hosts to:

Mzack9999 commented 1 year ago

@Kiblyn11 httpx is already multithreaded across many different hosts. Different paths on the same host are processed in separate threads. Unless I'm mistaken, this functionality is already implemented. Can you elaborate on which limitations you are facing?

Kiblyn11 commented 1 year ago

@Mzack9999 I understand my message was not very clear.

Let me illustrate with one case. I want to run recon against 2 hosts, ffuf style. The hosts file contains 2 entries, http://127.0.0.1:8443 and http://127.0.0.1:8444. Both servers actually serve the same folder on my laptop. The wordlist /tmp/test is a raft wordlist modified to add '/' in front of every entry, otherwise httpx won't work. I ran the following command for a few minutes, stopped it manually, and got this result:

cat hosts| ./httpx -path /tmp/test

    __    __  __       _  __
   / /_  / /_/ /_____ | |/ /
  / __ \/ __/ __/ __ \|   /
 / / / / /_/ /_/ /_/ /   |
/_/ /_/\__/\__/ .___/_/|_|
             /_/              v1.2.5

                projectdiscovery.io

Use with caution. You are responsible for your actions.
Developers assume no liability and are not responsible for any misuse or damage.
http://127.0.0.1:8443/subscription.php
http://127.0.0.1:8443/xmlrpc.php
http://127.0.0.1:8443/UPGRADE.txt
http://127.0.0.1:8443/online.php
http://127.0.0.1:8443/member.php
http://127.0.0.1:8443/sendmessage.php
http://127.0.0.1:8443/showgroups.php
http://127.0.0.1:8443/faq.php
http://127.0.0.1:8443/LICENSE.txt
http://127.0.0.1:8443/contact.txt
http://127.0.0.1:8443/credentials.txt

You can see that it reported only for host 127.0.0.1:8443, even though 127.0.0.1:8444 serves the same content. Looking at the 8444 server log, httpx never sent a request to it.

So I deduce that multithreading is per host. Here it defaults to 50 threads, which are all used for enumerating paths on the first host, while the second host waits for the first host to complete, and so on.

What I believe would be an improvement, and it is actually done in FFUF, would be to spread the threads across multiple hosts instead of waiting for one host to complete before moving on to the next one.

Mzack9999 commented 1 year ago

@Kiblyn11 Thanks for providing more details. httpx uses a simple iteration logic that enqueues items serially, as the tool's primary purpose is to process many different hosts for a few paths. Have you considered using nuclei for this path enumeration? It seems a perfect fit, since it processes hosts in parallel and you can define your own templates to customize the request without needing to preprocess any wordlist.

The iteration process just puts all the combinations of hosts and paths in a queue. The worker threads pull from the queue in FIFO order and process the targets independently, so most likely the first target is processed first, then the second, and so on. Adding a shuffle option could help distribute unresponsive hosts more evenly. At the end of the run, is httpx unable to connect to the second target, 127.0.0.1:8444?
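
To make the queue order described above concrete, here is a minimal sketch in Python (not httpx's actual Go code). The function names `host_major` and `shuffled` are hypothetical, and the three paths are just a sample taken from the output earlier in the thread. It only models the order in which targets would enter a FIFO queue, not the worker threads or the HTTP requests.

```python
import random
from itertools import product


def host_major(hosts, paths):
    """Enqueue every path for the first host, then every path for the next host.

    This mirrors the serial iteration described above: the outer loop is over
    hosts, so a FIFO queue is front-loaded with targets for the first host.
    """
    return [host + path for host, path in product(hosts, paths)]


def shuffled(targets, seed=None):
    """Return a shuffled copy of the queue (the 'shuffle option' idea).

    The seed parameter is an assumption added here for reproducibility.
    """
    rng = random.Random(seed)
    out = list(targets)
    rng.shuffle(out)
    return out


hosts = ["http://127.0.0.1:8443", "http://127.0.0.1:8444"]
paths = ["/faq.php", "/LICENSE.txt", "/xmlrpc.php"]

queue = host_major(hosts, paths)
for target in queue:
    print(target)
```

With a long wordlist, workers pulling from the front of such a queue spend a long time on the first host before any request reaches the second, which matches the behaviour reported in the 8444 server log. Shuffling keeps the same set of targets but removes that front-loading.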

Kiblyn11 commented 1 year ago

@Mzack9999 The advantages of using httpx over nuclei are its simplicity and all the integrated probes. As I see it, to be useful here it would have to process hosts and paths in this order:

HOST1/PATH1
HOST2/PATH1
HOST3/PATH1
HOST1/PATH2
HOST2/PATH2
HOST3/PATH2
HOST1/PATH3
...

In case we have 3 hosts.

Indeed, this would require reworking the queuing method or adding another one. I think adding a second queuing method would be the more interesting option, since some people might need the current queuing order.
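
The interleaved order listed above can be sketched as follows. This is a Python illustration only, not a proposed httpx patch; `path_major` is a hypothetical name, and the placeholder hosts and paths are the ones from the listing.

```python
def path_major(hosts, paths):
    """Yield targets with the path as the outer loop and the host as the inner loop.

    Every host receives PATH1 before any host receives PATH2, so consecutive
    queue entries go to different hosts. A generator is used so that a large
    wordlist does not have to be expanded in memory up front.
    """
    for path in paths:
        for host in hosts:
            yield host + path


hosts = ["HOST1", "HOST2", "HOST3"]
paths = ["/PATH1", "/PATH2", "/PATH3"]

order = list(path_major(hosts, paths))
for target in order:
    print(target)
```

Swapping the two loops is the only difference from a host-first iteration, so both orders could plausibly coexist behind an option, assuming the rest of the pipeline does not depend on the enqueue order.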