zmap / zgrab2

Fast Go Application Scanner
Other
1.77k stars 309 forks source link

ZGrab performance issue with http? #452

Open phillip-stephens opened 5 months ago

phillip-stephens commented 5 months ago

Not a bug per se, but was investigating a slow zgrab scan and noticed that it maxes out at sending 6 MB/s. While there might be a lot of CPU overhead, this just seems egregiously slow and surely there's some performance gains we can achieve. This issue is to track that.

image

CLI command - cat ~/100k-domains.txt | ./zgrab2 http --max-redirects=3

This was run on a VM with a 1 Gb/s + link and plenty of cores(22)/RAM (56 GB)

mzpqnxow commented 4 months ago

Not a bug per se, but was investigating a slow zgrab scan and noticed that it maxes out at sending 6 MB/s. While there might be a lot of CPU overhead, this just seems egregiously slow and surely there's some performance gains we can achieve. This issue is to track that.

image

CLI command - cat ~/100k-domains.txt | ./zgrab2 http --max-redirects=3

This was run on a VM with a 1 Gb/s + link and plenty of cores(22)/RAM (56 GB)

Does using the "senders" global (-s if I remember correctly..) not help with this?

Regardless, it seems like a rather unusual number as a default... unless it's as a ratio to the number of logical cores. If that's the case, it does in fact seem to be very low, assuming your measurement is correct <-- misread as "6 connections", disregard this

In the past, users have created issues about performance that were ultimately caused by DNS lookups. I suspect that may be the case here. The spikes in the graph may represent time doing DNS lookups

I've managed to avoid DNS issues by pre-resolving targets, then using the targets file format with both the Virtual Host/FQDN and IP address, so DNS is only needed on redirects. This is necessary anyway if you want to cover all of the IP addresses for load-balanced FQDNs that return more than one IP in the A record lookup

In my opinion, zgrab2 should not be relied upon for DNS lookups, because of this exact issue - on the fly DNS lookups can significantly slow down the actual scanning/probes. I don't know if zmap project folks agree, but I've become pretty firm in believing it

Seanstoppable commented 3 months ago

I recently noticed that we do DNS lookups whenever we get a redirect, irrespective of if we are going to follow it, which is probably at least part of the problem.

Seanstoppable commented 3 months ago

I've tracked this to this line: https://github.com/zmap/zgrab2/pull/462/files#diff-dddc8e7eb09019004b9ea48a345b3fb794995c7b661e5e2aa0494d56d6ce7bd3L542 Where we go through and create a new request, parsing the DNS name and triggering a DNS lookup before evaluating whether we want to bail out due to redirects. Options I see are moving out the redirect check to right after the request lookup (per the PR), or I guess we could move it sooner in the loop, before we go to create a new request object/do the parsing.

mzpqnxow commented 3 months ago

I recently noticed that we do DNS lookups whenever we get a redirect, irrespective of if we are going to follow it, which is probably at least part of the problem.

Indeed, that's the residual bit that remains, even if you do provide all of the initial addresses. And I'm not sure of any way around it

Though I haven't actually measured it, I would assume it's a minor blip when compared with most target lists. Or maybe not, I haven't properly thought about it

mzpqnxow commented 3 months ago

I've tracked this to this line:

https://github.com/zmap/zgrab2/pull/462/files#diff-dddc8e7eb09019004b9ea48a345b3fb794995c7b661e5e2aa0494d56d6ce7bd3L542

Where we go through and create a new request, parsing the DNS name and triggering a DNS lookup before evaluating whether we want to bail out due to redirects.

Options I see are moving out the redirect check to right after the request lookup (per the PR), or I guess we could move it sooner in the loop, before we go to create a new request object/do the parsing.

One question- do you know off-hand what happens for a case like http://site.com redirect -> https://site.com?

Specifically, wondering if it knows that it can use that same address or it does another lookup

I imagine that is a good portion of the redirects for most target lists

mzpqnxow commented 3 months ago

One more note: if you do tackle this, keep in mind the potential pitfall mentioned in #307 which is really a general pattern that has manifested in different ways/places- TLS handshake failures, redirect exceeded, etc. I think your recent PR may have fixed this sort of behavior, just wanted to mention it

I'm not sure I ever sent a PR for the TLS handshake case 😞

Seanstoppable commented 3 months ago

My use case is a bit simpler, as a lot of my scans are just IPs and not domains So any redirect to a domain in that case does a DNS lookup In the case of cat ~/100k-domains.txt | ./zgrab2 http --max-redirects=3, you are going to do up to 5 lookups in the worst case: 1 for the initial domain, 3 for the up to 3 redirects, and a 4th if the last redirect also redirects, even if you don't follow it or use that data.

One question- do you know off-hand what happens for a case like http://site.com/ redirect -> https://site.com/?

I would expect it to just be cached locally, rather than zgrab2 trying to re-resolve everything. http://site.com/ -> http://www.site.com/ would trigger another lookup though.

However, I don't think the golang code will cache those internally, so maybe a small improvement if we cached those for the http -> https redirect we are expecting when we start at http, avoiding a roundtrip to the system resolver even if the value is cached there.

Indeed, that's the residual bit that remains, even if you do provide all of the initial addresses. And I'm not sure of any way around it

Yeah. I am not sure how to solve that one. Iterative process of doing one level at a time is a lot of wrapping/stitching together. I guess you could try to embed something like ZDNS inside (and maybe this is why there has been work to make it more library work) to do faster lookups than what we are doing today.

I wonder if part of it is also redirects like domain.com -> www.domain.com, where that doesn't take advantage of multiplexing. I wonder if you could be a little 'clever' and try to get the roundtripper to maintain a connection per IP address, rather than per a hostname.

mzpqnxow commented 3 months ago

My use case is a bit simpler, as a lot of my scans are just IPs and not domains So any redirect to a domain in that case does a DNS lookup In the case of cat ~/100k-domains.txt | ./zgrab2 http --max-redirects=3, you are going to do up to 5 lookups in the worst case: 1 for the initial domain, 3 for the up to 3 redirects, and a 4th if the last redirect also redirects, even if you don't follow it or use that data.

I see. So the initial request for our cases is similar (profile-wise) because no DNS lookup required, but your case won’t necessarily benefit at all from caching DNS from that first request (since there isn’t one)

Actually, let me correct myself. I just realized that my target list could actually prime a cache. Since I provide, for example:

1.2.3.4, target.com
3.4.5.6, www.target.com




 it should be possible to first reference the name to IP mappings in the target list, before making a DNS request. Wouldn’t help you, though. I wonder how much of an impact this would have for me. As a mapping with writes occurring only at startup, it wouldn’t need any locking. The lookups should be cheap. Just a matter of how many cases would actually get a hit, which would vary for each user’s target set

I would expect it to just be cached locally, rather than zgrab2 trying to re-resolve everything. http://site.com/ -> http://www.site.com/ would trigger another lookup though.

I would like to confirm that if I get time

However, I don't think the golang code will cache those internally, so maybe a small improvement if we cached those for the http -> https redirect we are expecting when we start at http, avoiding a roundtrip to the system resolver even if the value is cached there.

Interesting idea. Could be a quick win

Yeah. I am not sure how to solve that one. Iterative process of doing one level at a time is a lot of wrapping/stitching together. I guess you could try to embed something like ZDNS inside (and maybe this is why there has been work to make it more library work) to do faster lookups than what we are doing today

In this case, are you describing making the same number of DNS requests, but with more optimized code? If so, I would be skeptical of that, mainly because the delay is probably i/o bound. For my dataset, I’m aware of a large number of domains that have very high latency DNS servers. I may be misunderstanding what you’re describing, though

I wonder if part of it is also redirects like domain.com -> www.domain.com, where that doesn't take advantage of multiplexing. I wonder if you could be a little 'clever' and try to get the roundtripper to maintain a connection per IP address, rather than per a hostname.

That could work, as long as it’s HTTP and not HTTPS. It will break on some (possibly many) of the TLS >= 1.2 cases, because of SNI, which will require a new TLS negotiation with the appropriate/new name

Have you considered globally caching DNS across all of the senders? Maybe that’s what you were describing with your idea about zdns “embedding”?

A global cache was one if my first thoughts, but I immediately started arguing points against myself

First, it may help me, but it wouldn’t help all target sets. Only those that have a lot of repeated names over the duration of the session, obviously

In practice, my data tends to have a lot of those, because all of my targets are ultimately assets under the same organization/entity. So I see a lot of cases like:

  1. “Parked” names / defensive registrations. Like “thecompanyname.com”, “mycompanyname.com”, “thecompanynameisbad.com”,“companyname2024.com”, 
 which all may redirect to “www.companyname.com”. In some cases there are literally hundreds of names that redirect to one site. And many such sites.
  2. Sites with redirect-based SSO. Like “app.companyname.com”, “app2.companyname.com”, “app3.companyname.com”, 
 all redirecting to sso.companyname.com

These “feel” very common for my target list but I have to look at the data to gauge the magnitude, maybe by adding simple logging of all DNS requests and doing a uniq -c to see how many repeated. I could also do a wire capture, I guess

Caching globally would also have sharply diminishing returns for those whose target sets don’t consist primarily of targets with the same organizational “owner”, though. Which may be most users

Caching globally might also have potential locking issues impacting performance, if the requests were performed by each sender, but there must be some way to avoid a lock for every new lookup, or reduce the contention. It wouldn’t be an issue if the DNS was a dedicated thread, though

Just thinking out loud here, I haven’t really given any of this proper thought. And it’s early, I probably misunderstood a few of your thoughts- hopefully not all of them 😊

Very happy to have you thinking about this, even though our use patterns may differ quite a bit. I’ve contributed a lot of bug fixes and minor features, but my golang experience and knowledge is sorely lacking. I have to defer to others on any non-trivial implementations, or anything too deep into the core, which is frustrating. I’m used to being more “hands-on” helpful

Anyway, thanks for the discussion, let me know if there’s anything I can do to help in the way of profiling or more primitive data collection

Seanstoppable commented 2 months ago

So, I was doing unrelated work with DNS, and I noticed that the TTLs for a lot of sites are really low Low enough that even if you are initially resolving something and putting it into a cache that it will potentially expire, and you'll generate a cache miss and wait for the resolver to re-resolve. In that case I think there are really only two solutions: 1) Actually construct some sort of cache in zgrab itself 2) Run your own resolver that allows overriding the ttl to something more palatable for the type of scan you are using. This is probably saner than trying to cache everything in zgrab, and easier to fine tune.

mzpqnxow commented 1 month ago

So, I was doing unrelated work with DNS, and I noticed that the TTLs for a lot of sites are really low

Low enough that even if you are initially resolving something and putting it into a cache that it will potentially expire, and you'll generate a cache miss and wait for the resolver to re-resolve.

In that case I think there are really only two solutions:

1) Actually construct some sort of cache in zgrab itself

2) Run your own resolver that allows overriding the ttl to something more palatable for the type of scan you are using. This is probably saner than trying to cache everything in zgrab, and easier to fine tune.

Might it make sense to "lock" the DNS response after it is received? Meaning, cache the answer indefinitely, until the process ends - ignoring the TTL, and never repeating a DNS lookup?

Are there any/many use-cases where the user wants to go to multiple addresses for a given name during one session of zgrab2? I mean the example case where "www.test.com" is queried once, the response has a TTL of 360, and 10 minutes later a request to "home.test.com", which redirects to www.test.com. Do most users really want or expect the second request to (potentially) go to a different address?

If I'm thinking about this correctly, this case would only ever happen for one of two reasons:

  1. The TTL expired AND A record was changed by zone owner between two requests to the same DNS name (302 from a different site, or maybe a singe target being present in one target list but with different ports)
  2. The TTL expired AND the A record is round-robin, not sticky to client IP

The former may be more of an edge-case most of the time, comparatively to the latter, even with your observation regarding aggressive TTLs; the latter is probably more common and, while honoring the new address is "correct", it can result in inconsistent results, specifically if LB farms are in a non-consistent configuration. An interesting thing to identify, for sure, but there are better (deterministic, comprehensive) ways to identify those, using methods that don't involve lucky timing and low TTLs

My use-case is pretty narrow- a predetermined list of targets specified by a name and IP tuple; the process running for 6-8 hours at most; encountering lots of 301/302.

Really simple caching of the first response forever (until the process terminates) would work just fine for me

... not everyone, maybe. And I expect there to be a few reasons that this is a terrible idea for others, but I'm not too familiar with how others use the tool

With that selfish statement, what do you think about a flag like --sticky-dns (or whatever) that prevents the IP of an FQDN from changing during a session, and the necessary changes to the code to make this effectively prevent duplicate DNS lookups?

A name would be looked up only once during the lifetime of a process. What you propose (resolver with TTL override) would provide this capability as well, but the simpler behavior could be implemented with leas effort. I'm not saying this more limited change would solve the issue, but it could be the first incremental change be a step towards the comprehensive solution you had in mind, unless you think it's too hacky

Because the simpler/limited case I'm talking about is really just adding a hash table and instrumenting the points where DNS lookups might occur (to either plug in or store the answer) it might even be something I could contribute with limited time and sub-par go-lang knowledge/experience

The only potential issue I see with the implementation is locking the hash table access. I'm not sure that any of the threads currently share writeable objects, but I do recall that golang maps need explicit locks when writing from one threat and reading concurrently from other threads. It's trivial to add a defer to do the lock, but I wonder how that will impact performance at scale

Any thoughts on why this is/isn't worth pursuing, or anything I may have gotten wrong are very welcome if anyone has time to review this

mzpqnxow commented 1 month ago

Maybe running an out-of-process local caching resolver (a la dnsmasq) is the better answer, it might not help as fast as in-process, but it would be simpler and less invasive for sure

mzpqnxow commented 1 month ago

Shoutout to #104 which I just realized discussed this (though it was a very shallow discussion)

😀