zaproxy / zaproxy

The ZAP by Checkmarx Core project
https://www.zaproxy.org
Apache License 2.0

ZAP Spider Infinite Loop #3946

Open claudijd opened 7 years ago

claudijd commented 7 years ago

I was doing a mass scan of some of our web properties and ran into an infinite loop in the Spider. It would run and run, stuck at 99%, and eventually ran one of my scanners out of disk space.

I suspect there should be some sort of infinite-loop detection, or a maximum number of spider requests that can be set in the ZAP config, to prevent this from happening. I'm curious what that might be, and also wanted to share this bizarre case.

I, unfortunately, don't have the logs from this scan anymore, because my artifact buffer was overrun by subsequent jobs, but I can tell you the web property was https://website-archive.mozilla.org/.

If it would be helpful for me to re-run the scan to demonstrate the behaviour or capture some additional logging output, let me know and I can do that.

thc202 commented 7 years ago

Was that with default settings?

claudijd commented 7 years ago

It was with the default config, using a Docker zap-full-scan.py run. The invocation looked something like this:

    docker run -v "$(pwd)":/zap/wrk/:rw --dns 8.8.8.8 -u root -i owasp/zap2docker-weekly zap-full-scan.py -d -c ./zap/config.cfg -r report_example.com.html -t https://example.com/
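As a note for anyone hitting this today: one practical mitigation is to cap the spider so a looping site cannot run forever. The option names below are assumptions to verify against your ZAP version's `--help` output (they are not taken from this thread); the general shape would be:

```shell
# Hypothetical mitigation sketch -- verify option names for your ZAP version.
# -m caps the spider time in minutes; -z passes raw ZAP config overrides
# such as a maximum spider duration and crawl depth.
docker run -v "$(pwd)":/zap/wrk/:rw --dns 8.8.8.8 -u root \
  -i owasp/zap2docker-weekly zap-full-scan.py -d \
  -m 5 \
  -z "-config spider.maxDuration=5 -config spider.maxDepth=5" \
  -r report_example.com.html -t https://example.com/
```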
claudijd commented 7 years ago

And to be clear, when I say "mass scan" I mean a large scan, but using individual docker runs, like the syntax I describe above. Not the community mass scan strategy.

thc202 commented 7 years ago

Is the site static? Or can new "pages" be created while spidering?

claudijd commented 7 years ago

@thc202 the site acts as a web archive, but I cannot tell if it allows dynamic generation of content while crawling.

thc202 commented 7 years ago

Is it fine to run the spider against that site (to try to debug the issue)?

claudijd commented 7 years ago

I could run a scan for you from an authorized host, to avoid getting blocked, if that would help. Just give me the params you'd like and I can make it so.

claudijd commented 7 years ago

Example:

    2017-10-20 02:51:05,129 Target: https://website-archive.mozilla.org/
    2017-10-20 02:51:05,129 Using port: 35155
    2017-10-20 02:51:05,130 Starting ZAP
    Oct 20, 2017 2:51:11 AM java.util.prefs.FileSystemPreferences$1 run
    INFO: Created user preferences directory.
    2017-10-20 02:51:36,409 ZAP Version D-2017-10-16
    2017-10-20 02:51:36,409 Took 31 seconds
    2017-10-20 02:51:40,846 Spider https://website-archive.mozilla.org/
    2017-10-20 02:51:45,948 Spider progress %: 59
    2017-10-20 02:51:50,987 Spider progress %: 99
    2017-10-20 02:51:55,997 Spider progress %: 99
    2017-10-20 02:52:01,035 Spider progress %: 99
    2017-10-20 02:52:06,065 Spider progress %: 99
    2017-10-20 02:52:11,103 Spider progress %: 99
    2017-10-20 02:52:16,120 Spider progress %: 99
    2017-10-20 02:52:21,127 Spider progress %: 99
    2017-10-20 02:52:26,139 Spider progress %: 99
    2017-10-20 02:52:31,152 Spider progress %: 99
    2017-10-20 02:52:36,161 Spider progress %: 99
    2017-10-20 02:52:41,171 Spider progress %: 99
    2017-10-20 02:52:46,180 Spider progress %: 99
    2017-10-20 02:52:51,195 Spider progress %: 99
    2017-10-20 02:52:56,201 Spider progress %: 99
    2017-10-20 02:53:01,209 Spider progress %: 99
    2017-10-20 02:53:06,220 Spider progress %: 99
    2017-10-20 02:53:11,229 Spider progress %: 99
    2017-10-20 02:53:16,242 Spider progress %: 99
    2017-10-20 02:53:21,252 Spider progress %: 99
claudijd commented 7 years ago

@thc202 it happens in the first minute of the scan

avd1989 commented 1 year ago

Currently getting the same error. It keeps on adding the same URL to the URLs Found list. Any ideas?

kingthorin commented 1 year ago

What error?

avd1989 commented 1 year ago

The spider is looping at 99%, adding the same URL many times.

kingthorin commented 1 year ago

Okay so there’s no error just a problematic behaviour you can’t seem to sort out.

If you diff some of the requests/responses are they 100% the same?

disenchant commented 9 months ago

I have the same, or at least very similar, behaviour and took a closer look at what's happening (at least in my specific case):

The spider gets an infinite number of pages, all located under https://[domain]/createdby/[random string] (yes, I know it's a token and not just a random string, but for the case at hand it might as well be a random string 😉).


When looking at those pages and comparing them to each other, we see that they are about 99% the same, but each contains a link to a page with a new random string.


This behaviour of the application results in ZAP thinking, every time it loads such a page, that it has found yet another new link/page; it then crawls that one too, and so on until the end of time.
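The trap described above can be shown with a toy crawler. This is a minimal sketch, not ZAP's actual spider: the URLs and the `fetch` stand-in are hypothetical, and `fetch` simulates a site where every page links to a page with a fresh token, so the frontier never drains and only an explicit request cap stops the crawl.

```python
import uuid

def fetch(url):
    """Stand-in for an HTTP fetch: every page links to a page with a
    fresh token, mimicking the /createdby/[random string] behaviour."""
    return ["https://example.test/createdby/%s" % uuid.uuid4().hex]

def spider(seed, max_requests=50):
    """Toy breadth-less crawler: dedupes by exact URL, like a naive spider."""
    seen, frontier = set(), [seed]
    requests = 0
    while frontier and requests < max_requests:
        url = frontier.pop()
        requests += 1
        for link in fetch(url):
            if link not in seen:  # always true here: every token is new
                seen.add(link)
                frontier.append(link)
    return requests, len(seen)

requests, found = spider("https://example.test/createdby/start")
# Every request discovers exactly one "new" URL, so the crawl only
# stops because of the request cap, never because it ran out of links.
```

Exact-URL deduplication cannot help here, because the site manufactures a distinct URL on every load; that is why the progress indicator sits at 99% while the URL list keeps growing.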

My proposed solution to this kind of problem would be to add a similarity check on responses (e.g. implemented with fuzzy hashing), so ZAP could detect when two pages are nearly identical. Making the similarity level configurable would allow handling small differences like these, while also ensuring that when scanning, say, a web shop, not every single product is treated as a completely new page and passed to the active scanner: ZAP would see that those pages, even though they differ somewhat in content, are essentially the same and therefore not worth further analysis.
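A minimal sketch of that idea, with assumptions: the threshold value and page bodies are made up, and stdlib `difflib.SequenceMatcher` stands in for a real fuzzy hash (e.g. ssdeep or TLSH), which would scale far better than pairwise diffs across a whole crawl.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.98  # hypothetical value; would be user-configurable

def is_near_duplicate(body, seen_bodies, threshold=SIMILARITY_THRESHOLD):
    """Return True if `body` is almost identical to an already-crawled page.
    autojunk=False so long runs of repeated characters are still matched."""
    return any(
        SequenceMatcher(None, body, prev, autojunk=False).ratio() >= threshold
        for prev in seen_bodies
    )

# Two trap pages: identical except for the fresh token in the link.
page_a = "<html><a href='/createdby/aaa111'>next</a>" + "x" * 500 + "</html>"
page_b = "<html><a href='/createdby/bbb222'>next</a>" + "x" * 500 + "</html>"
# A genuinely different page that should still be crawled and scanned.
page_c = "<html>a completely different product page</html>"

seen = [page_a]
print(is_near_duplicate(page_b, seen))  # True  -> likely a trap, skip it
print(is_near_duplicate(page_c, seen))  # False -> new content, keep crawling
```

With a tunable threshold, the trap pages above would be collapsed into one logical page, while a web shop's product pages (which differ far more than a single token) would stay below the threshold and still be scanned individually.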