Spider fails when invoked via the API

GoogleCodeExporter commented 9 years ago

Run the BodgeIt store.
Access http://localhost:8080/bodgeit/ while proxying via ZAP
Start the Spider via the API, e.g. using
http://zap/JSON/spider/action/scan/?zapapiformat=JSON&url=http%3A%2F%2Flocalhost%3A8080%2Fbodgeit%2F
API returns: {"Result":"OK"}

But nothing is spidered :(
From the logs:

226807 [ZAP-ProxyThread] DEBUG org.zaproxy.zap.extension.api.API  - handleApiRequest http://zap/JSON/spider/action/scan/?zapapiformat=JSON&url=http%3A%2F%2Flocalhost%3A8080%2Fbodgeit%2F
226807 [ZAP-ProxyThread] DEBUG org.zaproxy.zap.extension.spider.SpiderAPI  - Request for handleApiAction: scan (params: {"zapapiformat":"JSON","url":"http://localhost:8080/bodgeit/"})
226807 [ZAP-ProxyThread] DEBUG org.zaproxy.zap.extension.spider.SpiderAPI  - API Spider scanning url: http://localhost:8080/bodgeit/
226808 [ZAP-ProxyThread] DEBUG org.parosproxy.paros.model.SiteMap  - findChild Sites / http://localhost:8080
226808 [ZAP-ProxyThread] DEBUG org.parosproxy.paros.model.SiteMap  - findChild http://localhost:8080 / bodgeit
226808 [ZAP-ProxyThread] DEBUG org.zaproxy.zap.extension.spider.SpiderThread  - Initializing spider thread for site: API
226808 [ZAP-ProxyThread] DEBUG org.zaproxy.zap.extension.api.API  - handleApiRequest returning: {"Result":"OK"}
226808 [Thread-2178] INFO org.zaproxy.zap.extension.spider.SpiderThread  - Starting spidering scan on API at Tue Mar 26 18:15:41 GMT 2013
226808 [Thread-2178] INFO org.zaproxy.zap.spider.Spider  - Spider initializing...
226808 [Thread-2178] DEBUG org.zaproxy.zap.spider.Spider  - New Exclude list: null
226808 [Thread-2178] INFO org.zaproxy.zap.spider.Spider  - Starting spider...
226808 [Thread-2178] INFO org.zaproxy.zap.spider.Spider  - Adding seed for spider: http://localhost:8080/bodgeit
226808 [Thread-2178] DEBUG org.zaproxy.zap.spider.SpiderTask  - New task submitted for uri: http://localhost:8080/bodgeit
226809 [pool-3-thread-1] DEBUG org.zaproxy.zap.spider.SpiderTask  - Spider Task Started. Processing uri at depth 0 using already constructed message:  http://localhost:8080/bodgeit
226810 [pool-3-thread-1] DEBUG org.parosproxy.paros.network.HttpSender  - sendAndReceive GET http://localhost:8080/bodgeit start
226810 [pool-3-thread-1] DEBUG org.zaproxy.zap.extension.httpsessions.HttpSessionsSite  - Matching session for request message (for site localhost:8080): null
226810 [pool-3-thread-1] DEBUG org.zaproxy.zap.extension.httpsessions.HttpSessionsSite  - Session of request message is the same as the active session, so no request changes needed.
226811 [pool-3-thread-1] DEBUG org.parosproxy.paros.network.HttpSender  - sendAndReceive GET http://localhost:8080/bodgeit took 1
226812 [pool-3-thread-1] DEBUG org.parosproxy.paros.model.SiteMap  - addPath http://localhost:8080/bodgeit
226812 [pool-3-thread-1] DEBUG org.parosproxy.paros.model.SiteMap  - findAndAddChild Sites / http://localhost:8080
226812 [pool-3-thread-1] DEBUG org.parosproxy.paros.model.SiteMap  - findChild Sites / http://localhost:8080
226812 [pool-3-thread-1] DEBUG org.parosproxy.paros.model.SiteMap  - findAndAddLeaf http://localhost:8080 / bodgeit
226812 [pool-3-thread-1] DEBUG org.parosproxy.paros.model.SiteMap  - findChild http://localhost:8080 / GET:bodgeit
226812 [pool-3-thread-1] DEBUG org.zaproxy.zap.spider.filters.ParseFilter  - Resource is not text: http://localhost:8080/bodgeit
226812 [pool-3-thread-1] DEBUG org.zaproxy.zap.spider.SpiderTask  - Resource fetched, but will not be parsed due to a ParseFilter rule: http://localhost:8080/bodgeit
226812 [pool-3-thread-1] INFO org.zaproxy.zap.spider.Spider  - Spidering process is complete. Shutting down...
226812 [pool-3-thread-1] DEBUG org.zaproxy.zap.spider.SpiderTask  - Spider Task finished.
226813 [Thread-2179] INFO org.zaproxy.zap.extension.spider.SpiderThread  - Spider scanning complete: true

Original issue reported on code.google.com by psii...@gmail.com on 26 Mar 2013 at 6:20
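
For anyone reproducing the report from a script, the request URL can be built rather than hand-encoded. This is only a minimal sketch: the base address and the `zapapiformat` and `url` parameters are copied from the report above, while the helper name `build_spider_scan_url` is made up for illustration. The `http://zap/` host only resolves when the client is proxying through ZAP.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the report; reachable only through the ZAP proxy.
ZAP_SPIDER_SCAN = "http://zap/JSON/spider/action/scan/"


def build_spider_scan_url(target: str) -> str:
    """Return the spider 'scan' API URL for a target (hypothetical helper)."""
    # urlencode percent-encodes ':' and '/' in the target, as in the report.
    query = urlencode({"zapapiformat": "JSON", "url": target})
    return f"{ZAP_SPIDER_SCAN}?{query}"


if __name__ == "__main__":
    print(build_spider_scan_url("http://localhost:8080/bodgeit/"))
```

Note that the encoded target keeps its trailing slash (`%2F` at the end), so the slash is present in what the API receives.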

GoogleCodeExporter commented 9 years ago

I've gone back to the weekly release from February 4th and still see this 
issue. Not sure if there's something wrong with my implementation or if this 
API has been broken for a while...

Original comment by dave.h...@gmail.com on 26 Mar 2013 at 7:45

GoogleCodeExporter commented 9 years ago

Does this happen only with the BodgeIt store? I tested it with a couple of 
other websites and it seems to be working.

Judging from the logs, the response to the request for 
http://localhost:8080/bodgeit doesn't seem to be text.

Original comment by cosminst...@gmail.com on 26 Mar 2013 at 10:52
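
A quick way to check a larger zap.log for the same symptom is to pull out every URL the parse filter rejected. The sketch below assumes only the message text seen in the log above ("Resource is not text: <url>"); the function name `urls_skipped_as_not_text` is illustrative and not part of ZAP.

```python
import re

# Matches the ParseFilter debug message shown in the log excerpt.
NOT_TEXT_PATTERN = re.compile(r"Resource is not text: (\S+)")


def urls_skipped_as_not_text(log_text: str) -> list:
    """Return the URLs reported as non-text, in order of appearance."""
    return NOT_TEXT_PATTERN.findall(log_text)


if __name__ == "__main__":
    sample = (
        "226812 [pool-3-thread-1] DEBUG "
        "org.zaproxy.zap.spider.filters.ParseFilter  - \n"
        "Resource is not text: http://localhost:8080/bodgeit\n"
    )
    print(urls_skipped_as_not_text(sample))
```

If the seed URL itself shows up in this list, the spider had nothing to parse and would stop immediately, which matches the behaviour in the report.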

GoogleCodeExporter commented 9 years ago

It does happen on other sites - Dave found the problem when trying to spider a 
Mozilla site.

Does the spider follow 302 redirects?
I ask because the response to GET http://localhost:8080/bodgeit is:
HTTP/1.1 302 Found
Server: Apache-Coyote/1.1
Location: http://localhost:8080/bodgeit/
Date: Wed, 27 Mar 2013 08:54:01 GMT
Content-length: 0

While the one to GET http://localhost:8080/bodgeit/ (note the trailing slash) is
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=F4A23C270579454C60EE56AF323CE69A; Path=/bodgeit/; HttpOnly
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 3171
Date: Wed, 27 Mar 2013 08:52:34 GMT
etc...

And although I specify http://localhost:8080/bodgeit/ in the API call, it looks 
like the spider is trying to access http://localhost:8080/bodgeit (without the 
trailing slash).

Original comment by psii...@gmail.com on 27 Mar 2013 at 8:56
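
To illustrate why the two responses above lead to different outcomes, here is a rough sketch of the decision a crawler might make per response. It is not ZAP's actual code: the function `next_step` and its return values are invented for this example, and the "text" check is a simplification based only on the Content-Type header.

```python
from urllib.parse import urljoin


def next_step(request_url: str, status: int, headers: dict) -> tuple:
    """Decide what to do with a fetched response (hypothetical helper).

    Returns ("follow", url) for a redirect with a Location header,
    ("parse", url) for a text response, and ("skip", url) otherwise.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    if 300 <= status < 400 and "location" in lowered:
        # Location may be relative; resolve it against the request URL.
        return ("follow", urljoin(request_url, lowered["location"]))

    if lowered.get("content-type", "").lower().startswith("text/"):
        return ("parse", request_url)

    return ("skip", request_url)


if __name__ == "__main__":
    # The 302 response quoted above: no Content-Type, empty body.
    print(next_step(
        "http://localhost:8080/bodgeit",
        302,
        {"Location": "http://localhost:8080/bodgeit/", "Content-length": "0"},
    ))
    # The 200 response quoted above.
    print(next_step(
        "http://localhost:8080/bodgeit/",
        200,
        {"Content-Type": "text/html;charset=ISO-8859-1"},
    ))
```

A crawler that lacks the redirect branch would treat the empty 302 response as a non-text resource and stop, which is consistent with the "Resource is not text" line in the log.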

GoogleCodeExporter commented 9 years ago

I've attached a zap.log file of a job that was using the API to try to spider 
the Mozilla Marketplace dev site. It never returns from the API call... In 
other instances (such as Bodge It) it returns quickly but doesn't appear to 
have spidered at all. The two issues may be unrelated.

Original comment by dave.h...@gmail.com on 27 Mar 2013 at 11:58

Attachments:

zap.log

GoogleCodeExporter commented 9 years ago

The active scan is also failing for me using the API with very similar 
behaviour. Could this be related or would you prefer me to open a new issue?

Original comment by dave.h...@gmail.com on 28 Mar 2013 at 12:48

GoogleCodeExporter commented 9 years ago

It might be related, as the scanner will only scan the pages that have been 
accessed, either manually or with the spider. If your use case depends on the 
pages found by the spider (and the spider is not spidering), then it is most 
likely related to this issue. If it doesn't depend on the spider, it would be 
better to create a new issue.

I've raised two issues (Issue 582 and Issue 583) to fix the exceptions that 
were logged in the previous attached log file (comment #4) (those issues do not 
affect this issue, though).

Original comment by THC...@gmail.com on 30 Mar 2013 at 1:37

GoogleCodeExporter commented 9 years ago

I've just committed the changes for proper handling of HTTP redirection by the 
spider (r3020). Thanks for pointing this out, Dave.

Original comment by cosminst...@gmail.com on 7 Apr 2013 at 11:12

Changed state: Committed

GoogleCodeExporter commented 9 years ago

Original comment by psii...@gmail.com on 8 Apr 2013 at 8:07

Added labels: Build-20130408

GoogleCodeExporter commented 9 years ago

I can confirm that this works for me in the latest weekly release.

Original comment by dave.h...@gmail.com on 15 Apr 2013 at 9:10

GoogleCodeExporter commented 9 years ago

\o/

Original comment by psii...@gmail.com on 15 Apr 2013 at 9:11

GoogleCodeExporter commented 9 years ago

Fixed in 2.1.0

Original comment by psii...@gmail.com on 18 Apr 2013 at 9:49

Changed state: Fixed

Spider fails when invoked via the API #574