webkeonsanjeev / skipfish

Automatically exported from code.google.com/p/skipfish
Apache License 2.0

crawler of skipfish missing lot of links #157

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Hi, I'm facing a problem with the crawler: it is missing a lot of links.

Example:
In a single-page application, consider a situation in which different pages get 
included depending on the value of the same id parameter:

index.php?id=1 leads to the one.php page // which has some new links
index.php?id=2 leads to the two.php page // which has different links

In this situation index.php?id=2 gets completely ignored, so we lose all the 
links on page two.php.

Original issue reported on code.google.com by anuragno...@gmail.com on 19 Jul 2012 at 11:23

GoogleCodeExporter commented 8 years ago
That's odd, thanks for reporting!  

One weakness of the crawler is JavaScript support. If these links are 
dynamically added to the page, then that might be the cause. Based on your 
description, I think this is not the case. 

Can you perhaps send me a debug log? (make debug) Or could you give some 
larger page snippets?

Original comment by niels.he...@gmail.com on 19 Jul 2012 at 12:18

GoogleCodeExporter commented 8 years ago
Kindly find the file attached.
index.html has three links: link1.php?id=1, link1.php?id=2, link1.php?id=3.

link1.php includes one of three different files depending on the parameter id: 
file1.php, file2.php and file3.php respectively.

In this scenario the contents of file2.php and file3.php will not be crawled, 
as the skipfish crawler will ignore link1.php?id=2 and link1.php?id=3.

Original comment by anuragno...@gmail.com on 20 Jul 2012 at 6:04

GoogleCodeExporter commented 8 years ago
Thanks for the files! Can you perhaps give me the debug output as well ? 

It might be the case that the parameter is marked as bogus. This can 
incorrectly happen when the different parameter values result in a "hardly 
noticeable" difference between responses. By design we allow small differences 
to occur while still marking a page as similar / same. 
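
To make the "hardly noticeable difference" point concrete, here is a minimal 
sketch of a tolerance-based page comparison. It is not skipfish's actual code: 
the bucket count, both tolerances and the function names are assumptions made 
up for illustration. It only shows how two responses that differ in a single 
link can end up being judged the same page.

```python
from collections import Counter

# All three constants are hypothetical values chosen for this illustration.
BUCKETS = 8            # number of word-length buckets in the fingerprint
REL_TOLERANCE = 0.05   # allow up to 5% relative difference per bucket
ABS_TOLERANCE = 2      # or up to 2 words of absolute difference per bucket


def signature(body: str) -> list[int]:
    """Count whitespace-separated words per length bucket (a crude fingerprint)."""
    counts = Counter(min(len(word), BUCKETS) - 1 for word in body.split())
    return [counts.get(i, 0) for i in range(BUCKETS)]


def looks_same(a: str, b: str) -> bool:
    """Judge two pages the same if every bucket differs only slightly."""
    for x, y in zip(signature(a), signature(b)):
        diff = abs(x - y)
        if diff > ABS_TOLERANCE and diff > REL_TOLERANCE * max(x, y):
            return False
    return True


# Two responses sharing a layout but linking to different files,
# plus one response with genuinely different content.
template = "<html><body>" + "lorem ipsum dolor sit amet " * 40
page1 = template + '<a href="asdf.php">next</a></body></html>'
page2 = template + '<a href="abc.php">next</a></body></html>'
page3 = "<html><body>" + "completely different content here " * 80 + "</body></html>"

same_12 = looks_same(page1, page2)  # differ only in the link target
same_13 = looks_same(page1, page3)  # differ in most of the body
print(same_12, same_13)
```

With a comparison like this, the id parameter would look as if it had no effect 
on the response, which is the kind of situation described above.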

(Not saying this can't be fixed, so debug output would help.)

Original comment by niels.he...@gmail.com on 21 Jul 2012 at 4:11

GoogleCodeExporter commented 8 years ago
kindly find the debug output:

== PIVOT DEBUG ==

== Pivot [root] [0] ==
Type     : PIVOT_ROOT
State    : PSTATE_DONE
Flags    : linked 2, case 0/0, fuzz_par -1, ips 0, sigs 0, reqs 0, desc 1/4

 == Pivot http://skipfish.test.web/ [0] ==
 Type     : PIVOT_SERV
 State    : PSTATE_CHILD_INJECT
 Flags    : linked 2, case 0/1, fuzz_par -1, ips 0, sigs 0, reqs 0, desc 2/3
 Target   : http://skipfish.test.web/ (200)
 MIME     : text/html -> - [UTF-8:-]
 -> Issue : type 20201, extra 'no distinctive 404 behavior detected', URL: / (200)

  == Pivot link1.php [0] ==
  Type     : PIVOT_FILE
  State    : PSTATE_CHILD_INJECT
  Flags    : linked 2, case 0/0, fuzz_par -1, ips 0, sigs 0, reqs 0, desc 1/1
  Target   : http://skipfish.test.web/link1.php (200)
  MIME     : text/html -> - [UTF-8:-]
  -> Issue : type 10901, extra '(null)', URL: /link1.php (200)

   == Pivot id [0] ==
   Type     : PIVOT_PARAM
   State    : PSTATE_PAR_INJECT
   Flags    : linked 2, case 0/0, fuzz_par 1, ips 0, sigs 1, reqs 0, desc 0/0
   Target   : http://skipfish.test.web/link1.php?id=1 (200)
   MIME     : text/html -> - [UTF-8:-]
   Try      : 1, 2, 3

  == Pivot abc.php [0] ==
  Type     : PIVOT_PATHINFO
  State    : PSTATE_CHILD_INJECT
  Flags    : linked 2, case 0/0, fuzz_par -1, ips 0, sigs 0, reqs 0, desc 0/0
  Target   : http://skipfish.test.web/abc.php/ (200)
  MIME     : text/html -> - [UTF-8:-]

== END OF DUMP ==

Original comment by anuragno...@gmail.com on 23 Jul 2012 at 6:06

GoogleCodeExporter commented 8 years ago
Could you please guide me on how I can fix this?

Original comment by anuragno...@gmail.com on 24 Jul 2012 at 5:18

GoogleCodeExporter commented 8 years ago
Is there any other information required from my side?

Original comment by anuragno...@gmail.com on 27 Jul 2012 at 4:42

GoogleCodeExporter commented 8 years ago
Ah, sorry for the delay. I'm currently traveling but will check back next week. 
If you by any chance could send me the full debug.log file (feel free to just 
email it), then that would be great!

Original comment by niels.he...@gmail.com on 28 Jul 2012 at 1:14

GoogleCodeExporter commented 8 years ago
Without the full debug log I'm unable to look into this. I'm aiming to release 
the next version next week and it has crawler changes which also affect the 
"duplicate" marking of parameters.   

If you run:

$ grep "link" outputdir/pivots.txt 

can you give me the output? Can you also just mail me the "debug.log" file? 
(e.g. "make debug; ./skipfish [...options..] 2> debug.log")

Original comment by niels.he...@gmail.com on 7 Aug 2012 at 11:46

GoogleCodeExporter commented 8 years ago

With 2.08b out, I'm assuming this is fixed.  Can you please verify ? 

Cheers,
Niels

Original comment by niels.he...@gmail.com on 1 Sep 2012 at 6:56

GoogleCodeExporter commented 8 years ago
Hi,
Sorry for the delayed response.
It is not fixed. I have attached debug.txt and pivot.txt, please look into it.
Thanks

Original comment by anuragno...@gmail.com on 10 Sep 2012 at 7:08

GoogleCodeExporter commented 8 years ago
Fixed -> New. I'll review the logs, thanks for attaching them!

Original comment by niels.he...@gmail.com on 12 Sep 2012 at 3:21

GoogleCodeExporter commented 8 years ago
Problem reproduced: we were not scraping the response for links because it was 
judged too similar to the responses of previous requests. 
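
A rough sketch of that failure mode follows. It is a toy model and not the 
actual patch: the in-memory SITE, the link target xyz.php and the crude 
"same path, same non-link text" similarity check are all assumptions for 
illustration. With scrape_similar=False the crawler drops similar responses 
before extracting links; with scrape_similar=True it still extracts them.

```python
import re

# Hypothetical stand-in for the test site; "/xyz.php" is an invented name.
SITE = {
    "/index.html": '<a href="/link1.php?id=1">1</a> '
                   '<a href="/link1.php?id=2">2</a> '
                   '<a href="/link1.php?id=3">3</a>',
    "/link1.php?id=1": 'layout <a href="/asdf.php">more</a>',
    "/link1.php?id=2": 'layout <a href="/abc.php">more</a>',
    "/link1.php?id=3": 'layout <a href="/xyz.php">more</a>',
    "/asdf.php": "leaf a",
    "/abc.php": "leaf b",
    "/xyz.php": "leaf c",
}


def extract_links(body: str) -> list[str]:
    """Return every href target found in the body."""
    return re.findall(r'href="([^"]+)"', body)


def strip_links(body: str) -> str:
    """Remove anchor elements so only the surrounding text is compared."""
    return re.sub(r"<a [^>]*>.*?</a>", "", body)


def crawl(start: str, scrape_similar: bool) -> list[str]:
    """Breadth-first crawl; optionally skip link extraction on similar pages."""
    seen_shapes = set()          # (path, text without links) already analysed
    visited, queue = [], [start]
    while queue:
        url = queue.pop(0)
        if url in visited or url not in SITE:
            continue
        visited.append(url)
        body = SITE[url]
        shape = (url.split("?", 1)[0], strip_links(body))
        is_similar = shape in seen_shapes
        seen_shapes.add(shape)
        if is_similar and not scrape_similar:
            continue             # the bug: links of "duplicate" pages are lost
        queue.extend(extract_links(body))
    return visited


buggy = crawl("/index.html", scrape_similar=False)
fixed = crawl("/index.html", scrape_similar=True)
missed = sorted(set(fixed) - set(buggy))
print(missed)
```

The point of the sketch is that duplicate detection and link extraction are 
separate decisions: a response can be skipped for further testing and still be 
scraped for new URLs.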

Can you try the attached patch ? It should fix the issue.

Thanks!
Niels

Original comment by niels.he...@gmail.com on 13 Sep 2012 at 9:32

GoogleCodeExporter commented 8 years ago
It is now fixed, thanks a lot for your support.

One more question: is it possible to only crawl using skipfish (without doing 
any vulnerability tests)? My requirement is to know the total set of URLs 
in a website, including all the external links.

Original comment by anuragno...@gmail.com on 13 Sep 2012 at 11:49

GoogleCodeExporter commented 8 years ago
Using --no-checks , all injection tests should be disabled. Passive tests are 
still performed on the responses so you'll still get some issues reported 
(mostly low risk, low hanging fruit).

The resulting report directory will contain a pivots.txt file and you can start 
future scans with:

./skipfish [..flag..] @path/to/pivots.txt

Unfortunately, I actually noticed today that there is a small bug in this 
functionality. In the test case you created for this bug description, the 
final pivots.txt file will have link1.php?id=1 and not link1.php?id=2 (because 
in memory, this is one "pivot" with several values for the parameter id). 
However, it will contain asdf.php and abc.php (or whatever they were called ;p). 
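
As a sketch of why only one URL ends up in the file, assume a parameter pivot 
that stores a single base URL plus a list of values (like the "Try : 1, 2, 3" 
line in the debug dump earlier in this thread). The class and method names 
below are hypothetical and do not mirror skipfish's internal structures.

```python
from dataclasses import dataclass, field


@dataclass
class ParamPivot:
    """Hypothetical model: one pivot per parameter, remembering every value seen."""
    base_url: str
    name: str
    values: list[str] = field(default_factory=list)

    def add_value(self, value: str) -> None:
        """Record a value for this parameter, ignoring repeats."""
        if value not in self.values:
            self.values.append(value)

    def first_url(self) -> str:
        """What a one-line-per-pivot export would write."""
        return f"{self.base_url}?{self.name}={self.values[0]}"

    def all_urls(self) -> list[str]:
        """What an export that expands every stored value would write."""
        return [f"{self.base_url}?{self.name}={v}" for v in self.values]


pivot = ParamPivot("http://skipfish.test.web/link1.php", "id")
for seen in ["1", "2", "3", "2"]:
    pivot.add_value(seen)

print(pivot.first_url())
print(pivot.all_urls())
```

Writing one line per pivot loses the other values, while expanding each stored 
value would list every URL variant that was actually crawled.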

I'll look into the above mentioned bug.

Hope this helps and again thanks for reporting!! (and your patience)
Niels

Original comment by niels.he...@gmail.com on 13 Sep 2012 at 8:01

GoogleCodeExporter commented 8 years ago
I think it is perfect: link1.php?id=2 is not a new link, hence not a new 
pivot. It is now able to find the new links reached through different parameter 
values, so it is working perfectly. As far as I know, it is not a bug.

thanks.

Original comment by anuragno...@gmail.com on 14 Sep 2012 at 5:26

GoogleCodeExporter commented 8 years ago
This is also fixed in 2.10b which is now in SVN. If you have time, could you 
please test this ?

1) svn checkout http://skipfish.googlecode.com/svn/trunk/ skipfish-read-only
2) cd skipfish-read-only ; make; ./skipfish [....]

Cheers!
Niels

Original comment by niels.he...@gmail.com on 23 Nov 2012 at 8:51