k3170makan / GooDork

Command line go0gle dorking tool
http://goo-dork.blogspot.com/

Losing results due to &start=1 #5

Open MaelC opened 11 years ago

MaelC commented 11 years ago

I was trying out this tool and noticed odd behavior with queries that return only a few results.

If you search Google for something that returns a small number of results (fewer than 10), like this:

site:nasa.gov intitle:"NASA - Kennedy Space Center 2012"

https://www.google.com/search?n%20um=500&q=site%3Auta.edu%20intitle%3A%22Home%20-%20College%20%20of%20Business%22&start=1#hl=en&tbo=d&sclient=psy-ab&q=site:nasa.gov+intitle%3A%22NASA+-+Kennedy+Space+Center+2012%22&oq=site:nasa.gov+intitle%3A%22NASA+-+Kennedy+Space+Center+2012%22&gs_l=serp.3...4920.6151.7.6351.5.5.0.0.0.3.210.767.0j2j2.4.0.les%3B..0.0...1c.1.45C4MqZAjKY&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.41018144,d.b2I&fp=dfe86ec64229ab6&biw=944&bih=951

and compare it to the URL GooDork requests when you submit the same query, URL-encoded:

GooDork.py site%3Anasa.gov%20intitle%3A%22NASA%20-%20Kennedy%20Space%20Center%202012%22

Gives you this URL:

https://www.google.com/search?num=500&q=site%3Anasa.gov%20intitle%3A%22NASA%20-%20Kennedy%20Space%20Center%202012%22&start=1

That response is missing the first result. I'm pretty sure that's because &start=1 goes to the second page of Google's results and thus drops results. I'm curious whether that means the first page of results is consistently being dropped (I guess the test case is to run a search that returns between 11 and 20 results?). I'm still mucking around in your code, so I figured it would be best to put this here.
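To see exactly which parameters that URL carries, it can be split with Python's standard library. This is only an inspection sketch: the URL is the one quoted above, and nothing here is taken from GooDork's own code.

```python
from urllib.parse import urlsplit, parse_qs

# The URL reported above as the one GooDork requests.
url = (
    "https://www.google.com/search?num=500"
    "&q=site%3Anasa.gov%20intitle%3A%22NASA%20-%20Kennedy%20Space%20Center%202012%22"
    "&start=1"
)

# parse_qs percent-decodes the values and returns a dict of lists.
params = parse_qs(urlsplit(url).query)

print(params["q"][0])      # the decoded search query
print(params["num"][0])    # requested number of results
print(params["start"][0])  # the hardcoded start value, '1'
```

The decoded query matches what was typed into Google, so the only difference between the two requests that matters here is the `start` value.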

-Mael

0xKD commented 11 years ago

'&start=' is hardcoded when performing a search, so all you have to do is modify this line: https://github.com/k3170makan/GooDork/blob/master/netlib.py#L58 and remove '&start=' or only append it if 'start' > 1
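A minimal sketch of that suggestion, assuming the URL is assembled by simple string concatenation. The function name `build_search_url` and its parameters are hypothetical and are not taken from netlib.py.

```python
from urllib.parse import quote

def build_search_url(query, num=500, start=0):
    """Hypothetical helper: build a search URL, adding &start= only when needed."""
    # safe="" percent-encodes ':' and '"' as well, matching the URL in this issue.
    url = "https://www.google.com/search?num={}&q={}".format(num, quote(query, safe=""))
    # Per the suggestion above: append the parameter only if start > 1.
    if start > 1:
        url += "&start={}".format(start)
    return url

query = 'site:nasa.gov intitle:"NASA - Kennedy Space Center 2012"'
print(build_search_url(query, start=1))   # no &start= parameter
print(build_search_url(query, start=11))  # ends with &start=11
```

With this guard, a default search sends no `start` parameter at all, so Google returns results from the beginning.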

k3170makan commented 11 years ago

Awesome, thanks for your help. I'll update this function as soon as possible.

On Wed, Jan 16, 2013 at 2:09 PM, Kedar notifications@github.com wrote:

'&start=' is hardcoded when performing a search, so all you have to do is modify this line: https://github.com/k3170makan/GooDork/blob/master/netlib.py#L58 and remove '&start=' or only append it if 'start' > 1

MaelC commented 11 years ago

Sorry, I was pretty tired when I wrote this last night.

You can actually start at '0', and Google should treat that as page one. So rather than removing the parameter, start it at zero and change the value that gets sent to start-1.
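A rough sketch of that start-1 adjustment, assuming the tool counts results from 1 while the `start` parameter counts from 0, as described above. The name `start_offset` is hypothetical and does not come from GooDork.

```python
def start_offset(first_result):
    """Hypothetical helper: map a 1-based result number to a zero-based start value."""
    if first_result < 1:
        raise ValueError("first_result must be 1 or greater")
    return first_result - 1

# The parameter stays in the URL, but its value is shifted down by one.
for first_result in (1, 11, 21):
    print("&start={}".format(start_offset(first_result)))
```

Keeping the parameter and shifting its value means the request has the same shape for every page, with no special case for the first one.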