Open MaelC opened 11 years ago
'&start=' is hardcoded when performing a search, so all you have to do is modify this line: https://github.com/k3170makan/GooDork/blob/master/netlib.py#L58 and either remove '&start=' or append it only when 'start' > 1.
Awesome, thanks for your help. I'll update this function as soon as possible.
On Wed, Jan 16, 2013 at 2:09 PM, Kedar notifications@github.com wrote:
<Keith k3170makan http://about.me/k3170makan Makan/>
Sorry, I was pretty tired when I wrote this last night.
You can actually start at '0' and Google should use that as page one. So rather than removing '&start=', just start it at zero, i.e. change the value sent to start-1.
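For reference, here is a rough sketch of what that change could look like. I haven't matched it against the actual code at netlib.py#L58, so the function name `build_search_url` and its parameters are made up for illustration; it only assumes the tool keeps a 1-based `start` counter and that sending `start-1` is what we want.

```python
from urllib.parse import urlencode, quote

GOOGLE_SEARCH = "https://www.google.com/search"


def build_search_url(query, start=1, num=500):
    """Hypothetical helper, not GooDork's real function.

    `start` is the tool's 1-based counter; the URL carries start-1,
    so the first request is sent with start=0.
    """
    if start < 1:
        raise ValueError("start must be >= 1")
    params = {"num": num, "q": query, "start": start - 1}
    # quote_via=quote encodes spaces as %20, matching the URLs in this issue
    return GOOGLE_SEARCH + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_search_url('site:nasa.gov intitle:"NASA - Kennedy Space Center 2012"'))
```

With the default `start=1` this produces the same URL as in the report, except that it ends in `&start=0` instead of `&start=1`.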
I was trying out this tool and noticed some odd behavior with small result sets.
If you use Google and submit something that has a small number of results (less than 10) like this: site:nasa.gov intitle:"NASA - Kennedy Space Center 2012" https://www.google.com/search?n%20um=500&q=site%3Auta.edu%20intitle%3A%22Home%20-%20College%20%20of%20Business%22&start=1#hl=en&tbo=d&sclient=psy-ab&q=site:nasa.gov+intitle%3A%22NASA+-+Kennedy+Space+Center+2012%22&oq=site:nasa.gov+intitle%3A%22NASA+-+Kennedy+Space+Center+2012%22&gs_l=serp.3...4920.6151.7.6351.5.5.0.0.0.3.210.767.0j2j2.4.0.les%3B..0.0...1c.1.45C4MqZAjKY&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.41018144,d.b2I&fp=dfe86ec64229ab6&biw=944&bih=951
and compare it to the URL GooDork builds when you submit the same query URL-encoded:
GooDork.py site%3Anasa.gov%20intitle%3A%22NASA%20-%20Kennedy%20Space%20Center%202012%22
Gives you this URL:
https://www.google.com/search?num=500&q=site%3Anasa.gov%20intitle%3A%22NASA%20-%20Kennedy%20Space%20Center%202012%22&start=1
The results from that URL are missing the first result. I'm pretty sure that's because &start=1 goes to the second page of a Google result, thus dropping results. I'm really curious whether that means the first page of results is consistently being dropped (I guess the test case is to run a search that returns between 11 and 20 results?). I'm still mucking around in your code, so I figured it would be best to put this here.
-Mael