jarun / googler

:mag: Google from the terminal
GNU General Public License v3.0
6.09k stars 528 forks

I only get 100 results when using --count=1000 or more #290

Closed. nihelmasell closed this issue 5 years ago

nihelmasell commented 5 years ago

I am trying to scrape all PDFs from a website. I am using "googler --count=1000 --site=nameofsite.com --noprompt pdf". I first tried with 10000 and it didn't work. Google reports 20,000 results, but I only get 100. I am on a Mac and installed the application with Homebrew.

jarun commented 5 years ago

Sorry, automating bulk web scraping is not the intended use of googler. Apart from the fact that the command used is faulty, we can't help you with this, knowing what you are doing, without violating Google's ToS - https://support.google.com/webmasters/answer/66357?hl=en

dubiouscript commented 5 years ago

I wish YouTube got the same media attention that Napster, BitTorrent, etc. got in the past, because they are functionally the same (you can find music there; youtube-dl, anyone? =) /rant

Just FTR: https://en.wikipedia.org/wiki/Search_engine_scraping#Legal

The largest public known incident of a search engine being scraped happened in 2011 when Microsoft was caught scraping unknown keywords from Google for their own, rather new Bing service. ([16]) But even this incident did not result in a court case.

One possible reason might be that search engines like Google are getting almost all their data by scraping millions of public reachable websites, also without reading and accepting those terms. A legal case won by Google against Microsoft would possibly put their whole business at risk.

Google are getting almost all their data by scraping millions of public reachable websites, also without reading and accepting those terms.

@jarun While I appreciate your position, I just wanted to point out the absurdity of the corporate internet culture of our times :tophat:

So long, and thanks for all the scripts.

jarun commented 5 years ago

I don't have a legal team gunning for lawsuits. ;)