thibauts / node-google-search-scraper

Google search scraper with captcha solving support
MIT License



This module extracts Google search results in a simple yet flexible way and handles captcha solving transparently, either through an external service or through your own hand-made solver.

Out of the box, you can target a specific Google search host, specify a language, and limit the number of results returned. You can extend these defaults by passing custom URL params through options.

A word of warning: this code is intended for educational and research use only. Use it responsibly.


$ npm install google-search-scraper


Grab the first 10 results for 'nodejs'

var scraper = require('google-search-scraper');

var options = {
  query: 'nodejs',
  limit: 10
};

scraper.search(options, function(err, url, meta) {
  // This is called for each result
  if(err) throw err;
  console.log(url);
});

Various options combined

var scraper = require('google-search-scraper');

var options = {
  query: 'grenouille',
  host: '',
  lang: 'fr',
  age: 'd1', // last 24 hours ([hdwmy]\d? as in google URL)
  limit: 10,
  params: {} // params will be copied as-is in the search URL query string
};

scraper.search(options, function(err, url) {
  // This is called for each result
  if(err) throw err;
  console.log(url);
});
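
To make the role of these options more concrete, here is a minimal sketch of how they could map onto a search URL. It is not the module's actual implementation: `buildSearchUrl` is a hypothetical helper, the `hl` and `tbs=qdr:` parameter names are assumptions based on the public Google URL format, and `www.google.fr` and `safe` are illustrative values only.

```javascript
// Hypothetical helper, not part of google-search-scraper.
// Assumption: lang maps to "hl" and age maps to "tbs=qdr:<age>".
function buildSearchUrl(options) {
  var query = new URLSearchParams();
  query.set('q', options.query);
  if (options.lang) query.set('hl', options.lang);
  if (options.age) query.set('tbs', 'qdr:' + options.age);

  // Custom params are copied as-is and override the defaults above
  Object.keys(options.params || {}).forEach(function(key) {
    query.set(key, options.params[key]);
  });

  // Fall back to a default host when none is given (assumed default)
  var host = options.host || 'www.google.com';
  return 'https://' + host + '/search?' + query.toString();
}

console.log(buildSearchUrl({
  query: 'grenouille',
  host: 'www.google.fr',
  lang: 'fr',
  age: 'd1',
  params: { safe: 'active' }
}));
// prints https://www.google.fr/search?q=grenouille&hl=fr&tbs=qdr%3Ad1&safe=active
```

Letting `params` override the computed values is a design choice in this sketch; it keeps the escape hatch predictable when a custom parameter collides with a built-in one.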

Extract all results on edu sites for "information theory" and solve captchas along the way

var scraper = require('google-search-scraper');
var DeathByCaptcha = require('deathbycaptcha');

var dbc = new DeathByCaptcha('username', 'password');

var options = {
  query: 'site:edu "information theory"',
  age: 'y', // less than a year
  solver: dbc
};

scraper.search(options, function(err, url) {
  // This is called for each result
  if(err) throw err;
  console.log(url);
});

You can easily plug in your own solver by implementing a solve method with the following signature:

var customSolver = {
  solve: function(imageData, callback) {
    // Do something with image data, like displaying it to the user
    var err = null; // set to an Error if solving failed
    var solutionText = ''; // the captcha text you obtained
    // id is used by DBC to allow reporting solving errors and can be safely ignored here
    var id = null;
    callback(err, id, solutionText);
  }
};
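
As a rough illustration of the solve signature, the sketch below saves the captcha image to a temporary file and asks a human to type the answer. `createManualSolver` and `askOnTerminal` are hypothetical helpers, not part of this module. The sketch assumes that `imageData` is a Buffer (or something `fs.writeFileSync` accepts) and that a `.jpg` extension suits the image format; neither is confirmed by the text above.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');
var readline = require('readline');

// Hypothetical factory: returns an object with the solve(imageData, callback)
// method described above. `ask` is any function (question, cb) that
// eventually calls cb with the text typed by the user.
function createManualSolver(ask) {
  return {
    solve: function(imageData, callback) {
      // Assumption: imageData is raw image bytes; ".jpg" is a guess
      var file = path.join(os.tmpdir(), 'captcha-' + Date.now() + '.jpg');
      try {
        fs.writeFileSync(file, imageData);
      } catch (e) {
        return callback(e);
      }
      ask('Captcha saved to ' + file + ', type its text: ', function(answer) {
        // No external service is involved, so there is no id to report
        callback(null, null, String(answer).trim());
      });
    }
  };
}

// One possible `ask` implementation that prompts on the terminal
function askOnTerminal(question, cb) {
  var rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });
  rl.question(question, function(answer) {
    rl.close();
    cb(answer);
  });
}
```

A solver built with `createManualSolver(askOnTerminal)` could then be passed as the `solver` option, in the same place as `dbc` in the example above. Injecting `ask` keeps the solver usable without a terminal, for instance in tests.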