rrwen / search_google

A command line tool and module for Google web and image search
MIT License


| Richard Wen
| rrwen.dev@gmail.com

A command line tool and module for Google API web and image search.

.. image:: https://badge.fury.io/py/search-google.svg
  :target: https://badge.fury.io/py/search-google
.. image:: https://travis-ci.org/rrwen/search_google.svg?branch=master
  :target: https://travis-ci.org/rrwen/search_google
.. image:: https://coveralls.io/repos/github/rrwen/search_google/badge.svg?branch=master
  :target: https://coveralls.io/github/rrwen/search_google?branch=master
.. image:: https://img.shields.io/github/issues/rrwen/search_google.svg
  :target: https://github.com/rrwen/search_google/issues
.. image:: https://img.shields.io/badge/license-MIT-blue.svg
  :target: https://raw.githubusercontent.com/rrwen/search_google/master/LICENSE
.. image:: https://img.shields.io/github/stars/rrwen/search_google.svg
  :target: https://github.com/rrwen/search_google/stargazers
.. image:: https://img.shields.io/twitter/url/https/github.com/rrwen/search_google.svg?style=social
  :target: https://twitter.com/intent/tweet?text=%23python%20%23dataextraction%20tool%20for%20%23googlesearch%20results%20and%20%23googleimages:%20https://github.com/rrwen/search_google


Install
-------

1. Install `Python <https://www.python.org/downloads/>`_
2. Install `search_google <https://pypi.python.org/pypi/search-google>`_ via ``pip``

::

  pip install search_google

For the latest developer version, see `Developer Install`_.


Usage
-----

For help in the console::

  search_google -h

Ensure that a `CSE ID <https://support.google.com/customsearch/answer/2649143?hl=en>`_ and a `Google API developer key <https://developers.google.com/api-client-library/python/auth/api-keys>`_ are set::

  search_google -s cx="your_cse_id"
  search_google -s build_developerKey="your_dev_key"

Search the web for keyword "cat"::

  search_google "cat"
  search_google "cat" --save_links=cat.txt
  search_google "cat" --save_downloads=downloads

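Saving links with ``--save_links`` boils down to writing the result URLs to a file. The following is a minimal sketch, assuming one link per line; ``save_links`` is a hypothetical helper written for illustration, not the package's own implementation:

.. code-block:: python

  import tempfile
  from pathlib import Path
  from typing import Iterable


  def save_links(links: Iterable[str], file_path: str) -> int:
      """Write one link per line to file_path and return how many were written.

      Hypothetical helper: blank entries are skipped.
      """
      cleaned = [link.strip() for link in links if link.strip()]
      text = "".join(link + "\n" for link in cleaned)
      Path(file_path).write_text(text, encoding="utf-8")
      return len(cleaned)


  if __name__ == "__main__":
      # Made-up links standing in for real search results
      out = Path(tempfile.mkdtemp()) / "cat.txt"
      print(save_links(["https://example.com/cat1", "https://example.com/cat2"], str(out)))
      print(out.read_text(encoding="utf-8"))
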
Search for "cat" images::

  search_google cat --searchType=image
  search_google "cat" --searchType=image --save_links=cat_images.txt
  search_google "cat" --searchType=image --save_downloads=downloads

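Downloading with ``--save_downloads`` means fetching the content behind each result link into the given directory. A hedged, standard-library sketch follows; ``download_links`` and its file naming scheme are assumptions for illustration, and the ``fetch`` parameter lets the example run offline with a stand-in function:

.. code-block:: python

  import os
  import tempfile
  import urllib.request
  from typing import Callable, Iterable, List
  from urllib.parse import urlparse


  def fetch_bytes(url: str) -> bytes:
      """Fetch a URL with the standard library (requires network access)."""
      with urllib.request.urlopen(url, timeout=30) as response:
          return response.read()


  def download_links(links: Iterable[str], dir_path: str,
                     fetch: Callable[[str], bytes] = fetch_bytes) -> List[str]:
      """Save the content behind each link into dir_path; return the file paths.

      Hypothetical helper: each file is prefixed with its result index so that
      links sharing a file name do not overwrite each other.
      """
      os.makedirs(dir_path, exist_ok=True)
      saved = []
      for i, url in enumerate(links):
          name = os.path.basename(urlparse(url).path) or "result"
          path = os.path.join(dir_path, f"{i}_{name}")
          with open(path, "wb") as f:
              f.write(fetch(url))
          saved.append(path)
      return saved


  if __name__ == "__main__":
      # Offline demo: the stand-in fetcher returns the URL bytes instead of downloading
      demo_dir = os.path.join(tempfile.mkdtemp(), "downloads")
      print(download_links(["https://example.com/img/cat.jpg"], demo_dir,
                           fetch=lambda url: url.encode("utf-8")))
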
Use as a Python module:

.. code-block:: python

  # Import the api module for the results class
  import search_google.api

  # Define buildargs for cse api
  buildargs = {
      'serviceName': 'customsearch',
      'version': 'v1',
      'developerKey': 'your_api_key'
  }

  # Define cseargs for search
  cseargs = {
      'q': 'keyword query',
      'cx': 'your_cse_id',
      'num': 3
  }

  # Create a results object
  results = search_google.api.results(buildargs, cseargs)

  # Download the search results to a directory

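For image searches from Python, the ``cseargs`` dictionary can carry ``searchType`` just as the command line does. A minimal sketch of assembling it, assuming the parameter names accepted by the Custom Search ``cse.list`` method; ``make_cseargs`` is a hypothetical helper, and its result would be passed to ``search_google.api.results`` together with ``buildargs``:

.. code-block:: python

  from typing import Any, Dict, Optional


  def make_cseargs(q: str, cx: str, num: int = 10,
                   search_type: Optional[str] = None) -> Dict[str, Any]:
      """Build the cseargs dictionary for cse.list (hypothetical helper).

      search_type="image" is the Python-side counterpart of --searchType=image.
      """
      if not 1 <= num <= 10:
          # The Custom Search API returns at most 10 results per request
          raise ValueError("num must be between 1 and 10")
      cseargs: Dict[str, Any] = {"q": q, "cx": cx, "num": num}
      if search_type is not None:
          cseargs["searchType"] = search_type
      return cseargs


  # Image variant of the module example above
  cseargs = make_cseargs("keyword query", "your_cse_id", num=3, search_type="image")
  print(cseargs)

Omitting ``search_type`` leaves ``searchType`` out of the dictionary, which keeps the default web search.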

For more usage details, see the `Documentation <https://rrwen.github.io/search_google>`_.


Contributions
-------------

Report Contributions
********************

Reports for issues and suggestions can be made using the `issue submission <https://github.com/rrwen/search_google/issues>`_ interface.

When possible, ensure that your submission is:

Code Contributions
******************

Code contributions are submitted via `pull requests <https://help.github.com/articles/about-pull-requests>`_:

1. Ensure that you pass the `Tests`_
2. Create a new `pull request <https://github.com/rrwen/search_google/pulls>`_
3. Provide an explanation of the changes

A template of the code contribution explanation is provided below::

  ## Purpose

  The purpose can mention goals that include fixes to bugs, addition of features, and other improvements, etc.

  ## Description

  The description is a short summary of the changes made, such as improved speeds, implementation changes, etc.

  ## Changes

  The changes are a list of general edits made to the files and their respective components.
  * `file_path1`:
      * `function_module_etc`: changed loop to map
      * `function_module_etc`: changed variable value
  * `file_path2`:
      * `function_module_etc`: changed loop to map
      * `function_module_etc`: changed variable value

  ## Notes

  The notes provide any additional text that does not fit into the above sections.

For more information, see `Developer Install`_ and `Implementation`_.

Developer Notes
---------------

Developer Install
*****************

Install the latest developer version with ``pip`` from GitHub::

  pip install git+https://github.com/rrwen/search_google

Install from git cloned source:

1. Ensure `git <https://git-scm.com/>`_ is installed
2. Clone into current path
3. Install via ``pip``

::

  git clone https://github.com/rrwen/search_google
  cd search_google
  pip install . -I


Tests
*****

1. Clone into current path ``git clone https://github.com/rrwen/search_google``
2. Enter into folder ``cd search_google``
3. Ensure `unittest <https://docs.python.org/2.7/library/unittest.html>`_ is available
4. Set your `CSE ID <https://support.google.com/customsearch/answer/2649143?hl=en>`_ and `Google API developer key <https://developers.google.com/api-client-library/python/auth/api-keys>`_
5. Run tests
6. Reset config file to defaults
7. Please note that this will use up 7 requests from your quota

::

  pip install . -I
  python -m search_google -s cx="your_cse_id"
  python -m search_google -s build_developerKey="your_dev_key"
  python -m unittest
  python -m search_google -d

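The tests above call the live API and therefore spend quota. For quick offline checks of code that follows the same ``build`` then ``cse().list().execute()`` chain, ``unittest.mock`` can stand in for ``build``. This is a sketch, not part of the package's test suite; ``run_search`` is a hypothetical function written to mirror that chain:

.. code-block:: python

  import unittest
  from unittest import mock


  def run_search(build, dev_key, cse_id, query, **cseargs):
      """Hypothetical helper following the build -> cse().list().execute() chain."""
      service = build("customsearch", "v1", developerKey=dev_key)
      return service.cse().list(q=query, cx=cse_id, **cseargs).execute()


  class RunSearchTest(unittest.TestCase):
      def test_arguments_are_forwarded_without_network(self):
          fake_build = mock.MagicMock()
          fake_list = fake_build.return_value.cse.return_value.list
          # Canned response shaped like a Custom Search result
          fake_list.return_value.execute.return_value = {
              "items": [{"link": "https://example.com/cat"}]
          }

          result = run_search(fake_build, "key", "cx_id", "cat", num=1)

          fake_build.assert_called_once_with("customsearch", "v1", developerKey="key")
          fake_list.assert_called_once_with(q="cat", cx="cx_id", num=1)
          self.assertEqual(result["items"][0]["link"], "https://example.com/cat")

Saved in a file whose name starts with ``test``, such a case is picked up by ``python -m unittest`` and uses none of the quota.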
Documentation Maintenance
*************************

1. Ensure `sphinx <https://github.com/sphinx-doc/sphinx/>`_ is installed ``pip install -U sphinx``
2. Update the documentation in ``docs/``

::

  pip install . -I
  sphinx-build -b html docs/source docs

Upload to GitHub
****************

1. Ensure `git <https://git-scm.com/>`_ is installed
2. Add all files and commit changes
3. Push to GitHub

::

  git add .
  git commit -a -m "Generic update"
  git push

Upload to PyPI
**************

1. Ensure `twine <https://pypi.python.org/pypi/twine>`_ is installed ``pip install twine``
2. Ensure `sphinx <https://github.com/sphinx-doc/sphinx/>`_ is installed ``pip install -U sphinx``
3. Run tests and check for OK status
4. Delete the ``dist`` directory
5. Update the version in ``search_google/__init__.py``
6. Update the documentation in ``docs/``
7. Create source distribution
8. Upload to `PyPI <https://pypi.python.org/pypi>`_

::

  pip install . -I
  python -m search_google -s cx="your_cse_id"
  python -m search_google -s build_developerKey="your_dev_key"
  python -m unittest
  python -m search_google -d
  sphinx-build -b html docs/source docs
  python setup.py sdist
  twine upload dist/*


Implementation
--------------

This command line tool uses the `Google Custom Search Engine (CSE) <https://developers.google.com/api-client-library/python/apis/customsearch/v1>`_ to perform web and image searches. It relies on `googleapiclient.build <https://google.github.io/google-api-python-client/docs/epy/googleapiclient.discovery-module.html#build>`_ and `cse.list <https://developers.google.com/resources/api-libraries/documentation/customsearch/v1/python/latest/customsearch_v1.cse.html>`_, where ``build`` creates a Google API service object and ``cse.list`` performs the searches.

The `search_google.api <https://rrwen.github.io/search_google/#module-api>`_ module passes dictionaries of arguments into ``build`` and ``cse.list``, and wraps the returned results in a class with properties and methods for processing them. `search_google.cli <https://rrwen.github.io/search_google/#module-cli>`_ then provides a command line interface on top of `search_google.api <https://rrwen.github.io/search_google/#module-api>`_.

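The wrapping described above might look roughly like the class below. It is a sketch that assumes only what this section states (``buildargs`` go to ``build`` and ``cseargs`` go to ``cse().list``); it is not the source of ``search_google.api.results``, and the ``Results`` name and the injectable ``build`` parameter are illustrative choices so the example can run without network access:

.. code-block:: python

  from typing import Any, Callable, Dict, List, Optional


  class Results:
      """Hypothetical wrapper around one Custom Search response."""

      def __init__(self, buildargs: Dict[str, Any], cseargs: Dict[str, Any],
                   build: Optional[Callable[..., Any]] = None) -> None:
          if build is None:
              # Real client; requires google-api-python-client and valid keys
              from googleapiclient.discovery import build
          service = build(**buildargs)  # e.g. serviceName, version, developerKey
          # cseargs such as q, cx and num are forwarded unchanged to cse.list
          self.metadata: Dict[str, Any] = service.cse().list(**cseargs).execute()

      @property
      def items(self) -> List[Dict[str, Any]]:
          # A response with no matches may not contain an "items" key
          return self.metadata.get("items", [])

      @property
      def links(self) -> List[str]:
          return [item["link"] for item in self.items if "link" in item]

With ``build`` omitted, the sketch imports ``googleapiclient.discovery.build`` and would need the ``google-api-python-client`` package plus a valid developer key and CSE ID.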
To use ``build`` and ``cse.list``, a `Google Developer API Key <https://developers.google.com/api-client-library/python/auth/api-keys>`_ and a `Google CSE ID <https://cse.google.com/all>`_ need to be created for API access (see `search_google Setup <https://rrwen.github.io/search_google/#setup>`_). Creating these keys also requires a `Gmail <https://www.google.com/gmail>`_ account for login access.


::

  googleapiclient.build  <-- Google API
         cse.list        <-- Google CSE
     search_google.api   <-- search results
     search_google.cli   <-- command line

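The top layer of that stack can be sketched with ``argparse``. The option names mirror the flags in the usage examples (``--searchType``, ``--save_links``, ``--save_downloads``), but the parser is hypothetical: it omits the ``-s`` and ``-d`` configuration options and is not the actual ``search_google.cli`` code:

.. code-block:: python

  import argparse
  from typing import Any, Dict, List, Optional


  def parse_cli(argv: Optional[List[str]] = None) -> Dict[str, Any]:
      """Turn command line flags into cseargs plus output options (hypothetical)."""
      parser = argparse.ArgumentParser(prog="search_google_sketch")
      parser.add_argument("query", help='search keywords, e.g. "cat"')
      parser.add_argument("--searchType", choices=["image"], default=None)
      parser.add_argument("--save_links", metavar="FILE", default=None)
      parser.add_argument("--save_downloads", metavar="DIR", default=None)
      args = parser.parse_args(argv)

      # Arguments destined for cse.list; the cx ID would come from saved config
      cseargs: Dict[str, Any] = {"q": args.query}
      if args.searchType:
          cseargs["searchType"] = args.searchType
      return {
          "cseargs": cseargs,
          "save_links": args.save_links,
          "save_downloads": args.save_downloads,
      }

Keeping parsing separate from searching means the resulting ``cseargs`` can be handed straight to the API layer below it.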
A rough example is provided below, thanks to the `customsearch example <https://github.com/google/google-api-python-client/blob/master/samples/customsearch/main.py>`_ from Google:

.. code-block:: python

  from googleapiclient.discovery import build

  # Set developer key and CSE ID
  dev_key = 'a_developer_key'
  cse_id = 'a_cse_id'

  # Obtain search results from Google CSE
  service = build("customsearch", "v1", developerKey=dev_key)
  results = service.cse().list(q='cat', cx=cse_id).execute()

  # Manipulate search results after ...
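
One possible manipulation step, assuming the standard Custom Search response layout in which ``items`` holds one dictionary per result with keys such as ``title`` and ``link``. The response below is made up so the sketch runs offline, and ``summarize`` is a hypothetical helper:

.. code-block:: python

  from typing import Any, Dict, List, Tuple


  def summarize(results: Dict[str, Any]) -> List[Tuple[str, str]]:
      """Return (title, link) pairs from a cse.list response (hypothetical helper)."""
      return [
          (item.get("title", ""), item["link"])
          for item in results.get("items", [])
          if "link" in item
      ]


  # Made-up response standing in for service.cse().list(...).execute()
  sample = {
      "items": [
          {"title": "Cat - Wikipedia", "link": "https://en.wikipedia.org/wiki/Cat"},
          {"title": "Cats", "link": "https://example.com/cats"},
      ]
  }
  for title, link in summarize(sample):
      print(f"{title}: {link}")
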