
Scrapinghub API Go library

Go bindings for the Scrapinghub HTTP API and a command line tool.

Installation

Requirements

A working Go installation (the steps below use go get and go install).

Steps

$ go get [-u] github.com/vaughan0/go-ini   # install go-ini dep
$ go get [-u] github.com/scrapinghub/shubc # install or update shubc library
$ go install github.com/scrapinghub/shubc  # install the tool

Note: use the -u option to update the packages and re-install them.

scrapinghub.go: the library

Documentation for the library is available online at godoc.org.
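To give a feel for what the bindings wrap, below is a minimal sketch that queries the Spiders API endpoint directly with net/http. It is an illustration, not the library's own interface: the spiders/list.json path and the convention of sending the APIKEY as the HTTP basic-auth username are assumptions based on the legacy dash.scrapinghub.com API, and SH_APIKEY is a placeholder environment variable.

package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

func main() {
    // Placeholder: read the APIKEY from the environment for this sketch.
    apikey := os.Getenv("SH_APIKEY")

    // List the spiders of project 123 (placeholder project id).
    url := "https://dash.scrapinghub.com/api/spiders/list.json?project=123"

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        panic(err)
    }
    // Assumption: the APIKEY is sent as the basic-auth username, empty password.
    req.SetBasicAuth(apikey, "")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    // Print the raw JSON spider list.
    fmt.Println(string(body))
}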

shubc: a command line tool

A handy command line tool to query the API is also bundled.

Getting help

% shubc help
shubc [options] <command> arg1 .. argN

 Options: 
  -apikey="<API KEY>": Scrapinghub api key
  -apiurl="https://dash.scrapinghub.com/api": Scrapinghub API URL (can be changed to another URL for testing).
  -count=0: Count for those commands that need a count limit
  -csv=false: If given, the items command writes its results as CSV to os.Stdout
  -fields="": When -csv given, list of comma separated fields to include in the CSV
  -include_headers=false: When -csv given, include the headers of the CSV in the output
  -jl=false: If given, the items and jobs commands write all their data to os.Stdout in JsonLines format
  -o="": Write output to a file instead of Stdout
  -offset=0: Number of results to skip from the beginning
  -tail=false: Behave like `tail -f` for the `log` command

 Commands: 
   Spiders API: 
     spiders <project_id>                       - list the spiders on project_id
   Jobs API: 
     schedule <project_id> <spider_name> [args] - schedule the spider <spider_name> with [args] in project <project_id>
     reschedule <job_id>                        - re-schedule the job <job_id> with the same arguments and tags
     jobs <project_id> [filters]                - list the last 100 jobs on project_id
     jobinfo <job_id>                           - print information about the job with <job_id>
...
...
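For illustration, a few invocations built from the commands and flags above. Project ids, spider names, field names and the 123/1/2 job id are placeholders; the items and log commands are the ones the flag descriptions refer to.

% shubc spiders 123                        # list the spiders in project 123
% shubc schedule 123 myspider              # schedule a run of spider myspider
% shubc -count=10 jobs 123                 # list only the last 10 jobs
% shubc -jl -o items.jl items 123/1/2      # write a job's items to items.jl as JsonLines
% shubc -csv -include_headers -fields=id,url items 123/1/2
% shubc -tail log 123/1/2                  # follow a running job's log, like tail -f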

Configure your APIKEY

You can configure your APIKEY using the .scrapy.cfg file in your home directory. More information on how to configure it is available here: http://doc.scrapinghub.com/scrapy-cloud.html#deploying-your-scrapy-spider
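As a sketch, a ~/.scrapy.cfg set up for Scrapy Cloud usually looks like the snippet below. The exact keys are described in the documentation linked above; the APIKEY value and project id shown here are placeholders.

[deploy]
# The APIKEY goes in the username field; leave the password empty.
url = https://dash.scrapinghub.com/api/scrapyd/
username = 0123456789abcdef0123456789abcdef
password =
project = 123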
