openplans / openblock

OpenBlock is a web application and RESTful service that allows users to browse and search their local area for "hyper-local news

Ebdata scrapers should be runnable as scripts, and provide a convenient way to load their schemas #234

Open slinkp opened 11 years ago

slinkp commented 11 years ago

It would be much easier to document how to run, and load schemas for, the scripts in ebdata/scrapers if I could tell users to just do something like this hypothetical terminal session:

$ flickr_retrieval --help
Usage: flickr_retrieval [options] [commands]

Options:
  -h, --help       show this help message and exit
  --schema=SCHEMA  Slug of schema to use when retrieving. Default is 'photos'.
  -f, --force      With the load-schema command, create the schema even if it already exists.

Commands:
  run              Retrieve photos.
  load-schema      Create the 'photos' schema. Will exit if it already exists,
                   unless you also specify `--force`.

$ flickr_retrieval load-schema
Loading /home/pw/builds/openblock/builds/20110519/src/openblock/ebdata/ebdata/scrapers/general/flickr/photos_schema.json
Installed 5 object(s) from 1 fixture(s)

$ flickr_retrieval run
INFO list_detail: update() in <class '__main__.FlickrScraper'> started
INFO newsitem_list_detail: Created NewsItem photos: 10084 (total created in this scrape: 1)
INFO newsitem_list_detail: Created NewsItem photos: 10085 (total created in this scrape: 2)
...

If all our scrapers followed that command-line API, it would be pretty nice.
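A minimal sketch of how one scraper could expose that command-line API, using the standard library's `argparse`. Nothing here is existing ebdata code: `build_parser`, `main`, and the `run` / `load_schema` callables are hypothetical names, and printing the help text when no command is given is my assumption, since the session above doesn't show that case.

```python
import argparse

# Commands from the hypothetical help output above.
COMMANDS = ("run", "load-schema")


def build_parser(prog="flickr_retrieval", default_schema="photos"):
    """Build a parser with the options shown in the hypothetical session."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument(
        "--schema", default=default_schema,
        help="Slug of schema to use when retrieving. Default is %(default)r.")
    parser.add_argument(
        "-f", "--force", action="store_true",
        help="With the load-schema command, create the schema even if it "
             "already exists.")
    parser.add_argument(
        "commands", nargs="*", metavar="command",
        help="One or more of: %s" % ", ".join(COMMANDS))
    return parser


def main(argv, run, load_schema):
    """Parse argv and dispatch to the scraper-specific callables.

    run(schema_slug) and load_schema(schema_slug, force=...) are supplied
    by each scraper; both are hypothetical hooks, not existing ebdata API.
    """
    parser = build_parser()
    options = parser.parse_args(argv)
    if not options.commands:
        # Assumption: with no command, show the help and report failure.
        parser.print_help()
        return 1
    # Validate everything first so a typo doesn't leave a half-done job.
    for command in options.commands:
        if command not in COMMANDS:
            parser.error("unknown command: %s" % command)
    for command in options.commands:
        if command == "load-schema":
            load_schema(options.schema, force=options.force)
        else:
            run(options.schema)
    return 0
```

Each scraper script would then end with something like `sys.exit(main(sys.argv[1:], run=..., load_schema=...))`, and could be installed as a console script so users never need to locate the file.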

As it is, we have to document how to find where ebdata is installed (which differs depending on how you installed it), find the relevant Python script, and run it with the right Python (i.e. with your virtualenv activated). You also have to make sure you've first run django-admin.py loaddata path/to/wherever/the/schema/lives. On top of that, the scripts and schema fixtures don't follow 100% consistent naming conventions.
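One way to spare users from hunting for the fixture is for each scraper to resolve the schema file relative to its own module. This is only a sketch: `schema_fixture_path` and `load_schema_fixture` are hypothetical helpers, and the `<slug>_schema.json` naming is an assumption taken from the `photos_schema.json` example, which not every scraper currently follows.

```python
import os


def schema_fixture_path(module_file, schema_slug):
    """Return the fixture path that sits next to the scraper module.

    Assumes fixtures are named '<slug>_schema.json'; that convention is
    hypothetical and would have to be made consistent across scrapers.
    """
    directory = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(directory, "%s_schema.json" % schema_slug)


def load_schema_fixture(module_file, schema_slug, loaddata):
    """Locate the fixture and hand it to a loader callable.

    `loaddata` takes one argument, the fixture path. It is injected so the
    lookup logic can be exercised without a configured Django project.
    """
    path = schema_fixture_path(module_file, schema_slug)
    if not os.path.exists(path):
        raise IOError("No schema fixture found at %s" % path)
    print("Loading %s" % path)
    loaddata(path)
    return path
```

Inside a scraper, with Django settings configured, this could be called as `load_schema_fixture(__file__, "photos", lambda path: call_command("loaddata", path))`, where `call_command` comes from `django.core.management`. The check for a schema that already exists, and the `--force` override, would still need to be written on top of this.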

That is a lot of things that can go wrong and confuse someone who isn't experienced with Python packaging and so forth.

This would be straightforward to fix, but I don't have time at the moment.

slinkp commented 11 years ago

Ticket imported from Trac: http://developer.openblockproject.org/ticket/241
Reported by: slinkp