It would be much easier to document how to run, and load schemas for, the scripts in ebdata/scrapers if I could tell users to just do something like this hypothetical terminal session:
$ flickr_retrieval --help
Usage: flickr_retrieval [options] [commands]
Options:
-h, --help show this help message and exit
--schema=SCHEMA Slug of schema to use when retrieving. Default is 'photos'.
-f, --force With the load-schema command, create the schema even if it already exists.
Commands:
run Retrieve photos.
load-schema Create the 'photos' schema. Will exit if it already exists,
unless you also specify `--force`.
$ flickr_retrieval load-schema
Loading /home/pw/builds/openblock/builds/20110519/src/openblock/ebdata/ebdata/scrapers/general/flickr/photos_schema.json
Installed 5 object(s) from 1 fixture(s)
$ flickr_retrieval run
INFO list_detail: update() in <class '__main__.FlickrScraper'> started
INFO newsitem_list_detail: Created NewsItem photos: 10084 (total created in this scrape: 1)
INFO newsitem_list_detail: Created NewsItem photos: 10085 (total created in this scrape: 2)
...
If all our scrapers followed that command-line API, it would be pretty nice.
As it is, we have to document how to find where ebdata is installed (which differs depending on how you installed it); find the relevant Python script; run it with the right Python (i.e. with your virtualenv activated); and make sure you've already run django-admin.py loaddata path/to/wherever/the/schema/lives. On top of that, the scripts and schema fixtures don't follow consistent naming conventions.
That is a lot of things that can go wrong and confuse someone who isn't experienced with Python packaging and so forth.
This would be straightforward to fix, but I don't have time at the moment.
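For whoever picks this up, here is a minimal sketch of what a shared entry point could look like. It only mirrors the hypothetical session above. Nothing in it is existing ebdata code: the names `build_parser` and `main`, and the `run` and `load_schema` callbacks each scraper would pass in, are assumptions made up for illustration, and it uses the standard library's argparse rather than whatever the scrapers use today.

```python
import argparse


def build_parser(prog, default_schema):
    """Build the options shared by every scraper script (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument(
        "--schema", default=default_schema,
        help="Slug of schema to use when retrieving. Default is %r." % default_schema)
    parser.add_argument(
        "-f", "--force", action="store_true",
        help="With the load-schema command, create the schema even if it already exists.")
    parser.add_argument(
        "commands", nargs="*", metavar="commands",
        help="One or more of: run, load-schema.")
    return parser


def main(argv, run, load_schema, prog="scraper", default_schema="photos"):
    """Parse argv and dispatch each command, in order, to the scraper's callbacks.

    `run(schema)` and `load_schema(schema, force=...)` are assumed signatures;
    each scraper would supply its own implementations.
    """
    parser = build_parser(prog, default_schema)
    args = parser.parse_args(argv)
    handlers = {
        "run": lambda: run(args.schema),
        "load-schema": lambda: load_schema(args.schema, force=args.force),
    }
    unknown = [c for c in args.commands if c not in handlers]
    if unknown:
        # parser.error() prints the usage line and exits with status 2.
        parser.error("unknown command(s): %s" % ", ".join(unknown))
    if not args.commands:
        parser.print_help()
        return 1
    for command in args.commands:
        handlers[command]()
    return 0
```

Each scraper would then call something like `main(sys.argv[1:], run=..., load_schema=..., prog="flickr_retrieval")`. Registering that call as a setuptools `console_scripts` entry point would put the command on the PATH of whatever environment ebdata is installed into, which removes the "find the script and use the right python" steps.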