openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

create tutorial #421

Open kat opened 10 years ago

kat commented 10 years ago

Create a step-by-step tutorial to make it easier to get started.

equivalentideas commented 9 years ago

Seems like @emikulic has kicked this off with #796 and #798

henare commented 9 years ago

This kind of relates to #846 too.

equivalentideas commented 9 years ago

After our scraping workshop last week, we got consistent feedback that it would be really helpful to have a step-by-step guide on how to write a scraper, which is exactly what this issue is calling for.

This will be a really helpful resource for workshop participants as well as people using morph.io generally, so I'm gonna whip something up.

equivalentideas commented 9 years ago

Here are steps to writing a scraper, based on my own methods and what @henare demonstrated in the workshop:

  1. Find the data you're looking for and work out if it can be scraped.
  2. If it can be scraped, create your new scraper on morph.io. Pick the language you want to write your scraper in, and give it a clear name and description so people can find it through search.
  3. Clone your scraper to your local machine with `git clone <scraper url>`.
  4. Make sure you have all the dependencies installed. If you're writing your scraper in Ruby, do you have Ruby installed? Do you have Bundler installed to manage the libraries your scraper will need?
  5. Now it's time to start writing your scraper.
  6. Open your code editor and look at the example code.
  7. Define your object with the data you want to collect.
  8. Using IRB, get each piece of data for a single record (start small with just one record; there's a sketch of this after the list).
  9. Once you've got each piece of data you need, consider adding a date scraped so you can verify your data later.
  10. Fill out the record you've defined in your scraper.rb file and use the `p` method to print the record you've collected when you run the scraper on the command line.
  11. Now add a loop to your scraper to get every record you need on the page.
  12. If the records you need cover several pages, you'll also need to loop through the pages (see the pagination sketch below).
  13. Save your data using the scraperwiki library (see the saving sketch below).
  14. Push your scraper to morph.io.
  15. Run the scraper and check for errors.
  16. Review the data you've collected by looking at the API and downloading the CSV.
  17. If your scraper needs to run each day, set that up on morph.io.
  18. Celebrate!
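
To make the middle steps concrete, here's a minimal sketch of steps 6 to 10 in Ruby, assuming a hypothetical listing page at `https://example.com/members` with `.member` rows. The URL and CSS selectors are placeholders for the sort of thing you'd work out interactively in IRB:

```ruby
require 'date'
require 'mechanize'

agent = Mechanize.new
page = agent.get('https://example.com/members') # placeholder URL

# Pull out each piece of data for the first record on the page,
# one expression at a time, until the selectors look right.
row  = page.at('.member')            # hypothetical CSS selector
name = row.at('.name').text.strip
url  = row.at('a')['href']

record = {
  'name'         => name,
  'url'          => url,
  'date_scraped' => Date.today.to_s  # step 9: so you can verify the data later
}
p record                             # step 10: check your work on the command line
```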
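Steps 11 and 12 then wrap that extraction in loops. A sketch continuing from the same hypothetical page, assuming the pager link is labelled "Next" (Mechanize's `link_with` returns `nil` when nothing matches, which is what ends the loop):

```ruby
page = agent.get('https://example.com/members')

loop do
  # Step 11: extract every record on the current page.
  page.search('.member').each do |row|
    record = {
      'name'         => row.at('.name').text.strip,
      'url'          => row.at('a')['href'],
      'date_scraped' => Date.today.to_s
    }
    p record
  end

  # Step 12: follow the pager until there's no "Next" link left.
  next_link = page.link_with(text: 'Next')
  break unless next_link
  page = next_link.click
end
```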
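For step 13, the scraperwiki gem's `save_sqlite` writes to the data.sqlite file that morph.io serves through its API and CSV download. A sketch saving the `record` hash from above:

```ruby
require 'scraperwiki'

# The first argument names the fields that uniquely identify a record,
# so re-running the scraper updates existing rows instead of duplicating them.
ScraperWiki.save_sqlite(['url'], record)
```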

For the tutorial, I think it would be helpful to take readers through these steps by writing an example scraper. Bills in the NSW Parliament are a nice, simple example that includes pagination, so I think I'll go with that unless we come up with something more exciting.