openoakland / DataDay

Projects for Oakland Data Day 2014

Web Scraping 101 Workshop, scrape some Oakland Data! #11

Open spjika opened 10 years ago


Facilitator: @maxogden

I'm going to host a workshop on Saturday to teach web scraping 101. In preparation, I spent the last hour or so going through 30 pages of Google search results for "site:oaklandnet.com search" (all pages that are on oaklandnet.com and contain the word 'search') and produced the following spreadsheet:

https://docs.google.com/spreadsheets/d/1KS1UPpmMWA0v5BmBUzZPFn--UBvlSOaC1IzhDA2upLw/edit#gid=0

Good candidates are pages with search forms that likely have a database behind them. Less ideal are PDFs or mapping/GIS websites that only display data on maps rather than in HTML tables (these are sometimes harder to scrape).
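As a rough illustration of that triage, here is a minimal Python sketch that looks for `<form>` and `<table>` tags in a page's HTML. It assumes the HTML has already been downloaded into a string. The `triage` helper and its labels are hypothetical, not part of the spreadsheet or the tutorial, and a real page would still need a human look.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts the tags that hint at scrapable, database-backed content."""

    def __init__(self):
        super().__init__()
        self.counts = {"form": 0, "table": 0}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in self.counts:
            self.counts[tag] += 1


def triage(html):
    """Hypothetical helper: label a page by how promising it looks."""
    counter = TagCounter()
    counter.feed(html)
    has_form = counter.counts["form"] > 0
    has_table = counter.counts["table"] > 0
    if has_form and has_table:
        return "good: search form with tabular results"
    if has_form or has_table:
        return "maybe: has a form or a table"
    return "harder: no forms or tables found"
```

A page that is only a map or an embedded PDF would fall into the last bucket, which matches the "less ideal" cases above.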

I've prepared a tutorial here: https://github.com/maxogden/web-scraping and will be leading attendees through the tutorial + helping them learn to write scrapers.

The goal is to create CSVs for all of these datasets. We can rerun the scrapers regularly so that they pick up updated data. I've found this to be a good, low-effort first step in the long-term process of properly releasing an open dataset. Scrape first, ask questions later!
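To make the HTML-to-CSV step concrete, here is a minimal sketch in Python using only the standard library. It is not the approach from the linked tutorial. It assumes the results page has already been fetched into a string and that it contains a single, simple table with no nested tables. The `table_to_csv` name is made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Hypothetical helper: return the table rows in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Writing the returned text to a file on each run gives a CSV that can be refreshed whenever the scraper is rerun. Pages with nested tables or malformed markup would need a more forgiving parser.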

I haven't yet checked any of these to see if they exist on Socrata/CKAN, so if you know please update the "Dataset already open?" column.