maschinenmensch / edifice

A database of the built environment in Chicago

Scraper candidate: City of Chicago Landmarks database #8

Open derekeder opened 11 years ago

derekeder commented 11 years ago

The City of Chicago has an online tool for looking up historical landmarks. These should be pretty easy to scrape.

A couple hundred historical landmarks with descriptions and images: http://webapps.cityofchicago.org/landmarksweb/web/listings.htm

There is also a database of 17,000 Chicago buildings, including address, architect, type, color code, major tenant (probably outdated), and PIN. Selecting a blank value for Architect will return the whole list (I think): http://webapps.cityofchicago.org/landmarksweb/search/home.htm

This may have already been released as this dataset: https://data.cityofchicago.org/Historic-Preservation/Individual-Landmarks-Shapefiles/2h6e-2yk6 https://data.cityofchicago.org/Historic-Preservation/Individual-Landmarks/tdab-kixi

jpvelez commented 11 years ago

The datasets are different for some reason. We should figure out which researchers at DHED know about these.

Juan-Pablo Velez

(Sent by email in reply to Derek Eder's comment of Friday, January 11, 2013 at 4:43 PM.)

danxoneil commented 11 years ago

Yes, these are different datasets.

The "historical landmarks" first referenced above are the same items as in the datasets linked at the bottom of the original comment: https://data.cityofchicago.org/Historic-Preservation/Individual-Landmarks-Shapefiles/2h6e-2yk6 https://data.cityofchicago.org/Historic-Preservation/Individual-Landmarks/tdab-kixi

These are "a list of individual Chicago Landmarks designated by City Council upon recommendation of the Commission on Chicago Landmarks". In other words, they went through a formal process for designation and made the final cut.

The data from this process lives in the scrapable PDFs of monthly meeting minutes published by the Commission, five years of which are published here. These are good candidates for scraping: well-formed addresses with large blocks of descriptive narrative associated with each address. This is very rich information that can inform decision-making in the future. I will add that in a separate issue.

Though there may be internal documents of the Commission that are more structured than these meeting minutes, it's not likely that the City would ever go back and attempt to turn these PDFs into publishable datasets on the data portal. There are far more worthwhile dataset candidates than this one.

However, turning these meeting minutes into structured data might be a good project for a non-programmer to get involved in edifice. A tool like http://tabula.nerdpower.org/ wouldn't really work, because the minutes are not tabular to begin with.

At EveryBlock, we had a custom tool for doing this (pull in text, highlight proposed addresses and the blocks of text associated with them, allow a human to confirm/fix, and publish). See screenshot. It seems like it would be a good thing to do that in this project. Anyone want to make that?

[Screenshot: EveryBlock's custom tool for confirming addresses in text]
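For anyone who wants to pick this up, the first two steps of that workflow (pull in text, propose addresses with their associated block of text) could start from something like the sketch below. This is not the EveryBlock tool, just a minimal illustration under some assumptions: the minutes have already been converted from PDF to plain text, paragraphs are separated by blank lines, and addresses look roughly like "1234 N. State St". The regular expression, the `Candidate` class, the `propose_candidates` function, and the sample text are all made up for illustration and would need tuning against the real minutes.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for Chicago-style addresses: house number (or range),
# a direction letter, one to three street-name words, and a street suffix.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}(?:-\d{1,5})?\s+[NSEW]\.?\s+"
    r"(?:[A-Z0-9][\w']*\s+){1,3}?"
    r"(?:Street|St|Avenue|Ave|Boulevard|Blvd|Drive|Dr|Road|Rd"
    r"|Place|Pl|Court|Ct|Parkway|Pkwy)\b"
)


@dataclass
class Candidate:
    address: str             # address text as matched in the minutes
    narrative: str           # paragraph the address appeared in
    confirmed: bool = False  # only a human reviewer should flip this


def propose_candidates(text: str) -> list[Candidate]:
    """Return one unconfirmed candidate per address-like match."""
    candidates = []
    # Treat each blank-line-separated paragraph as one block of narrative.
    for block in re.split(r"\n\s*\n", text):
        block = " ".join(block.split())  # collapse line wraps and extra spaces
        for match in ADDRESS_RE.finditer(block):
            candidates.append(Candidate(match.group(0), block))
    return candidates


# Invented sample text, not taken from real Commission minutes.
sample = """The Commission voted to approve the permit for 1234 N. State St.
subject to staff review of the storefront details.

No action was taken on the remaining agenda items."""

for candidate in propose_candidates(sample):
    print(candidate.address, "|", candidate.narrative)
```

Every candidate starts with `confirmed=False` on purpose: the script only proposes, and the confirm/fix step stays with a person, which is the part a non-programmer could own.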

The middle item referenced above (the 17,000-building database) consists of items from the "inventory of architecturally and historically significant structures". This is a completely separate dataset, and super-useful to this project. Added that as #26 (could someone with access please add the "scraper" label to that issue?)