derekeder opened 11 years ago
It would be cool if we could automatically pull those PDFs, but they seem to usually be linked in free-form 'Description' text fields. It might just involve some super-simple scanning for bit.ly links, if that's the city's standard practice over hundreds of datasets.
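That super-simple scan could be a one-liner regex over each description. A minimal sketch (the sample description text and function name are made up for illustration, not real portal metadata):

```python
import re

# Rough pattern for bit.ly short links in free-form description text.
BITLY_RE = re.compile(r"https?://bit\.ly/\w+")

def find_bitly_links(description):
    """Return all bit.ly URLs found in a description string."""
    return BITLY_RE.findall(description)

description = "Crime data. Data dictionary: http://bit.ly/abc123 (PDF)."
print(find_bitly_links(description))  # → ['http://bit.ly/abc123']
```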
Re: scraping that document — if only there were a schema for the data dictionary for my schema... [I think this is what Heidegger called "the hermeneutic circle"]
Unfortunately, a scan for bit.ly links would not be comprehensive. Here's an example: https://data.cityofchicago.org/Health-Human-Services/Public-Health-Statistics-Gonorrhea-cases-for-femal/cgjw-mn43
This is a dataset currently included in this project.
Perhaps just follow and slurp all of the links in each metadata field, then associate them with that dataset, and sort it out by hand later?
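The slurp-everything approach could look something like this — a sketch assuming we already have metadata as a `{dataset_id: {field_name: text}}` dict (that shape, and the crude URL regex, are assumptions; sorting the results out by hand comes later, as suggested):

```python
import re

# Deliberately loose: grabs any http(s) URL, trailing punctuation and all.
URL_RE = re.compile(r"https?://\S+")

def links_by_dataset(metadata):
    """Map each dataset id to every URL found in any of its metadata fields."""
    found = {}
    for dataset_id, fields in metadata.items():
        links = []
        for text in fields.values():
            links.extend(URL_RE.findall(text or ""))
        found[dataset_id] = links
    return found
```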
The City does a pretty good job of documenting what each of these datasets is. Can these descriptions be pulled from the API?
Example: https://data.cityofchicago.org/Buildings/Building-Footprints/w2v3-isjw
Also, they usually link to a document that defines all the data fields and what they mean. Can we somehow either link to or read this in?
Example: https://data.cityofchicago.org/api/assets/003C600C-3A66-4605-8E7E-2477AAE95E16
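Both the description and those attached documents may be reachable through Socrata's views endpoint (`/api/views/<dataset-id>.json`). A sketch of parsing that response — the exact field names (`description`, `metadata.attachments`, `filename`) are assumptions based on what that endpoint typically returns, so verify against a live response:

```python
# Assumed Socrata views endpoint, e.g.
# https://data.cityofchicago.org/api/views/w2v3-isjw.json
VIEWS_URL = "https://{domain}/api/views/{dataset_id}.json"

def view_url(dataset_id, domain="data.cityofchicago.org"):
    """Build the metadata URL for a dataset on the portal."""
    return VIEWS_URL.format(domain=domain, dataset_id=dataset_id)

def extract_docs(view):
    """Pull the description and attachment filenames out of a parsed
    views-endpoint JSON dict (the data dictionary PDFs appear to be
    served as attachments under /api/assets/<asset-id>)."""
    attachments = view.get("metadata", {}).get("attachments", [])
    return {
        "description": view.get("description", ""),
        "attachments": [a.get("filename") for a in attachments],
    }
```

Fetching `view_url("w2v3-isjw")` with any HTTP client and feeding the parsed JSON to `extract_docs` would be the whole pipeline, if the field names hold up.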