derekeder opened 11 years ago
It would be cool if we could automatically pull those PDFs, but they seem to usually be linked in free-form 'Description' text fields. It might just involve some super-simple scanning for bit.ly links, if that's the city's standard practice over hundreds of datasets.
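That super-simple scan could be a one-liner regex over each description. A minimal sketch (the sample description text and function name are made up for illustration, not real portal metadata):

```python
import re

# Rough pattern for bit.ly short links in free-form description text.
BITLY_RE = re.compile(r"https?://bit\.ly/\w+")

def find_bitly_links(description):
    """Return all bit.ly URLs found in a description string."""
    return BITLY_RE.findall(description)

description = "Crime data. Data dictionary: http://bit.ly/abc123 (PDF)."
print(find_bitly_links(description))  # → ['http://bit.ly/abc123']
```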
Re: scraping that document — if only there were a schema for the data dictionary for my schema... [I think this is what Heidegger called "the hermeneutic circle"]
Unfortunately, a scan for bit.ly links would not be comprehensive. Here's an example: https://data.cityofchicago.org/Health-Human-Services/Public-Health-Statistics-Gonorrhea-cases-for-femal/cgjw-mn43
This is a dataset currently included in this project.
Perhaps just follow and slurp all of the links in each metadata field, then associate them with that dataset, and sort it out by hand later?
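The slurp-everything approach could look something like this — a sketch assuming we already have metadata as a `{dataset_id: {field_name: text}}` dict (that shape, and the crude URL regex, are assumptions; sorting the results out by hand comes later, as suggested):

```python
import re

# Deliberately loose: grabs any http(s) URL, trailing punctuation and all.
URL_RE = re.compile(r"https?://\S+")

def links_by_dataset(metadata):
    """Map each dataset id to every URL found in any of its metadata fields."""
    found = {}
    for dataset_id, fields in metadata.items():
        links = []
        for text in fields.values():
            links.extend(URL_RE.findall(text or ""))
        found[dataset_id] = links
    return found
```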
The City does a pretty good job of documenting what each of these datasets is. Can these descriptions be pulled from the API?
Example: https://data.cityofchicago.org/Buildings/Building-Footprints/w2v3-isjw
Also, they usually link to a document that defines all the data fields and what they mean. Can we somehow either link to or read this in?
Example: https://data.cityofchicago.org/api/assets/003C600C-3A66-4605-8E7E-2477AAE95E16
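Both the description and those attached documents may be reachable through Socrata's views endpoint (`/api/views/<dataset-id>.json`). A sketch of parsing that response — the exact field names (`description`, `metadata.attachments`, `filename`) are assumptions based on what that endpoint typically returns, so verify against a live response:

```python
# Assumed Socrata views endpoint, e.g.
# https://data.cityofchicago.org/api/views/w2v3-isjw.json
VIEWS_URL = "https://{domain}/api/views/{dataset_id}.json"

def view_url(dataset_id, domain="data.cityofchicago.org"):
    """Build the metadata URL for a dataset on the portal."""
    return VIEWS_URL.format(domain=domain, dataset_id=dataset_id)

def extract_docs(view):
    """Pull the description and attachment filenames out of a parsed
    views-endpoint JSON dict (the data dictionary PDFs appear to be
    served as attachments under /api/assets/<asset-id>)."""
    attachments = view.get("metadata", {}).get("attachments", [])
    return {
        "description": view.get("description", ""),
        "attachments": [a.get("filename") for a in attachments],
    }
```

Fetching `view_url("w2v3-isjw")` with any HTTP client and feeding the parsed JSON to `extract_docs` would be the whole pipeline, if the field names hold up.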