open-austin / project-ideas

:bulb: A place to collect ideas for Open Austin projects

a script for taking Austin's data portal analysis to the next level (short term solution) #26

Open haileypate opened 8 years ago

haileypate commented 8 years ago

Concise description: At the City of Austin, we are able to create a data table that gives us summary information about all the public datasets on our open data portal. (We hope to automate the production of this... stay tuned!)

What we don't currently have is a data table that tells us about the rows in each dataset. The MVP for this project would take one json file full of data collected from Socrata's views API, filter it down, and transform it into a CSV that could be published back to Socrata by City of Austin staff.
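A minimal sketch of that filter-and-transform step, assuming the JSON file is a list of view objects that each carry `id`, `name`, and a `columns` list whose entries have `name`, `fieldName`, and `dataTypeName`. The output column names in `FIELDS`, the helper names `views_to_rows` and `write_csv`, and the file names in the usage example are all hypothetical; the real target schema is the one in the linked spreadsheet. It emits one CSV row per dataset column, which is also an assumption about what the final table should look like.

```python
import csv
import io
import json

# Hypothetical output columns; swap in the real target schema.
FIELDS = [
    "dataset_id",
    "dataset_name",
    "column_name",
    "column_field_name",
    "column_data_type",
]


def views_to_rows(views):
    """Yield one flat dict per column of every view that has columns.

    Views without a "columns" list (assumed to be non-tabular assets)
    are skipped, which is the "filter it down" part.
    """
    for view in views:
        for col in view.get("columns", []):
            yield {
                "dataset_id": view.get("id"),
                "dataset_name": view.get("name"),
                "column_name": col.get("name"),
                "column_field_name": col.get("fieldName"),
                "column_data_type": col.get("dataTypeName"),
            }


def write_csv(views, out):
    """Write the flattened rows to a file-like object; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in views_to_rows(views):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("datasets.json", encoding="utf-8") as src:
        views = json.load(src)
    with open("dataset_columns.csv", "w", newline="", encoding="utf-8") as dst:
        print(write_csv(views, dst), "rows written")
```

Using `.get()` everywhere means a view with missing keys produces empty cells instead of crashing the run, which seems like the safer default for a first pass over 4MB of mixed assets.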

Why (more details/brain dump/alpha) This would help us explore questions such as:

Who will use/benefit from this project? City staff for sure... the information would inform the development of data improvement campaigns and data quality surveillance. Hailey pledges to publish data produced by this script on the City's open data portal. Ideally, future iterations would automate that publishing.

Project Needs (dev/design/resources) Python solution preferred, so as to complement capabilities of current City staff. But we're open to learning how to make this work with other languages.

The source data file (4MB) can be accessed here: https://drive.google.com/file/d/0ByH6oExoXJ0KTDh1UUU4RnlpMWs/view?usp=sharing

The target schema is described here: https://docs.google.com/spreadsheets/d/1v1PmiiYhznrPWD2V7YDvyCiob-JXEhUxLZFk4B5gS68/edit?usp=sharing

Status (in progress, pie-in-the-sky) just an idea. hailey keeps trying to build as time permits, but continuously fails.

luqmaan commented 8 years ago

This project sounds interesting.

What are some of the existing Socrata analysis projects?

haileypate commented 8 years ago

@luqmaan Do you mean like... what are we studying at the City? Or what analytics are offered by Socrata in general? Or...?

There's a lot of stuff in my head... needs to come out! Perhaps I'm not the greatest researcher, but I haven't come across a ton of studies that study the "state" of a City's open data with "in the weeds" objectivity. Lots of subjective stuff still out there. Would love to work with folks, I could use some help getting things out of my head and into documentation.

spatialaustin commented 8 years ago

this is a great idea, and i would like to do it.

we should be able to schedule a process that reads in all the dataset metadata and updates a metadata dataset on the data portal.

@haileypate how are you compiling that dataset metadata json? i figured out that this will get you the metadata for a single dataset: https://data.austintexas.gov/api/views/xxxx-xxxx/rows.json

how do you view all datasets at once?

@luqmaan all that I'm aware of is Hailey's running list, Chip's inventory page (http://data.open-austin.org/data-catalogs/views/by-department.html), and the progress report.

mtb33 commented 8 years ago

I'd be happy to help with this; would anyone like to set up a group for it at the Civic Hack Night?

luqmaan commented 8 years ago

@mtb33 I've added it to the hackpad. https://openaustin.hackpad.com/3-November-2015-civichacknight-e2Cad5mmOrS

mateoclarke commented 8 years ago

Worth noting here. There was a conversation about data portal analytics related to our Open Data Progress Report project.

https://github.com/open-austin/open-data-progress-report/issues/10

http://data.open-austin.org/data-catalogs/views/by-department.html

haileypate commented 8 years ago

check out the dataset of datasets here: https://docs.google.com/spreadsheets/d/1bPd9P3NLpf-_DpwGhxus0AseuqaU2WVfqBHYF5tbD0E/edit?usp=sharing

mtb33 commented 8 years ago

Script repository: https://github.com/mtb33/data-portal-analysis

luqmaan commented 8 years ago

:+1:

Do you want to transfer the repository to open-austin? That way the URL becomes https://github.com/open-austin/data-portal-analysis.

mtb33 commented 8 years ago

I'm having a hard time using the Socrata API to get information that's in the datasets.json file @haileypate provided.

URLs like this: https://data.austintexas.gov/api/views/xxxx-xxxx/rows.json don't provide metadata. URLs like this: https://data.austintexas.gov/resource/xxxx-xxxx.json return almost the same thing, but also without metadata. I can't figure out why they're different.

Which API endpoint provides information about the datasets, such as id, name, createdAt, downloadCount, etc?
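One thing that may be worth trying, sketched below: as far as I know, Socrata's views API returns a view's metadata (rather than its rows) when `/rows` is dropped from the URL, i.e. `/api/views/xxxx-xxxx.json`. Treat that as an assumption to check against the Socrata docs, not a confirmed answer. The helper names `view_metadata_url`, `summarize_view`, and `fetch_view_summary` are made up for this example, and the keys are just the ones listed in the question.

```python
import json
from urllib.request import urlopen

PORTAL = "https://data.austintexas.gov"

# Keys named in the question; a real response likely has many more.
SUMMARY_KEYS = ("id", "name", "createdAt", "downloadCount")


def view_metadata_url(dataset_id, portal=PORTAL):
    """Build the (assumed) metadata URL for one dataset id."""
    return f"{portal}/api/views/{dataset_id}.json"


def summarize_view(view):
    """Keep only the summary keys; missing keys come back as None."""
    return {key: view.get(key) for key in SUMMARY_KEYS}


def fetch_view_summary(dataset_id, portal=PORTAL):
    """Download one view's metadata and reduce it to the summary keys.

    Makes a network request, so it only works if the endpoint behaves
    as assumed above.
    """
    with urlopen(view_metadata_url(dataset_id, portal)) as resp:
        return summarize_view(json.load(resp))
```

Calling `fetch_view_summary("xxxx-xxxx")` with a real dataset id in place of the placeholder should show quickly whether the endpoint returns the fields in question.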

amaliebarras commented 7 years ago

@haileypate is this said dataset of datasets? or are you still looking for help with building the tool to create these CSVs?

werdnanoslen commented 7 years ago

Messaged @haileypate on Slack

haileypate commented 7 years ago

@mtb33 worked with me last year to create CSVs that describe each column of every dataset on the portal. i just used the scripts last week, they worked great! the project is here: https://github.com/open-austin/data-portal-analysis

werdnanoslen commented 7 years ago

awesome @haileypate, so is this project, like, done? Is it something that we can feature on the website as a completed project? If so, are there any important links other than the github repo?

werdnanoslen commented 6 years ago

messaged @haileypate on Slack. Hopefully we can feature this on the site if it's all done :)