ropensci / ozunconf18

repository for the rOpenSci ozunconference 2018

Australian Data redux #6

Open Lingtax opened 6 years ago

Lingtax commented 6 years ago

At the 2017 unconference, some of us worked on making Australian data more accessible. This led to ozflights and ozroaddeaths.

Maybe we can do more of this this year? What data are you interested in / do you think might be useful?

I found this data on Liquor and Gambling in Victoria. Maybe we could find similar data nationally and stitch it together?

njtierney commented 6 years ago

Absolutely! This sounds great! #4 discusses an idea for an Australian babynames pkg (or rather, regular names).

Some ideas:

peggynewman commented 6 years ago

There's been work on building this Australian national list of open data sources: Knowledge Network. It doesn't look like it has an API as such, but at least you can use the search tool to check out multiple providers and datasets in one go.

njtierney commented 6 years ago

Ah, awesome! Thanks @peggynewman !

Lingtax commented 6 years ago

Had a bit of a test and it seems data.gov.au covers some, but not all, of the data.vic.gov.au datasets.

danwwilson commented 6 years ago

It would be good to develop a package that allows people to access the ABS data through their SDMX API. http://www.abs.gov.au/websitedbs/D3310114.nsf/home/absstat has more details. There is the rsdmx package that assists with this, but you need to know what data is available and how to query it.
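To give a feel for the "how to query it" part, here is a minimal sketch of how an SDMX 2.1 REST-style data URL is put together. It is written in Python purely for illustration. The base URL and dataflow ID in the usage note are placeholders, not real ABS values, and the exact URL layout a given ABS endpoint expects is an assumption that would need checking against its documentation.

```python
from urllib.parse import urlencode


def sdmx_data_url(base_url, dataflow, key_parts, start_period=None, end_period=None):
    """Build an SDMX 2.1 REST-style data query URL.

    key_parts holds one entry per dimension, in dimension order. Each entry
    is a list of codes (joined with '+'); an empty list means "all values"
    for that dimension. An empty key_parts requests the whole dataflow.
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{base_url.rstrip('/')}/data/{dataflow}/{key or 'all'}"

    # Optional time filters use the standard SDMX query parameter names.
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    if params:
        url += "?" + urlencode(params)
    return url
```

For example, `sdmx_data_url("https://example.org/sdmx-rest", "HYPOTHETICAL_FLOW", [["1", "2"], [], ["Q"]], start_period="2015")` asks for codes 1 and 2 on the first dimension, everything on the second, and Q on the third. A package would still need to discover the dataflows and dimension codes first, which is the hard part mentioned above.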

coolbutuseless commented 6 years ago

Is there enough meta-information from data.gov.au site to auto-generate a data package from nominated datasets?

i.e. the user nominates some datasets, then gets R to

Generating the 'ozdeaths' package could be almost a one-liner!

Extra tools for
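A rough sketch of what that auto-generation could look like, written in Python for illustration only. It assumes each nominated dataset arrives as a metadata record with a `title` and a list of `resources` carrying `url` and `format` fields (roughly the shape a CKAN-style catalogue returns, which is an assumption about data.gov.au). The function names and the generated file layout are hypothetical.

```python
import re


def slugify(title):
    """Turn a dataset title into a lowercase, underscore-separated identifier."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    # R object names cannot start with a digit, so prefix those (and empty slugs).
    if not slug or slug[0].isdigit():
        slug = "x" + slug
    return slug


def plan_data_package(pkg_name, datasets):
    """Return a dict mapping file paths in the package skeleton to text content.

    Only datasets with at least one CSV resource get a data-raw script;
    others (e.g. PDF-only datasets) are skipped.
    """
    files = {
        "DESCRIPTION": (
            f"Package: {pkg_name}\n"
            f"Title: Data from {len(datasets)} nominated open datasets\n"
            "Version: 0.0.0.9000\n"
        )
    }
    for ds in datasets:
        csv_urls = [
            r["url"]
            for r in ds.get("resources", [])
            if r.get("format", "").upper() == "CSV"
        ]
        if not csv_urls:
            continue
        name = slugify(ds["title"])
        # Each script downloads the first CSV and saves it as package data.
        files[f"data-raw/{name}.R"] = (
            f"# {ds['title']}\n"
            f'{name} <- read.csv("{csv_urls[0]}")\n'
            f'save({name}, file = "data/{name}.rda")\n'
        )
    return files
```

Calling `plan_data_package("ozdeaths", records)` only plans the files; writing them to disk, fetching the metadata, and filling in the rest of a valid DESCRIPTION are left out of this sketch.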

jesse-jesse commented 6 years ago

I think this is a great idea. I was thinking about conducting an audit of the available open data, to report on how open it is in practice. I recently requested some data from QLD and they provided it as a PDF. Supposedly that is open data.

Lingtax commented 6 years ago

I think @djnavarro agrees with you

jesse-jesse commented 6 years ago

I started looking at the QLD gambling data a while ago, but we didn't get too far: https://github.com/RedNigel/Queensland-gaming-machines

We could combine it with the Vic gambling data.

jesse-jesse commented 6 years ago

I'd also be interested in coming up with an accessibility score: take a random sample of datasets from the data.gov.au websites, score the selected datasets, and then write a report back to the Australian government. Or maybe just do this for QLD and report back to the Digital Innovation Team.
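A minimal sketch of the sample-and-score step, in Python for illustration. The format weights are invented placeholders, not an agreed metric, and the catalogue is assumed to be a plain mapping from dataset name to the list of file formats it is published in.

```python
import random

# Hypothetical weights: how readily each format loads into an analysis tool.
# These numbers are placeholders to be replaced by whatever criteria are agreed.
FORMAT_SCORES = {"CSV": 1.0, "JSON": 1.0, "XLSX": 0.6, "XLS": 0.6, "PDF": 0.1}


def dataset_score(formats):
    """Score a dataset by its most accessible format; unknown formats score 0."""
    return max((FORMAT_SCORES.get(f.upper(), 0.0) for f in formats), default=0.0)


def sample_and_score(catalogue, n, seed):
    """Draw a reproducible random sample of up to n datasets and score each one.

    catalogue maps dataset name -> list of format strings. Sorting the names
    before sampling makes the draw depend only on the seed, not on dict order.
    """
    rng = random.Random(seed)
    sample = rng.sample(sorted(catalogue), min(n, len(catalogue)))
    return {name: dataset_score(catalogue[name]) for name in sample}
```

Fixing the seed up front means anyone can re-draw the same sample and check the scores, which matters if the report goes back to a government team.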

The fact that the https://data.qld.gov.au/ website has a section for "developers" but not for "data scientists" makes me feel like QLD is missing the point a little and needs some guidance.

jesse-jesse commented 6 years ago

There are 131 PDF datasets on the QLD open data portal!

Lingtax commented 6 years ago

The accessibility score idea is gold, @jesse-jesse

Are there standard metrics we could use, or would we need to derive these? If we derive our own, I'd be keen to lock down our criteria early and register them in some timestamped way to protect against accusations of cherry-picking or target shifting.
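One lightweight way to do the locking down, sketched in Python under the assumption that the criteria can be written as a JSON-serialisable structure: hash a canonical form of the criteria and record when the hash was taken. The function names are hypothetical, and a locally generated timestamp is only self-asserted, so the digest would still need to be published somewhere independent (a commit or an issue comment, say) to count as evidence.

```python
import hashlib
import json
from datetime import datetime, timezone


def register_criteria(criteria, now=None):
    """Return a record holding a SHA-256 digest of the criteria and a UTC timestamp.

    The criteria are serialised with sorted keys and fixed separators so that
    the same content always produces the same digest.
    """
    canonical = json.dumps(criteria, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return {"sha256": digest, "registered_at": stamp, "criteria": canonical}


def matches_registration(criteria, record):
    """Check whether the given criteria are the ones that were registered."""
    canonical = json.dumps(criteria, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest() == record["sha256"]
```

Any later re-evaluation of the criteria would simply produce a new record, so the history of changes stays visible.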

jesse-jesse commented 6 years ago

I haven't looked for any metrics yet, but I am sure we could find some. Good idea to lock them down. I think they should be able to be re-evaluated, but the re-evaluation should be transparent. The unconf could also be a good place to vet the accessibility score. We could do a first draft in the morning, then review it at lunch or morning tea and get the input of others.

Lingtax commented 6 years ago

Ohhh... I got one: https://www.ands-nectar-rds.org.au/fair-tool

It's worth considering as a starting point, as it links to some actionable goals, and there are efforts to promote these criteria: https://ardc.edu.au/planning/events/top-10-fair-data-things-global-sprint

mdsumner commented 6 years ago

Seems like https://nationalmap.gov.au/about.html by terria.io is the latest and greatest central source; it's mostly new to me.