grattan / R_at_Grattan

Using R at Grattan
https://grattan.github.io/R_at_Grattan/
Creative Commons Zero v1.0 Universal

Develop main 'data project' #6

Closed wfmackey closed 2 years ago

wfmackey commented 5 years ago

Develop a main 'data project' that the documentation will follow.

The data needs to allow:

I also think it should be publicly accessible (so no ABS microdata) and reasonably large (large enough to make it 'worth it').

MattCowgill commented 5 years ago

Table 16 here has labour force stats (unemployment rate, etc.) by SA4 for each month of the past 21 years; that could be useful: https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/6291.0.55.001Jun%202019?OpenDocument

Works with readabs too. I think it might be 2011 ASGS but not sure.

wfmackey commented 5 years ago

ABS puts out a bunch of separate Excel files by SA2-SA4 in their 'Data by Region' publication: https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1410.02013-18?OpenDocument

It looks gross, but it's pretty detailed. Runs from 2011 to 2018.

wfmackey commented 5 years ago

> Works with readabs too. I think it might be 2011 ASGS but not sure.

Cool. Maybe the project could be building profiles of SA4s in Australia over time: population, income, LFS, etc.

MattCowgill commented 5 years ago

Yep, although I think it would be best if we had a concise research question to answer, rather than just building profiles. Something like "did areas with high unemployment swing against the government?" (not that exactly, but something along those lines).

wfmackey commented 5 years ago

Yes! That's important.

So let's use ABS data at the SA4 level, which can be joined with LFS data via readabs and with polling booth data.

We could explore how areas with high unemployment (or that are in Sahm-rule recessions) vote.
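
For reference, a minimal sketch of the Sahm-rule check, written in plain Python for illustration (the same logic carries over to R). It assumes the usual definition: the 3-month moving average of the unemployment rate sits at least 0.5 percentage points above the minimum of the 3-month averages over the previous 12 months. The function name, the choice to exclude the current month from that 12-month window, and the sample rates are all assumptions made for this example. The rule was designed for a national series, so applying it to a single SA4 is itself an assumption.

```python
from statistics import mean


def sahm_indicator(rates, threshold=0.5):
    """Sahm-rule gap for each month of a monthly unemployment-rate series.

    Returns a list aligned with `rates`. Each entry is None when there is
    not enough history, otherwise a (gap, triggered) tuple.
    """
    # 3-month moving average; undefined for the first two months.
    ma3 = [None, None] + [mean(rates[i - 2:i + 1]) for i in range(2, len(rates))]

    results = []
    for i, current in enumerate(ma3):
        # Assumption: the lookback covers the 12 months before month i.
        window = ma3[max(0, i - 12):i]
        if current is None or len(window) < 12 or None in window:
            results.append(None)
            continue
        gap = current - min(window)
        results.append((gap, gap >= threshold))
    return results


# Made-up series: flat at 5.0% for 14 months, then rising.
example_rates = [5.0] * 14 + [5.3, 5.9, 6.4]
example_result = sahm_indicator(example_rates)
```

With a monthly series per SA4, the same function could be applied to each area before joining to the polling booth data.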

wfmackey commented 5 years ago

This structure also allows a sub-subsection on how best to download data from TableBuilder.

jonathananolan commented 5 years ago

I think we should come up with a project that has layers that allow the introduction of more complex analysis, but with a great payoff at the end of each relatively short section. Cleaning a non-tidy dataset should not be first because it's boring and conceptually difficult.

VISTA is a great tidy dataset - but we could use another one as long as it has geographical and temporal elements.

So the start could be:

S1: What's the mode share in Melbourne?

S2: Is the mode share different for women and men?

S3: Is the mode share changing over time?

S4: Is there a relationship between unemployment and mode share?

And on from there...
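
To make the first few steps concrete, here is a rough sketch of the mode-share calculation, overall and by group, in plain Python for illustration. The field names (`mode`, `sex`, `year`) and the four toy trips are hypothetical and are not VISTA's actual columns or data. Real survey data would usually also need survey weights, which this sketch ignores.

```python
from collections import Counter, defaultdict


def mode_share(trips, by=None):
    """Share of trips by transport mode, optionally within groups.

    `trips` is a list of dicts that each have a "mode" key. `by` names an
    optional grouping key (for example "sex" or "year").
    """
    counts = defaultdict(Counter)
    for trip in trips:
        group = trip[by] if by else "all"
        counts[group][trip["mode"]] += 1

    # Convert counts to shares within each group.
    return {
        group: {mode: n / sum(c.values()) for mode, n in c.items()}
        for group, c in counts.items()
    }


# Toy, made-up trips (hypothetical field names and values).
toy_trips = [
    {"mode": "car", "sex": "F", "year": 2016},
    {"mode": "train", "sex": "F", "year": 2016},
    {"mode": "car", "sex": "M", "year": 2018},
    {"mode": "car", "sex": "M", "year": 2018},
]

overall = mode_share(toy_trips)           # S1: overall mode share
by_sex = mode_share(toy_trips, by="sex")  # S2: mode share by sex
by_year = mode_share(toy_trips, by="year")  # S3: mode share over time
```

Each step reuses the same function with one extra grouping, which is the kind of layering described above.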

MattCowgill commented 5 years ago

I like this idea, though I'm not sure how far down the path of using a different dataset/project @wfmackey is.

One thing I don't like about your structure, @jonathananolan, is "getting your project ready for QC" sitting where it is. We really want to stress the importance of project setup, folder structure, etc., so this should come before the data work.