Closed HughParsonage closed 4 years ago
At the moment, the australia dataset has each person's home SA2, work DZN and one-digit occupation.
We can generate an occ x industry x DZN distribution and probabilistically assign each person an industry based on their occ x DZN. We could do this ignoring occ, but would then get accountants working at bars, etc. Not a huge issue.
Use industry x DZN as a 'workplace' definition and assign each worker a workplace ID wid.
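A rough sketch of the two steps above, in Python purely for illustration. The counts table, the industry names and the function names are all made up here; nothing below is taken from the package or from ABS data.

```python
import random

# Hypothetical occ x industry x DZN counts (illustrative numbers only).
# Key: (one-digit occupation, DZN); value: {industry: number of workers}.
counts = {
    (2, "DZN_A"): {"Finance": 80, "Hospitality": 5},
    (4, "DZN_A"): {"Finance": 10, "Hospitality": 60},
}

def assign_industry(occ, dzn, rng):
    """Draw an industry with probability proportional to its count for this occ x DZN."""
    dist = counts[(occ, dzn)]
    industries = list(dist)
    weights = [dist[i] for i in industries]
    return rng.choices(industries, weights=weights, k=1)[0]

def assign_wids(persons, rng):
    """Give every distinct (industry, DZN) pair its own integer workplace ID."""
    wid_of = {}
    out = []
    for pid, occ, dzn in persons:
        industry = assign_industry(occ, dzn, rng)
        # setdefault returns the existing wid, or stores and returns a new one.
        wid = wid_of.setdefault((industry, dzn), len(wid_of) + 1)
        out.append((pid, occ, dzn, industry, wid))
    return out

rng = random.Random(1)
persons = [(1, 2, "DZN_A"), (2, 2, "DZN_A"), (3, 4, "DZN_A")]  # (pid, occ, dzn)
assigned = assign_wids(persons, rng)
```

Conditioning on occ is what keeps most accountants out of bars; dropping occ from the key would reduce this to a plain industry x DZN draw.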
Note that there can be thousands of workers in each DZN, so might need to cap the number of people one interacts with if using attack rate.
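One way the cap could work, as a minimal sketch: each day a worker interacts with at most `cap` randomly chosen colleagues from the same workplace. The function name and the cap of 20 are assumptions for illustration, not a proposal for the actual value.

```python
import random

def sample_contacts(person, colleagues, cap, rng):
    """Return at most `cap` colleagues (excluding `person`) chosen without replacement."""
    others = [c for c in colleagues if c != person]
    if len(others) <= cap:
        return others
    return rng.sample(others, cap)

rng = random.Random(0)
workplace = list(range(5000))  # 5,000 hypothetical person IDs sharing one wid
contacts = sample_contacts(42, workplace, cap=20, rng=rng)
```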
We can then use occupation to determine whether someone can/will work from home. Can use industry to input unemployment → reducing the number of people going to work in a particular wid.
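A sketch of how the two inputs might combine, assuming independent draws. The probabilities below are placeholders, not estimates, and the dictionary and function names are hypothetical.

```python
import random

# Placeholder probabilities (assumptions, not data).
P_WFH_BY_OCC = {1: 0.6, 2: 0.7, 8: 0.05}                         # by one-digit occupation
P_UNEMPLOYED_BY_INDUSTRY = {"Hospitality": 0.3, "Finance": 0.05}  # by industry

def attends_workplace(occ, industry, rng):
    """True if the person physically turns up at their wid today."""
    # Industry-level unemployment removes the person from the workplace entirely.
    if rng.random() < P_UNEMPLOYED_BY_INDUSTRY.get(industry, 0.0):
        return False
    # Otherwise occupation decides whether they work from home.
    if rng.random() < P_WFH_BY_OCC.get(occ, 0.0):
        return False
    return True

rng = random.Random(0)
draws = [attends_workplace(8, "Hospitality", rng) for _ in range(10_000)]
share_attending = sum(draws) / len(draws)
```

With these placeholder values the expected share attending for occupation 8 in Hospitality is 0.7 x 0.95 = 0.665.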
Will's suggestion seems sensible to me. I'm not sure how you'll determine ability to work from home though?
I'm not across what you're doing here sufficiently to really weigh in, sorry
@HughParsonage what do you think? Let me know and I will start putting data together. Also lmk re BLADE microdata if you need it.
Re BLADE: would be great to get Headcount by ANZSIC by location. By occupation if that's in there. At this stage I think we'll probably just use probabilistic estimates of number of colleagues.
Obviously for something like "Woolworths" the headcount could appear to be 100,000, so I think we'll need a fair bit of flexibility after the empirical data is used.
@HughParsonage Tablebuilder doesn't have location other than state. Doesn't have occupation either (not really possible for business).
Headcount of employees comes in ranges:
1-9
10-19
20-99
100-199
200+
Missing and 0
Data exported to exdata/businesses.fst. Looks like this:
# A tibble: 7,521 x 4
   state anzsic                              employees                   n
   <chr> <chr>                               <chr>                   <dbl>
 1 NSW   Nursery and Floriculture Production 1 to 9 employees          176
 2 NSW   Nursery and Floriculture Production 10 to 19 employees         45
 3 NSW   Nursery and Floriculture Production 20 to 99 employees         28
 4 NSW   Nursery and Floriculture Production Missing and 0 responses   549
 5 NSW   Mushroom and Vegetable Growing      1 to 9 employees          266
 6 NSW   Mushroom and Vegetable Growing      10 to 19 employees         42
 7 NSW   Mushroom and Vegetable Growing      20 to 99 employees         40
 8 NSW   Mushroom and Vegetable Growing      100 to 199 employees        3
 9 NSW   Mushroom and Vegetable Growing      Missing and 0 responses  1332
10 NSW   Fruit and Tree Nut Growing          1 to 9 employees          662
# … with 7,511 more rows
There are 3 million businesses.
Note that microdata access in DataLab would be better, but would take too long for our project.
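For the probabilistic estimate of colleagues, the banded counts could be used roughly like this: pick a band with weight proportional to n, then pick a headcount uniformly within the band. This is only a sketch. The uniform-within-band draw, the upper bound for the 200+ band, its label, and dropping the "Missing and 0 responses" rows are all assumptions. The counts are the NSW Mushroom and Vegetable Growing rows shown above.

```python
import random

# Band label -> (low, high) headcount. The 200+ label and its upper bound are assumptions.
BANDS = {
    "1 to 9 employees": (1, 9),
    "10 to 19 employees": (10, 19),
    "20 to 99 employees": (20, 99),
    "100 to 199 employees": (100, 199),
    "200+ employees": (200, 1000),
}

# (band, number of businesses) for NSW Mushroom and Vegetable Growing.
rows = [
    ("1 to 9 employees", 266),
    ("10 to 19 employees", 42),
    ("20 to 99 employees", 40),
    ("100 to 199 employees", 3),
]

def sample_headcount(rows, rng, size_biased=False):
    """Draw a headcount: choose a band, then a uniform integer inside it.

    size_biased=False weights bands by business count (size of a random business).
    size_biased=True also weights by the band midpoint (size of a random worker's business).
    """
    labels = [label for label, _ in rows]
    weights = []
    for label, n in rows:
        lo, hi = BANDS[label]
        weights.append(n * (lo + hi) / 2 if size_biased else n)
    label = rng.choices(labels, weights=weights, k=1)[0]
    lo, hi = BANDS[label]
    return rng.randint(lo, hi)

rng = random.Random(0)
per_business = [sample_headcount(rows, rng) for _ in range(5000)]
per_worker = [sample_headcount(rows, rng, size_biased=True) for _ in range(5000)]
```

The size-biased option matters because most businesses are small but most workers are not in the smallest businesses, so a randomly chosen worker tends to have more colleagues than a randomly chosen business has employees.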
Does this mean a person's DZN will be ignored?
Thanks! I think that employee split matches intuition about the extent of interactions.
That 3M business number seems low to me. I would have thought ~5M with salaried employees. (Not a critique, just a note about the extent of errors due to definitions.)
Currently, our extdata has each person's destination zone of work (DZN) but not much else. We have occupation-industry counts, but there's not currently a link to the persons. We could:
Currently I'm thinking the full-time/part-time data would be used solely to determine whether someone goes to work for the day.
Any other ideas? Any critiques?