grattan / covid19.model.sa2

Designing workplace interaction #30

Closed HughParsonage closed 4 years ago

HughParsonage commented 4 years ago

Currently, our extdata has each person's destination zone of work (DZN) but not much else. We have occupation-industry counts, but there's currently no link from those counts to individual persons.

We could:

Currently I'm thinking the full-time/part-time data would be used solely to determine whether someone goes to work for the day.
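As a rough illustration of that idea, a minimal sketch in Python (not the package's own code) might draw attendance per person per day from their work status. The status labels and probabilities below are hypothetical placeholders, not values from any dataset.

```python
import random

# Hypothetical daily attendance probabilities by work status (assumed values).
P_ATTEND = {"full_time": 5 / 7, "part_time": 2.5 / 7, "not_working": 0.0}

def goes_to_work(status, rng):
    """Return True if a person with this work status attends work today."""
    return rng.random() < P_ATTEND[status]

rng = random.Random(1)
statuses = ["full_time", "part_time", "not_working"] * 1000
attending = [goes_to_work(s, rng) for s in statuses]

# Share attending on this simulated day, by status.
share = {
    s: sum(a for st, a in zip(statuses, attending) if st == s) / 1000
    for s in P_ATTEND
}
```

Here the full-time/part-time flag affects nothing except the daily draw, which matches the "solely" in the proposal.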

Any other ideas? Any critiques?

wfmackey commented 4 years ago

At the moment, the australia dataset has each person's home SA2, work DZN and one-digit occupation.

We can generate occ x industry x DZN distribution and probabilistically assign each person an industry based on their occ x DZN. Could do this ignoring occ, but would get accountants working at bars, etc. Not a huge issue.
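A minimal sketch of that assignment step, in Python for illustration only: the counts, zone labels and industry names are made up, and `assign_industry` is a hypothetical helper rather than anything in the package.

```python
import random

# Hypothetical counts of workers by (occupation, DZN, industry).
counts = [
    ("Professionals", "DZN_1", "Finance", 80),
    ("Professionals", "DZN_1", "Hospitality", 5),
    ("Labourers", "DZN_1", "Hospitality", 40),
    ("Labourers", "DZN_1", "Construction", 60),
]

# For each (occupation, DZN) cell, collect the industries and their weights.
dist = {}
for occ, dzn, industry, n in counts:
    industries, weights = dist.setdefault((occ, dzn), ([], []))
    industries.append(industry)
    weights.append(n)

def assign_industry(occ, dzn, rng):
    """Draw one industry in proportion to the counts for this occ x DZN cell."""
    industries, weights = dist[(occ, dzn)]
    return rng.choices(industries, weights=weights, k=1)[0]

rng = random.Random(42)
sample = [assign_industry("Professionals", "DZN_1", rng) for _ in range(1000)]
```

Ignoring occupation would amount to keying `dist` on DZN alone, which is where the accountants-in-bars effect would come from.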

Use industry x DZN as a 'workplace' definition and assign each worker a workplace ID wid.
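For illustration, a small Python sketch of numbering workplaces this way. The worker records and field names other than `wid` are hypothetical.

```python
# Hypothetical worker records; only the (dzn, industry) pair matters here.
workers = [
    {"pid": 1, "dzn": "DZN_1", "industry": "Finance"},
    {"pid": 2, "dzn": "DZN_1", "industry": "Finance"},
    {"pid": 3, "dzn": "DZN_1", "industry": "Hospitality"},
    {"pid": 4, "dzn": "DZN_2", "industry": "Finance"},
]

# Give each distinct (DZN, industry) pair the next integer workplace ID.
wid_of = {}
for w in workers:
    key = (w["dzn"], w["industry"])
    w["wid"] = wid_of.setdefault(key, len(wid_of) + 1)

wids = [w["wid"] for w in workers]
```

Workers 1 and 2 end up with the same `wid`, while workers 3 and 4 each get their own.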

Note that there can be thousands of workers in each DZN, so might need to cap the number of people one interacts with if using attack rate.
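One way to cap interactions, sketched in Python under the assumption that contacts are drawn uniformly at random from colleagues. The cap of 20 and the `daily_contacts` helper are hypothetical.

```python
import random

MAX_CONTACTS = 20  # hypothetical cap on colleagues met per day

def daily_contacts(pid, colleagues, rng, cap=MAX_CONTACTS):
    """Return at most `cap` colleagues (excluding pid) for one day's contacts."""
    others = [c for c in colleagues if c != pid]
    if len(others) <= cap:
        return others
    return rng.sample(others, cap)

rng = random.Random(0)
big_workplace = list(range(5000))    # thousands of workers in one DZN
small_workplace = [1, 2, 3]

contacts_big = daily_contacts(7, big_workplace, rng)
contacts_small = daily_contacts(1, small_workplace, rng)
```

The attack rate would then apply only to the sampled contacts rather than to everyone in the DZN.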

We can then use occupation to determine whether someone can/will work from home. Can use industry to input unemployment → reducing the number of people going to work in a particular wid.
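A minimal Python sketch of how those two inputs could combine. The work-from-home and unemployment probabilities are hypothetical placeholders, and how to estimate them is left open.

```python
import random

# Hypothetical probabilities; both tables are assumptions for illustration.
P_WFH = {"Professionals": 0.6, "Labourers": 0.05}      # by occupation
P_UNEMPLOYED = {"Finance": 0.05, "Hospitality": 0.4}   # by industry

def at_workplace(occ, industry, rng):
    """True if the person is employed and not working from home today."""
    if rng.random() < P_UNEMPLOYED[industry]:
        return False
    return rng.random() >= P_WFH[occ]

rng = random.Random(3)
n = 10_000
frac_prof_fin = sum(at_workplace("Professionals", "Finance", rng) for _ in range(n)) / n
frac_lab_hosp = sum(at_workplace("Labourers", "Hospitality", rng) for _ in range(n)) / n
```

With these made-up inputs the expected on-site shares are 0.95 × 0.4 = 0.38 and 0.6 × 0.95 = 0.57 respectively.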

MattCowgill commented 4 years ago

Will's suggestion seems sensible to me. I'm not sure how you'll determine ability to work from home though?

I'm not across what you're doing here sufficiently to really weigh in, sorry

wfmackey commented 4 years ago

@HughParsonage what do you think? Let me know and I will start putting data together. Also lmk re BLADE microdata if you need it.

HughParsonage commented 4 years ago

Re BLADE: would be great to get Headcount by ANZSIC by location. By occupation if that's in there. At this stage I think we'll probably just use probabilistic estimates of number of colleagues.

Obviously for something like "Woolworths" the headcount could appear to be 100,000, so I think we'll need a fair bit of flexibility after the empirical data is used.

wfmackey commented 4 years ago

@HughParsonage TableBuilder doesn't have location other than state. Doesn't have occupation either (not really possible for businesses).

Headcount of employees comes in ranges:

1-9
10-19
20-99
100-199
200+
Missing and 0

Data exported to exdata/businesses.fst. Looks like this:

```
# A tibble: 7,521 x 4
   state anzsic                              employees                   n
   <chr> <chr>                               <chr>                   <dbl>
 1 NSW   Nursery and Floriculture Production 1 to 9 employees          176
 2 NSW   Nursery and Floriculture Production 10 to 19 employees         45
 3 NSW   Nursery and Floriculture Production 20 to 99 employees         28
 4 NSW   Nursery and Floriculture Production Missing and 0 responses   549
 5 NSW   Mushroom and Vegetable Growing      1 to 9 employees          266
 6 NSW   Mushroom and Vegetable Growing      10 to 19 employees         42
 7 NSW   Mushroom and Vegetable Growing      20 to 99 employees         40
 8 NSW   Mushroom and Vegetable Growing      100 to 199 employees        3
 9 NSW   Mushroom and Vegetable Growing      Missing and 0 responses  1332
10 NSW   Fruit and Tree Nut Growing          1 to 9 employees          662
# … with 7,511 more rows
```

There are 3 million businesses.

Note that microdata access in DataLab would be better, but would take too long for our project.
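As a sketch of how the banded counts could feed probabilistic workplace sizes, the Python below picks a band in proportion to its business count and then a headcount uniformly within the band. It uses the NSW "Mushroom and Vegetable Growing" rows from the table above. Dropping "Missing and 0 responses" and sampling uniformly within a band are both assumptions, and a 200+ band would need an assumed upper bound that is not shown here.

```python
import random

# (low, high) headcount band -> number of businesses, from the table above.
# "Missing and 0 responses" is left out (an assumption of this sketch).
bands = [((1, 9), 266), ((10, 19), 42), ((20, 99), 40), ((100, 199), 3)]

def sample_headcount(rng):
    """Pick a band weighted by business count, then a size within the band."""
    ranges = [r for r, _ in bands]
    weights = [n for _, n in bands]
    low, high = rng.choices(ranges, weights=weights, k=1)[0]
    return rng.randint(low, high)

rng = random.Random(2020)
sizes = [sample_headcount(rng) for _ in range(5000)]
share_small = sum(s <= 9 for s in sizes) / len(sizes)
```

About three quarters of sampled businesses (266 of 351) should fall in the 1 to 9 band.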

wfmackey commented 4 years ago

Does this mean a person's DZN will be ignored?

HughParsonage commented 4 years ago

Thanks! I think that employee split matches intuition about the extent of interactions.

That 3M business number seems low to me. I would have thought ~5M with salaried employees. (Not a critique, just a note about the extent of errors due to definitions.)