kevinykuo opened 5 years ago
It looks like the `exposure` field is encoded as an integer in the raw data, and most records (about 63%) were rounded down to 0.
> risks_table_mapped %>%
+   group_by(exposure) %>%
+   tally() %>%
+   mutate(prop = n / sum(n))
# A tibble: 624 x 3
   exposure       n    prop
      <dbl>   <int>   <dbl>
 1        0 1528196 0.632
 2        1  345691 0.143
 3        2  147766 0.0612
 4        3   85097 0.0352
 5        4   55440 0.0229
 6        5   39287 0.0163
 7        6   29592 0.0122
 8        7   22663 0.00938
 9        8   18248 0.00755
10        9   14969 0.00620
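For now, I'm going to assume that the true exposure values are uniformly distributed between consecutive integers, and add 0.5 to each record's exposure. If a recorded value of k means the true exposure lies somewhere in [k, k + 1), then under that assumption its expected true value is k + 0.5.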
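A minimal base-R sketch of the +0.5 adjustment, on simulated data rather than the real table. It assumes, as above, that the raw field stores floor(true exposure) and that true values are uniform within each integer interval; the geometric draw for the integer parts is an arbitrary, hypothetical choice. It shows that truncation alone biases exposure down by about 0.5, while the midpoint shift removes that bias on average.

```r
# Sketch of the midpoint (+0.5) correction on SIMULATED data.
# Assumptions: raw data = floor(true exposure); true values are uniform
# within each [k, k + 1) interval. Not the real risks_table_mapped.
set.seed(1)

n <- 1e5
# Hypothetical integer parts, skewed toward 0 like the tally above
k <- rgeom(n, prob = 0.6)
# Uniform assumption: the fractional part is Unif(0, 1)
true_exposure <- k + runif(n)

# What the raw data would contain: the value rounded down to an integer
recorded <- floor(true_exposure)

# Proposed correction: move each record to the midpoint of its interval
# (with dplyr: risks_table_mapped %>% mutate(exposure = exposure + 0.5))
adjusted <- recorded + 0.5

# Average error of raw vs corrected values relative to the truth
bias_raw <- mean(recorded - true_exposure)       # close to -0.5
bias_adjusted <- mean(adjusted - true_exposure)  # close to 0
max_error <- max(abs(adjusted - true_exposure))  # never more than 0.5

cat("bias (raw):", bias_raw, "\n")
cat("bias (adjusted):", bias_adjusted, "\n")
```

The correction only removes bias on average; any individual record can still be off by up to half a unit.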
We may want to do a more rigorous analysis to justify this assumption.
¯\_(ツ)_/¯
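One possible check on the uniform assumption, sketched in base R under a hypothetical modelling assumption: suppose the true-exposure density decays exponentially within each bin [k, k + 1), at the rate implied by the ratio of neighboring bin counts, λ_k = log(n_k / n_(k+1)). The expected fractional part within the bin is then 1/λ - 1/(e^λ - 1), which equals 0.5 only when the density is flat. The counts are copied from the tally above; the piecewise-exponential model and the helper name `within_bin_mean` are illustrative assumptions, not an established method from this thread.

```r
# HYPOTHETICAL sensitivity check on the uniform-within-bin assumption.
# Assumption: within bin [k, k + 1) the true-exposure density is
# proportional to exp(-lambda_k * u), u in [0, 1),
# with lambda_k = log(n_k / n_{k+1}).

# Counts for exposure = 0..9, copied from the tally above
n <- c(1528196, 345691, 147766, 85097, 55440,
       39287, 29592, 22663, 18248, 14969)

# Mean of u on [0, 1) under density proportional to exp(-lambda * u).
# Tends to 0.5 as lambda -> 0 (a flat, i.e. uniform, density).
within_bin_mean <- function(lambda) {
  ifelse(abs(lambda) < 1e-8, 0.5, 1 / lambda - 1 / expm1(lambda))
}

# One decay rate per bin 0..8 (the last bin has no right-hand neighbor)
lambda <- log(head(n, -1) / tail(n, -1))
mean_frac <- within_bin_mean(lambda)

# Compare the implied offset with the flat +0.5 adjustment
check <- data.frame(
  exposure = 0:8,
  implied_offset = round(mean_frac, 3),
  gap_vs_half = round(mean_frac - 0.5, 3)
)
print(check)
```

Under this assumed model, the counts imply an offset of roughly 0.38 for the 0 bin, rising toward 0.5 in the higher bins where neighboring counts are closer together. Whether +0.5 is adequate therefore depends on how exposure is actually distributed within the lowest bins, which the integer field alone cannot tell us.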