kasaai / pc-pricing-tutorial

Practical Ratemaking
https://ratemake.com/

Records with 0 exposure with claims #81

Open kevinykuo opened 5 years ago

kevinykuo commented 5 years ago

¯\_(ツ)_/¯

investigating

kevinykuo commented 5 years ago

Looks like the exposure field is encoded as an integer in the raw data, so most records (about 63%, per the tally below) are rounded down to 0.

> risks_table_mapped %>%
+   group_by(exposure) %>%
+   tally() %>%
+   mutate(prop = n / sum(n))
# A tibble: 624 x 3
   exposure       n    prop
      <dbl>   <int>   <dbl>
 1        0 1528196 0.632  
 2        1  345691 0.143  
 3        2  147766 0.0612 
 4        3   85097 0.0352 
 5        4   55440 0.0229 
 6        5   39287 0.0163 
 7        6   29592 0.0122 
 8        7   22663 0.00938
 9        8   18248 0.00755
10        9   14969 0.00620
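A toy illustration of why this matters for the records in the issue title. It assumes that storing exposure as an integer truncates it downward (mimicked here with `floor()`). The vectors `true_exposure` and `claims` are hypothetical and do not come from the dataset:

```r
# Hypothetical toy data: true (fractional) exposures and claim counts.
true_exposure <- c(0.2, 0.7, 0.95, 1.4, 2.6)
claims        <- c(1L, 0L, 2L, 1L, 0L)

# Storing exposure as an integer rounds it down; floor() mimics that.
stored_exposure <- floor(true_exposure)
stored_exposure
# [1] 0 0 0 1 2

# Claim frequency = claims / exposure; zero exposure yields Inf (or NaN when claims are also 0).
frequency <- claims / stored_exposure
frequency
# [1] Inf NaN Inf   1   0
```

Any record that had a claim but a stored exposure of 0 ends up with an infinite frequency, so these records can't be used in a frequency model as-is.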

I'm going to assume, for now, that the true exposure values are uniformly distributed between consecutive integers (a stored value of k means the true exposure lies somewhere in [k, k + 1)), and add 0.5 of exposure to each record.
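The following is a minimal base-R sketch of the midpoint adjustment. It assumes that each stored value k stands for a true exposure in [k, k + 1). The sketch simulates true exposures that are uniform within each unit interval, truncates them, and then checks that adding 0.5 recovers the average true exposure in each bucket. The simulated data is illustrative only. In the dplyr pipeline above, the same adjustment would be `mutate(exposure = exposure + 0.5)`.

```r
set.seed(42)

# Simulate true exposures: an integer bucket k plus a uniform offset in (0, 1).
k_true        <- sample(0:9, 10000, replace = TRUE)
true_exposure <- k_true + runif(10000)

# Integer encoding rounds down (exposures are non-negative, so floor == trunc).
stored <- floor(true_exposure)

# Midpoint adjustment: under the uniform-within-bucket assumption,
# E[true exposure | stored == k] = k + 0.5.
adjusted <- stored + 0.5

# Difference between the mean true exposure in each bucket and k + 0.5.
bucket_means <- tapply(true_exposure, stored, mean)
bucket_gap   <- bucket_means - (as.numeric(names(bucket_means)) + 0.5)
round(bucket_gap, 3)
```

The adjustment is only unbiased when exposure really is spread evenly within each bucket. If the true density falls off within a bucket, the conditional mean is below k + 0.5 and adding 0.5 overstates exposure there. This matters most for the 0 bucket, which holds the majority of records.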

We may want to run a more convincing analysis to support this assumption.
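One possible check assumes that un-rounded exposures are available for some subset of records. This is hypothetical; the issue does not mention such a source. If the uniform assumption holds, the fractional parts of the exact exposures should look roughly Uniform(0, 1). A Kolmogorov-Smirnov test via `stats::ks.test()` can assess this. The two simulated samples below stand in for real data: one meets the assumption and one violates it.

```r
set.seed(1)

# Hypothetical exact exposures whose offset within each integer bucket is uniform.
exact_uniform <- sample(0:9, 1000, replace = TRUE) + runif(1000)

# Hypothetical exact exposures whose offset is skewed toward the start of each bucket.
exact_skewed <- sample(0:9, 1000, replace = TRUE) + rbeta(1000, 0.5, 2)

# Fractional part = position of the true exposure within its integer bucket.
frac_part <- function(x) x - floor(x)

# One-sample Kolmogorov-Smirnov test of the fractional parts against Uniform(0, 1).
ks_uniform <- ks.test(frac_part(exact_uniform), "punif")
ks_skewed  <- ks.test(frac_part(exact_skewed), "punif")

ks_uniform$p.value  # under the null this p-value is Uniform(0, 1), so usually not small
ks_skewed$p.value   # essentially zero: offsets are far from uniform
```

A small p-value on real data would suggest that the 0.5 midpoint is biased. In that case, a bucket-specific offset estimated from the exact exposures would be a natural alternative.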