mages / ChainLadder

Claims reserving models in R
https://mages.github.io/ChainLadder/

Theoretical Understanding of How Methods Should Handle 0's #87

Closed alex-pax closed 2 years ago

alex-pax commented 2 years ago

Sorry if this is long-winded and indirect, but I feel like I've encountered behavior that is of general interest to most ChainLadder users. So I thought I would share it here in the hopes of clarifying some of that behavior with the developers.

I've noticed that many of the ChainLadder methods (MackChainLadder, BootChainLadder, ClarkLDF, ClarkCapeCod, etc.) will completely fail if there is even a single 0 observed in the cumulative triangle, regardless of how big the triangle is. Zeroes in cumulative development triangles are generally uncommon. However, they do appear in certain circumstances, such as excess lines where development may not appear until after the first development period, or when refining development triangles into shorter origin/development periods such as accident quarters or months.

Presumably this behavior of the methods is driven by the fact that the individual age-to-age factor from that observation to the next age is Inf. In practice, however, I think it is common to ignore these observations when parameterizing actuarial methods.

I've found a couple of workarounds that I outline below. I haven't been able to find a discussion of this online anywhere, so apologies if this conversation is already happening elsewhere. Is there any appetite for formalizing the way each method would handle observed zeroes as a default approach? I could see this becoming a Pandora's box of edge cases. For instance, not all zeroes can be assumed to be the same: what if you have an entire column of zeroes? An entire row? Some other configuration that causes errors? But it would provide a lot of value, particularly when deploying methods across multiple triangles at a time.

As an example, take this triangle:

library(ChainLadder)

## Inserting a 0 at the first development age for origin date 2016:
x <- data.frame(origin = c(2015, 2015, 2015, 2015,
                           2016, 2016, 2016,
                           2017, 2017,
                           2018),
                dev    = c(12, 24, 36, 48,
                           12, 24, 36,
                           12, 24,
                           12),
                value = c(20, 80, 150, 190,
                          0, 60, 80,
                          50, 90,
                          90))

x_tri <- as.triangle(x)

Calling any of the four methods mentioned above fails. For MackChainLadder, ClarkLDF, and ClarkCapeCod the error output is clearly related to fitting a model to Inf observation(s). For BootChainLadder the message is a warning and more opaque, but it is presumably related to passing a non-finite value to the shape parameter of rgamma (you can reproduce this warning with rgamma(1, shape = Inf/Inf), for example, where Inf/Inf evaluates to NaN).
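To make the source of the Inf concrete, here is a minimal base-R sketch that rebuilds the example triangle as a plain matrix and computes the individual age-to-age factors by hand. The names `tri` and `ata` are illustrative only, and this is not how the package computes its factors internally.

```r
# Cumulative triangle from the example, as a plain matrix
# (rows = origin years, columns = development ages in months).
tri <- matrix(c(20, 80, 150, 190,
                 0, 60,  80,  NA,
                50, 90,  NA,  NA,
                90, NA,  NA,  NA),
              nrow = 4, byrow = TRUE,
              dimnames = list(origin = 2015:2018,
                              dev = c(12, 24, 36, 48)))

# Individual age-to-age factors: each column divided by the previous one.
ata <- tri[, -1] / tri[, -ncol(tri)]
colnames(ata) <- c("12-24", "24-36", "36-48")
print(ata)

# The 2016 factor from 12 to 24 months is 60 / 0, which R evaluates to Inf.
is.infinite(ata["2016", "12-24"])
```

Any fitting routine that receives this factor, or a weight derived from dividing by the zero, will see a non-finite value.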

MackChainLadder(x_tri)
## MackChainLadder error:
##  Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,  : 
##  NA/NaN/Inf in 'x'

BootChainLadder(x_tri)
## BootChainLadder error message:
##   Warning message:
##   In rgamma(length(simExp[!is.na(simExp)]), shape = abs(simExp[!is.na(simExp)]/scale.phi),  :
##   NAs produced

ClarkLDF(x_tri, maxage = 72)
## ClarkLDF error:
##  Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,  : 
##  NA/NaN/Inf in 'x'

ClarkCapeCod(x_tri, maxage = 72, Premium = 50)
## ClarkCapeCod error:
##  Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,  : 
##  NA/NaN/Inf in 'x'

The MackChainLadder method has an argument, weights, that can be used to ignore the zeroes. The code below produces a warning but not an error:

MackChainLadder(x_tri, weights = pmin(as.matrix(x_tri), 1))

For the BootChainLadder method, replacing zeroes with NAs seems to work in at least this example:

BootChainLadder(apply(X = x_tri,
                      ## Apply across rows and columns:
                      MARGIN = c(1,2),
                      ## Replace 0s with NA:
                      function(x) ifelse(x == 0, as.numeric(NA), x)))

Replacing zeroes with NA also seems to work for ClarkLDF and ClarkCapeCod:

ClarkLDF(apply(X = x_tri,
               ## Apply across rows and columns:
               MARGIN = c(1,2),
               ## Replace 0s with NA:
               function(x) ifelse(x == 0, as.numeric(NA), x)),
         maxage = 72)

ClarkCapeCod(apply(X = x_tri,
               ## Apply across rows and columns:
               MARGIN = c(1,2),
               ## Replace 0s with NA:
               function(x) ifelse(x == 0, as.numeric(NA), x)),
             maxage = 72,
             Premium = 50)
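
The zero-to-NA replacement used in the three calls above can also be written once with logical indexing. This is a sketch on a plain matrix; `zero_to_na` is a hypothetical helper, not a ChainLadder function, and whether the result keeps the triangle class when applied to the output of as.triangle is not verified here.

```r
# Hypothetical helper: replace observed zeros with NA, leaving existing NAs alone.
zero_to_na <- function(tri) {
  tri[!is.na(tri) & tri == 0] <- NA
  tri
}

# The example triangle as a plain matrix (rows = origin, columns = dev age).
tri <- matrix(c(20, 80, 150, 190,
                 0, 60,  80,  NA,
                50, 90,  NA,  NA,
                90, NA,  NA,  NA),
              nrow = 4, byrow = TRUE,
              dimnames = list(origin = 2015:2018,
                              dev = c(12, 24, 36, 48)))

tri_na <- zero_to_na(tri)
print(tri_na)
```

Defining the replacement once makes it easier to apply the same treatment across many triangles.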

trinostics commented 2 years ago

The underlying issue is that a proportional (y = mx, no intercept) model, which is what the standard chain-ladder method is, is inappropriate for a process in which an observed zero can be followed by a non-zero observation, because the proportional model says the following observation MUST be zero. As the OP notes, zeros at beginning ages occur frequently in excess insurance and reinsurance. By kicking out those beginning-zero observations and estimating the standard error (SE) from only the nonzero data, the practitioner ends up with an understated SE.
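
As a small illustration of that point, the sketch below fits a no-intercept (y = mx) line to the 12-to-24-month pairs from the OP's triangle. It uses plain unweighted least squares via lm for simplicity, which is an assumption made for illustration and not the weighting the package applies.

```r
# 12-month and 24-month cumulative values for origin years 2015-2017.
x <- c(20, 0, 50)
y <- c(80, 60, 90)

# No-intercept model: y = m * x.
fit0 <- lm(y ~ x + 0)

# Whatever slope is estimated, the prediction at x = 0 is exactly 0 ...
pred_at_zero <- unname(predict(fit0, newdata = data.frame(x = 0)))
pred_at_zero

# ... so the residual for the zero row is the full observed value of 60.
resid_zero_row <- unname(resid(fit0)[2])
resid_zero_row
```

No choice of m can explain the move from 0 to 60, which is why the observation either breaks the fit or has to be dropped.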

I have experimented with replacing actual zeros with a non-zero epsilon, say $1 or 1 euro. SEs blew up, as expected and hoped. Then I experimented with adjusting epsilon until the SE looked more reasonable. Eventually I realized I was replacing one type of actuarial judgment (model error) with another judgment (parameter error) that was more complicated to explain, so I tossed the epsilon approach.

The ability of R to use NA to indicate non-available data is brilliant. That’s why I made sure NAs trigger non-observations when I wrote the Clark methods.

In the long run, I am a proponent of using linear models (y = mx + b) that incorporate an intercept for just this situation. Since beginning zeros tend to occur only at immature ages, those may be considered too-much-work-for-the-effort “edge cases.” None of the models the OP mentions can incorporate an intercept in their current implementation.
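
A minimal sketch of the intercept idea, again on the 12-to-24-month pairs from the OP's triangle and again using unweighted lm purely for illustration. Three points are far too few for a real fit; the only purpose is to show that the prediction after an observed zero is no longer forced to zero.

```r
# 12-month and 24-month cumulative values for origin years 2015-2017.
x <- c(20, 0, 50)
y <- c(80, 60, 90)

# Linear model with an intercept: y = m * x + b.
fit1 <- lm(y ~ x)
coef(fit1)

# The prediction at x = 0 equals the fitted intercept b.
pred_at_zero <- unname(predict(fit1, newdata = data.frame(x = 0)))
pred_at_zero
```

Here the zero row contributes to the estimate of b instead of being excluded.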

By the way, the Mack method can work accurately when the selected average is the all-year-weighted average. However, ChainLadder must be modified to avoid the use of “weights” in that case.
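
One reading of the all-year-weighted average is the volume-weighted factor, sum of next-age values over sum of current-age values. Under that assumption, the sketch below shows why a single zero is harmless for this average while it breaks any average of individual factors. It does not reproduce the Mack standard-error calculation.

```r
# 12-month and 24-month cumulative values for origin years 2015-2017.
x <- c(20, 0, 50)
y <- c(80, 60, 90)

# Individual factors: the zero row yields Inf.
individual <- y / x
individual

# A simple average of individual factors inherits the Inf.
simple_avg <- mean(individual)

# The volume-weighted average never divides by a single observation.
vol_weighted <- sum(y) / sum(x)
vol_weighted
```

The volume-weighted factor stays finite as long as the column total is non-zero.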

In summary,

I. When the beginning age contains missing data, use NA instead of zero. The algorithms should be modified to eliminate such non-observations from the analyzed dataset.

II. When the beginning age contains an actual zero, I suggest

  1. Modify the Mack algorithm to work correctly when the selection is the all-year weighted average.
  2. a. Workaround: Replace the zeros with NA, use NAs to “kick them out” artificially, and wave your hands a lot when bumping up the SE to reflect model error.
     b. Long term: Enhance the models to utilize an intercept.

Dan Murphy

alex-pax commented 2 years ago

@trinostics Thanks for your thoughtful response. That all makes sense to me. Please feel free to close this issue.

Seems like the Brosius paper (https://www.casact.org/sites/default/files/database/studynotes_brosius6.pdf) would be a good method to implement for that y = mx + b framework you mentioned. When I finally finish taking exams, I'd love to be able to contribute to this package, and seems like that could be a good place to start.

alex-pax commented 2 years ago

Closing per my comment above. Thanks for your response on this.

trinostics commented 2 years ago

Alex

Thank you for the 1993 Brosius reference. Do you know how it was published by the CAS? Proceedings? Forum? I could not find it at https://www.casact.org/publications-research/library

Curious that it was written the year before my paper, “Unbiased Loss Development Factors,” was published in the Proceedings. His Least Squares Method is Model I in my paper.

Two observations:

  1. Looking at development triangles of loss ratios rather than losses is a little-used approach that should reduce the uncertainty of the (a,b) parameter estimates. If the constant is omitted (as with the chain-ladder method), I’m not sure it adds anything to the ultimate estimates (means and standard errors) when the Mack (or Murphy) method is used.
  2. In his final example, I am intrigued how he uses the estimates from the 1988 year in his projection of the 1989 year. I had always wondered why we bother developing, e.g., AY 1989 to 48 months and then again to 60 months. It’s not like we’re getting more observations since there are only 3 48-60 mo. observations available anyway. It’s the 36-60 month change we’re actually interested in for AY 1989 and we have three previous observations of that. By using the 1988 estimate, he manufactures a fourth observation. I am curious how the MSE math would work out in his implementation of the “chain.”

Best of luck with your exams. I hope your interest in ChainLadder continues. Please stay in touch.

Dan

alex-pax commented 2 years ago

The Brosius paper is the first item on the current Exam 7 text reference here: https://www.casact.org/exam/exam-7-estim-liabilities-valuation-erm. I think for many people this exam is the source, but someone who's involved with the exam committee might have better insight.