mages / ChainLadder

Claims reserving models in R
https://mages.github.io/ChainLadder/

as.triangle.data.frame has column gaps #93

Open DVDVTAL opened 4 months ago

DVDVTAL commented 4 months ago

In the definition of as.triangle.data.frame, the triangle is built from only the unique origin and development values that occur in the data, and the claim values are aggregated over those combinations (Triangles.R, lines 83-85).

When dealing with long tails in smaller books, certain development periods have no new claims at all. This leads to the creation of triangles with missing columns.
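A minimal base R sketch of the symptom, using hypothetical toy data rather than the package itself. The column names origin, dev and value are assumptions, and the tapply pivot only mimics the "unique values that occur in the data" behaviour described above:

```r
# Toy long-format claims data (hypothetical): nothing falls in development period 3
claims <- data.frame(
  origin = c(2019, 2019, 2019, 2020, 2020, 2021, 2021, 2022),
  dev    = c(1,    2,    4,    1,    2,    1,    2,    1),
  value  = c(100,  50,   10,   120,  60,   110,  55,   130)
)

# Aggregate, then pivot using only the origin/dev values present in the data
agg <- aggregate(value ~ origin + dev, data = claims, FUN = sum)
tri <- tapply(agg$value, list(origin = agg$origin, dev = agg$dev), sum)

colnames(tri)  # development period 3 has no column
ncol(tri)      # fewer columns than the development periods the data span
```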

This approach leads to inconsistencies with other functions within the ChainLadder package. Of particular note is the incr2cum function with na.rm = TRUE. In this context, the definition of upper is col(Triangle) <= ncol(Triangle) + 1 - row(Triangle). This assumes that the column index is equivalent to the development period and that the number of columns is equivalent to the maximum number of development periods. Neither assumption holds in the current implementation when a development period is missing. The result is a gap between the boundary that the function expects and the boundary that a human would intuit by looking at the data in a spreadsheet, so NAs can appear along the final diagonal (and then break other functions).
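To make the mismatch concrete, here is a small base R sketch on a hypothetical 4 x 3 matrix whose columns are development periods 1, 2 and 4. It compares the mask quoted above with a mask built from the actual column labels. The names dev_period and expected are illustrative and do not come from the package:

```r
# Hypothetical triangle: period 3 had no claims, so it has no column
tri <- matrix(
  c(100, 50, 10,
    120, 60, NA,
    110, 55, NA,
    130, NA, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(origin = as.character(2019:2022), dev = c("1", "2", "4"))
)

# Mask quoted above: treats the column position as the development period
upper <- col(tri) <= ncol(tri) + 1 - row(tri)

# Mask based on the development labels, for a triangle with 4 origin periods
dev_period <- as.integer(colnames(tri))[col(tri)]
expected <- dev_period <= nrow(tri) + 1 - row(tri)

# Cells where the two boundaries disagree
which(upper != expected, arr.ind = TRUE)
```

In this toy case the quoted mask places two observed cells (origin 2021 at period 2 and origin 2022 at period 1) outside the upper triangle.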

This issue also applies to the rows: incident periods with no claims will also lead to gaps. However, missing incident periods are not easy to impute, especially when using months or quarters. I believe the complete absence of claims in an incident period before the most recent one would be a far less common issue, and therefore less important to address.

As a potential solution, a skeleton could be created containing all of the unique origin values crossed with the full range of development periods inferred from the dataset, and the aggregated data could then be joined onto it. Something like the following:

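# Full range of development periods, assuming sequential integers starting at 1
dev_range <- seq_len(max(Triangle[[dev]], na.rm = TRUE))
# One row for every origin/development combination
skeleton <- expand.grid(unique(Triangle[[origin]]), dev_range, stringsAsFactors = FALSE)
colnames(skeleton) <- c(origin, dev)

# Left join keeps every skeleton cell; the value column is NA where there are no claims
aggTriangle <- merge(skeleton, aggTriangle, by = c(origin, dev), all.x = TRUE)

origin_names <- as.character(unique(aggTriangle[, origin]))
dev_names <- as.character(dev_range)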
(Unfortunately, my enterprise permissions prevent me from forking the repo to perform tests so I can't validate this.)
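As a rough check of the idea outside the package, the following base R sketch runs the skeleton approach on hypothetical toy data. The data frame claims, the column names, and the tapply pivot at the end are assumptions made for illustration, not the package's code:

```r
# Toy data (hypothetical): no claims in development period 3
claims <- data.frame(
  origin = c(2019, 2019, 2019, 2020, 2020, 2021, 2021, 2022),
  dev    = c(1,    2,    4,    1,    2,    1,    2,    1),
  value  = c(100,  50,   10,   120,  60,   110,  55,   130)
)
origin <- "origin"; dev <- "dev"; value <- "value"

# Aggregate claims per origin/development combination
aggTriangle <- aggregate(claims[[value]], list(claims[[dev]], claims[[origin]]), sum)
names(aggTriangle) <- c(dev, origin, value)

# Skeleton with every origin crossed with development periods 1..max
dev_range <- seq_len(max(claims[[dev]], na.rm = TRUE))
skeleton <- expand.grid(unique(claims[[origin]]), dev_range, stringsAsFactors = FALSE)
colnames(skeleton) <- c(origin, dev)

# Left join: cells without claims are kept with an NA value
filled <- merge(skeleton, aggTriangle, by = c(origin, dev), all.x = TRUE)

# Pivot to a matrix; every development period now has a column
tri <- tapply(filled[[value]], list(filled[[origin]], filled[[dev]]), sum)
tri
```

On this toy input the result has a column for period 3 that contains only NA, and the observed value for origin 2019 stays in the period 4 column.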

Such a change inherently assumes that the user is providing sequential values (1, 2, 3, etc.). This contrasts with a possible user convention of providing the month number at the end of each quarter (3, 6, 9, 12). Such users would receive a series of development periods with no development, along with the associated warning messages. If such a change in functionality is undesired, then the documentation of triangles should be updated to specify the requirements on the input data.frame, namely that every possible combination is represented within the data.frame.
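A short sketch of the quarter-end case, using hypothetical labels. The last two lines show one possible way to infer the spacing from the data, on the assumption that the labels are evenly spaced. This is an illustration only, not something the package does:

```r
# Hypothetical development labels: month number at the end of each quarter
dev_observed <- c(3, 6, 9, 12)

# Range produced by the sequential-integer assumption
dev_range <- seq_len(max(dev_observed))

# Periods that would be added to the skeleton but can never hold data
empty_periods <- setdiff(dev_range, dev_observed)
length(empty_periods)

# Possible alternative, assuming evenly spaced labels: infer the step from the data
step <- min(diff(sort(unique(dev_observed))))
inferred <- seq(from = min(dev_observed), to = max(dev_observed), by = step)
inferred
```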

michaelgicheru commented 3 weeks ago

I am currently facing the same issue. I have tried the fix you suggested and it does seem to work. I have made one adjustment to the dev_range variable so that the assumption made by the formula, i.e. number of origin periods == number of development periods, is always met:

# One development period per origin period, so the triangle is square
dev_range <- seq_len(length(unique(Triangle[[origin]])))
skeleton <- expand.grid(unique(Triangle[[origin]]), dev_range, stringsAsFactors = FALSE)
colnames(skeleton) <- c(origin, dev)

# all.x = TRUE keeps skeleton cells that have no claims (as NA)
aggTriangle <- merge(skeleton, aggTriangle, by = c(origin, dev), all.x = TRUE)

With this, the expand.grid call should take into account claims with no delay and make sure the triangle is half a square. It also covers cases where the latest claim for the earliest quarter does not fall in the latest development period.
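The difference between the two choices of dev_range can be seen on hypothetical toy data in which the earliest origin has no claim in the latest development period. The sketch below uses base R only and assumes a left join (all.x = TRUE) so that empty cells survive as NA. The names dev_range_max and dev_range_sq are illustrative:

```r
# Toy data (hypothetical): 4 origin periods, but nothing observed in period 4
claims <- data.frame(
  origin = c(2019, 2019, 2019, 2020, 2020, 2021, 2021, 2022),
  dev    = c(1,    2,    3,    1,    2,    1,    2,    1),
  value  = c(100,  50,   10,   120,  60,   110,  55,   130)
)

# Range based on the largest observed development period
dev_range_max <- seq_len(max(claims$dev))
# Range based on the number of origin periods, which forces a square shape
dev_range_sq <- seq_len(length(unique(claims$origin)))

# Build the skeleton from the square range and left join the claims onto it
skeleton <- expand.grid(origin = unique(claims$origin), dev = dev_range_sq)
filled <- merge(skeleton, claims, by = c("origin", "dev"), all.x = TRUE)
tri <- tapply(filled$value, list(filled$origin, filled$dev), sum)

length(dev_range_max)  # one column short of the number of origin periods
dim(tri)               # same number of rows and columns
```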

But I share the same sentiments. If the current implementation is the intended behaviour, I think a warning is necessary. If not, this approach should yield the expected results.