mages / ChainLadder

Claims reserving models in R
https://mages.github.io/ChainLadder/

Triangle with empty values in early development periods #69

Closed SuzieDunham closed 4 years ago

SuzieDunham commented 4 years ago

Is it possible to perform as.triangle on a triangle that isn't complete? I have older data where I'm only looking at the most recent ~25 years of development, so the upper left corner of my triangle is empty. When I use as.triangle on the data it seems to start the triangle at the first not null development period, which will then mess up recent accident years of the triangle. Is there an option to have the triangle start at the smallest (or first if data is ordered) origin value and development value? Example triangle for what I'd like to make: image

This is the triangle that as.triangle would generate from that data: image

marcopark90 commented 4 years ago

Hi Suzie,

Can you please post the output of dput on your triangle?

Example

dput(triangle)

Thanks, Marco


SuzieDunham commented 4 years ago

> as.triangle(test1,
+             dev = 'Dev_yr',
+             origin = 'AY',
+             value = 'Value')
      Dev_yr
AY      4  5  6  7  8  3  2  1
  2009  4  4  5  5  5 NA NA NA
  2010  4  4  5  5  5  3 NA NA
  2011  4  4  5  5  5  3  2 NA
  2012  4  4  5  5 NA  3  2  1
  2013  4  4  5 NA NA  3  2  1
  2014  4  4 NA NA NA  3  2  1
  2015  4 NA NA NA NA  3  2  1
  2016 NA NA NA NA NA  3  2  1
  2017 NA NA NA NA NA NA  2  1
  2018 NA NA NA NA NA NA NA  1
> dput(triangle)
structure(c(4L, 4L, 4L, 4L, 4L, 4L, 4L, NA, NA, NA, 4L, 4L, 4L, 4L, 4L, 4L, NA, NA, NA, NA, 5L, 5L, 5L, 5L, 5L, NA, NA, NA, NA, NA, 5L, 5L, 5L, 5L, NA, NA, NA, NA, NA, NA, 5L, 5L, 5L, NA, NA, NA, NA, NA, NA, NA, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Dim = c(10L, 8L), .Dimnames = list(AY = c("2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018"), Dev_yr = c("4", "5", "6", "7", "8", "3", "2", "1")), class = c("triangle", "matrix"))

marcopark90 commented 4 years ago

execute this, please

dput(test1)

SuzieDunham commented 4 years ago

> dput(test1)
structure(list(AY = c(2009L, 2009L, 2009L, 2009L, 2009L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2018L), Dev_yr = c(4L, 5L, 6L, 7L, 8L, 3L, 4L, 5L, 6L, 7L, 8L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 1L), Value = c(4L, 4L, 5L, 5L, 5L, 3L, 4L, 4L, 5L, 5L, 5L, 2L, 3L, 4L, 4L, 5L, 5L, 5L, 1L, 2L, 3L, 4L, 4L, 5L, 5L, 1L, 2L, 3L, 4L, 4L, 5L, 1L, 2L, 3L, 4L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 1L)), class = "data.frame", row.names = c(NA, -46L))

marcopark90 commented 4 years ago

It looks like the output of the function is as expected (fig. 1 in your first post). Please run this on your machine and check that the output is correct.

` test1 <- structure(list(AY = c(2009L, 2009L, 2009L, 2009L, 2009L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2018L), Dev_yr = c(4L, 5L, 6L, 7L, 8L, 3L, 4L, 5L, 6L, 7L, 8L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 1L), Value = c(4L, 4L, 5L, 5L, 5L, 3L, 4L, 4L, 5L, 5L, 5L, 2L, 3L, 4L, 4L, 5L, 5L, 5L, 1L, 2L, 3L, 4L, 4L, 5L, 5L, 1L, 2L, 3L, 4L, 4L, 5L, 1L, 2L, 3L, 4L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 1L)), class = "data.frame", row.names = c(NA, -46L))

test1

as.triangle(test1, dev = 'Dev_yr', origin = 'AY', value = 'Value')

`

Also, it looks like your initial input (test1) is a data.frame with the following columns: 'AY', 'Dev_yr', 'Value'.

SuzieDunham commented 4 years ago

So when I run the code you provided this is the output:

      Dev_yr
AY      4  5  6  7  8  3  2  1
  2009  4  4  5  5  5 NA NA NA
  2010  4  4  5  5  5  3 NA NA
  2011  4  4  5  5  5  3  2 NA
  2012  4  4  5  5 NA  3  2  1
  2013  4  4  5 NA NA  3  2  1
  2014  4  4 NA NA NA  3  2  1
  2015  4 NA NA NA NA  3  2  1
  2016 NA NA NA NA NA  3  2  1
  2017 NA NA NA NA NA NA  2  1
  2018 NA NA NA NA NA NA NA  1

And I would have hoped to see the Dev_yr from 1-8 and have NA in the top left of the triangle. When the triangle isn't populating correctly, it then doesn't allow me to calculate link ratios correctly.

> linkratios <- c(attr(ata(triangle), "vwtd"), tail = 1.00)
> round(linkratios, 4)
   4-5    5-6    6-7    7-8    8-3    3-2    2-1   tail
1.0000 1.2500 1.0000 1.0000 0.6000 0.6667 0.5000 1.0000

I'm new to the package - is there another way I should be looking at this?

marcopark90 commented 4 years ago

It is strange, this is what I see on mine.

      Dev_yr
AY      1  2  3  4  5  6  7  8
  2009 NA NA NA  4  4  5  5  5
  2010 NA NA  3  4  4  5  5  5
  2011 NA  2  3  4  4  5  5  5
  2012  1  2  3  4  4  5  5 NA
  2013  1  2  3  4  4  5 NA NA
  2014  1  2  3  4  4 NA NA NA
  2015  1  2  3  4 NA NA NA NA
  2016  1  2  3 NA NA NA NA NA
  2017  1  2 NA NA NA NA NA NA
  2018  1 NA NA NA NA NA NA NA

and, as expected, the link ratios

     1-2      2-3      3-4      4-5      5-6      6-7      7-8     tail
2.000000 1.500000 1.333333 1.000000 1.250000 1.000000 1.000000 1.000000

which comes from the standard link ratio calculation, ignoring the NAs.

SuzieDunham commented 4 years ago

Hmmm. I'm using the latest version of RStudio and the ChainLadder package, both downloaded last week. Any recommendations on what to check?

marcopark90 commented 4 years ago

I don't know, because everything should be fine. You can "force" the order of the columns. Once you have your triangle, after the as.triangle call but before calculating the link ratios, try this:

new_triangle <- triangle[,as.character(1:8)]
ata(new_triangle)

SuzieDunham commented 4 years ago

That worked to calculate the link ratios, thank you! Is there a similar way to force the order of the triangle columns?

marcopark90 commented 4 years ago

That's exactly what you are doing. The class triangle is just a matrix with named columns. By doing new_triangle <- triangle[,as.character(1:8)] you are basically defining a new triangle with the "correct" column order. From now on the triangle you will be working with is called new_triangle. If you need more information on how named dimensions work have a look here: https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/row%2Bcolnames
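To make the column reordering concrete, here is a minimal base R sketch on a small made-up matrix (the names `m` and `m_sorted` are illustrative and not from the thread). It relies only on the fact that indexing a matrix with a character vector selects columns by name, in the order requested.

```r
# A toy 3 x 4 matrix whose development columns are out of order,
# mimicking the shape of the as.triangle() output earlier in the thread.
m <- matrix(c(4, 4, 4,   5, 5, NA,   2, 2, 2,   1, 1, 1),
            nrow = 3,
            dimnames = list(AY = c("2016", "2017", "2018"),
                            Dev_yr = c("3", "4", "2", "1")))

# Indexing by name returns the columns in the requested order.
m_sorted <- m[, as.character(1:4)]

colnames(m_sorted)
```

The values travel with their labels, so only the column order changes.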

SuzieDunham commented 4 years ago

Thank you!

chiefmurph commented 4 years ago

I realize this is closed, but I have a few comments.

First, @SuzieDunham 's original data looks like it's already in matrix format, in which case it's already in a form for calculating link ratios -- no "as.triangle" step needed. I entered the data in excel, saved as csv, used read.csv to read it into R (necessarily a data.frame), turned it into a matrix, cleaned it up, and calculated age-to-age factors using the 'ata' function:

> x <- read.csv("test.csv")
> x <- as.matrix(x)
> dimnames(x) <- list(AY=x[,1], c("", 1:8))
> x <- x[,-1]
> library(ChainLadder)
> ata(x)

AY     1-2 2-3   3-4 4-5  5-6 6-7 7-8
  2009  NA  NA    NA   1 1.25   1   1
  2010  NA  NA 1.333   1 1.25   1   1
  2011  NA 1.5 1.333   1 1.25   1   1
  2012   2 1.5 1.333   1 1.25   1  NA
  2013   2 1.5 1.333   1 1.25  NA  NA
  2014   2 1.5 1.333   1   NA  NA  NA
  2015   2 1.5 1.333  NA   NA  NA  NA
  2016   2 1.5    NA  NA   NA  NA  NA
  2017   2  NA    NA  NA   NA  NA  NA
  smpl   2 1.5 1.333   1 1.25   1   1
  vwtd   2 1.5 1.333   1 1.25   1   1
> 

Second, help("as.triangle") motivates that function with the observation that a triangle obtained from a database will usually be in long format. @marcopark90 's test1 data.frame is in long format, which is why as.triangle gives anything at all. The reason it is not in the expected order (it was in previous versions) is that ChainLadder dropped its dependence on Hadley's now-unsupported package, reshape2, and went with an algorithm based on stats::aggregate, which is well-known for producing results in "unintuitive" order (one of Hadley's reasons for writing reshape2 in the first place):

> as.triangle(test1,
+              dev = 'Dev_yr',
+              origin = 'AY',
+              value = 'Value')
      Dev_yr
AY      4  5  6  7  8  3  2  1
  2009  4  4  5  5  5 NA NA NA
  2010  4  4  5  5  5  3 NA NA
  2011  4  4  5  5  5  3  2 NA
  2012  4  4  5  5 NA  3  2  1
  2013  4  4  5 NA NA  3  2  1
  2014  4  4 NA NA NA  3  2  1
  2015  4 NA NA NA NA  3  2  1
  2016 NA NA NA NA NA  3  2  1
  2017 NA NA NA NA NA NA  2  1
  2018 NA NA NA NA NA NA NA  1

Hadley dropped his support for reshape2 and moved on to another package, tidyr, with two solutions: spread and pivot_wider. Of the two, the older 'spread' produces "intuitive" results

> library(tidyr)
> spread(test1, "Dev_yr", "Value")
     AY  1  2  3  4  5  6  7  8
1  2009 NA NA NA  4  4  5  5  5
2  2010 NA NA  3  4  4  5  5  5
3  2011 NA  2  3  4  4  5  5  5
4  2012  1  2  3  4  4  5  5 NA
5  2013  1  2  3  4  4  5 NA NA
6  2014  1  2  3  4  4 NA NA NA
7  2015  1  2  3  4 NA NA NA NA
8  2016  1  2  3 NA NA NA NA NA
9  2017  1  2 NA NA NA NA NA NA
10 2018  1 NA NA NA NA NA NA NA

The newer 'pivot_wider' gives results that look like as.triangle:

> pivot_wider(test1, names_from = "Dev_yr", values_from = "Value")
# A tibble: 10 x 9
      AY   `4`   `5`   `6`   `7`   `8`   `3`   `2`   `1`
   <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1  2009     4     4     5     5     5    NA    NA    NA
 2  2010     4     4     5     5     5     3    NA    NA
 3  2011     4     4     5     5     5     3     2    NA
 4  2012     4     4     5     5    NA     3     2     1
 5  2013     4     4     5    NA    NA     3     2     1
 6  2014     4     4    NA    NA    NA     3     2     1
 7  2015     4    NA    NA    NA    NA     3     2     1
 8  2016    NA    NA    NA    NA    NA     3     2     1
 9  2017    NA    NA    NA    NA    NA    NA     2     1
10  2018    NA    NA    NA    NA    NA    NA    NA     1

So, at this point in time it seems wise to avoid re-dependence on Hadley code. @marcopark90 's solution of rearranging the columns of as.triangle's matrix into one's desired order seems like the best solution for ChainLadder.

In summary, if the triangle is already a matrix, you are good to go. If the triangle is in long data.frame format, you may have an extra step of reordering the columns. Ultimately, it would be convenient to calculate link ratios (and other ChainLadder algorithms) from a long data.frame without having to first convert it into a matrix -- but that's a different issue.
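As an aside, the long-to-wide step can be sketched in base R with tapply() on a few made-up rows. This is not how as.triangle is implemented, and the object names `long` and `wide` are illustrative; it only shows one way to get a numerically ordered matrix from long data when the origin and development columns are integers.

```r
# A few made-up rows in long format, deliberately unordered.
long <- data.frame(AY     = c(2016L, 2016L, 2017L, 2017L, 2018L, 2016L),
                   Dev_yr = c(2L, 3L, 1L, 2L, 1L, 10L),
                   Value  = c(2, 3, 1, 2, 1, 9))

# tapply() builds one cell per (AY, Dev_yr) pair. The integer grouping
# columns are converted to factors whose levels are sorted numerically,
# so rows and columns come out in numeric order. Empty cells are NA.
wide <- with(long, tapply(Value, list(AY = AY, Dev_yr = Dev_yr), sum))
wide
```

Because the sorting happens on the integer values before they become labels, development age 10 lands after 3 rather than between 1 and 2.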

chiefmurph commented 4 years ago

Or:

within the function as.triangle.data.frame, just before returning, enter the line

matrixTriangle <- matrixTriangle[order(rownames(matrixTriangle)), 
                                 order(colnames(matrixTriangle))]

This will put the matrix into a reasonable default order.

chiefmurph commented 4 years ago

After sleeping on it, that order-ing step won't always work with the character row- and column-names. For example, 10 comes between 1 and 2 in

> sort(as.character(1:10))
 [1] "1"  "10" "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"

As long as the origin and development period names are convertible to numeric -- which is usually the case for development "ages" but not always the case with origin labels -- as.numeric would do the trick. That condition could be tested with .allisnumeric in ChainLadder's Triangle.R:

rn <- rownames(matrixTriangle)
if (.allisnumeric(rn)) rn <- as.numeric(rn)
cn <- colnames(matrixTriangle)
if (.allisnumeric(cn)) cn <- as.numeric(cn)
matrixTriangle <- matrixTriangle[order(rn), order(cn)]
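The same idea can be sketched without ChainLadder's internal helper. The function `order_names` below is hypothetical (it is not part of ChainLadder): it orders labels numerically when every label converts cleanly to a number and falls back to character ordering otherwise.

```r
# Hypothetical helper, not part of ChainLadder: return the permutation that
# sorts a vector of labels, numerically if possible, else as characters.
order_names <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  if (!anyNA(num)) order(num) else order(x)
}

# Development labels in the scrambled order seen in this thread, up to 10.
dev <- as.character(c(4:10, 3:1))

# Plain character ordering, which sorts "10" next to "1".
dev[order(dev)]

# Numeric-aware ordering gives 1, 2, ..., 10.
dev[order_names(dev)]

# Labels that are not numbers (e.g. quarters) still sort as characters.
order_names(c("2019Q2", "2018Q4", "2019Q1"))
```

It could then be applied to both dimensions of a matrix `m`, for example `m[order_names(rownames(m)), order_names(colnames(m))]`.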

-Dan

SuzieDunham commented 4 years ago

Hey Dan,

Thanks for adding some items! I'm working with some actual data in "long" format, and neither of the ordering solutions are working for me. When I try [,as.character(1:30)] or even as.numeric, I get a subscript out of bounds error. Your first ordering solution ordered the columns, but as characters rather than numeric values as you indicated would happen. For your recent solution, .allisnumeric isn't a recognized function. Any suggestions?

-Susan

marcopark90 commented 4 years ago

Hi Suzie,

If you are typing [,as.character(1:30)] and you get the error subscript out of bounds, it means that at least one of the names "1" to "30" is not among your object's column names, for example because it has fewer than 30 columns. Can you please post the output of dput(your_data)? Thank you
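One quick way to see which requested names are missing is to compare them against the actual column names before indexing. A minimal sketch on a made-up matrix (`m`, `wanted` and `missing_cols` are illustrative names):

```r
# Toy matrix with half-year development ages as column names.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("2017", "2018"), c("1.5", "0.5", "2.5")))

wanted <- as.character(1:3)

# Names that were requested but do not exist; any entry here means
# m[, wanted] would fail with "subscript out of bounds".
missing_cols <- setdiff(wanted, colnames(m))
missing_cols

# Index only once nothing is missing.
if (length(missing_cols) == 0) m[, wanted]
```

Here all three requested names are missing, even though the matrix does have three columns.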

SuzieDunham commented 4 years ago

Sorry, I should have said 30 was an example. I am unable to share my data. In this case, this is the code I'm running:

n <- length(unique(New_Loss_Combined$AY_AGE))
test <- New_Loss_Combined %>%
  filter(PD_CLM_TYPE == '8', PLCY_TYP_CD == 'P', BOOK_TYP_DESC != 'HCW')
pd_tl_triange <- as.triangle(test, dev = 'AY_AGE', origin = 'AY', value = 'TMLSS_PD_AMT')
nrow(pd_tl_triange)
ncol(pd_tl_triange)
pd_tl_triange[,as.character(1:n)]

my results:

> n
[1] 54
> nrow(pd_tl_triange)
[1] 54
> ncol(pd_tl_triange)
[1] 54
> pd_tl_triange[,as.character(1:n)]
Error in pd_tl_triange[, as.character(1:n)] : subscript out of bounds

marcopark90 commented 4 years ago

It's very weird. Can you post the output of colnames(pd_tl_triange)?

SuzieDunham commented 4 years ago

> colnames(pd_tl_triange)
 [1] "34.5" "35.5" "36.5" "37.5" "38.5" "39.5" "40.5" "41.5" "42.5" "43.5" "44.5" "45.5" "46.5" "47.5" "48.5" "49.5"
[17] "50.5" "51.5" "52.5" "53.5" "33.5" "32.5" "31.5" "30.5" "29.5" "28.5" "27.5" "26.5" "25.5" "24.5" "23.5" "22.5"
[33] "21.5" "20.5" "19.5" "18.5" "17.5" "16.5" "15.5" "14.5" "13.5" "12.5" "11.5" "10.5" "9.5"  "8.5"  "7.5"  "6.5"
[49] "5.5"  "4.5"  "3.5"  "2.5"  "1.5"  "0.5"

marcopark90 commented 4 years ago

This will solve your issue:

pd_tl_triange[, as.character(seq(.5, 53.5, 1))]

However, I really recommend reading about colnames in R: https://stat.ethz.ch/R-manual/R-devel/library/base/html/colnames.html

mages commented 4 years ago

Sorry to join the party a little late. Can you please try the development version of ChainLadder on GitHub? I recall that, following the last release on CRAN, I fixed as.triangle for a 'long' data set when the input data had missing values. See commit https://github.com/mages/ChainLadder/commit/718855836f1efc87beef006904018de26e807e6d

marcopark90 commented 4 years ago

@SuzieDunham In case you are wondering how to do that:

library(devtools)
install_github("mages/ChainLadder")

trinostics commented 4 years ago

For a non-exported package function, use the three-colon trick:

> ChainLadder:::.allisnumeric(c("12", "24"))
[1] TRUE

I was hoping you might want to fork the package into your own repo, then you could see the code for yourself, and “borrow” as you see fit. The code is in the file Triangles.R, at the very top. My “hope” is that you could suggest a change that suits you, which probably would suit others, and thereby become a ChainLadder contributor. No pressure! :)


SuzieDunham commented 4 years ago

Thanks for your assistance. Sadly nothing is working :( I'm either getting a subscript out of bounds error or an incorrectly sorted triangle, so I feel like I'm going to have to abandon trying to use this package. Was just trying to figure out a way to automate generation of a ton of triangles, but I guess I'm stuck with Excel!

SuzieDunham commented 4 years ago

Ugh, I'd all but given up but I tried one LAST thing and of course it worked... Though I'm still not thrilled that it had to be this complicated!

Final working and correctly ordered triangle:

pd_tl_triange <- as.triangle(test, dev = 'AY_AGE', origin = 'AY', value = 'TMLSS_PD_AMT')
cn <- colnames(pd_tl_triange)
pd_tl_triange <- pd_tl_triange[, as.character(sort(as.numeric(cn)))]
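A slightly more defensive variant of the same fix indexes by position with order() instead of converting the sorted numbers back to character. This is only a sketch on made-up data (`m` and `m_sorted` are illustrative names); it avoids a mismatch that can occur when a label such as "1.0" does not survive the numeric round trip, since as.character(as.numeric("1.0")) is "1".

```r
# Toy matrix; note the label "1.0", which as.numeric() turns into 1.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("2017", "2018"), c("2.5", "1.0", "0.5")))

cn <- colnames(m)

# Ordering by position never has to look the labels up again,
# so the original labels are kept exactly as they were.
m_sorted <- m[, order(as.numeric(cn))]
colnames(m_sorted)
```

With labels like "0.5" and "34.5" the round trip is harmless, which is why the version above works on the real data.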

trinostics commented 4 years ago

:) There you have it, a solution for your specific data! Nice work. I don't think it's ChainLadder's goal to handle the most general type of data, but to implement cutting edge reserving algorithms. Are there other capabilities of ChainLadder you want to take advantage of, other than calculating link ratios?
