mages / ChainLadder

Claims reserving models in R
https://mages.github.io/ChainLadder/

Function `triangle` #47

Closed vigou3 closed 6 years ago

vigou3 commented 6 years ago

As per the discussion in #46, here is an implementation of a function triangle. It proved very straightforward to make: only four lines of code. Also, thanks to how sapply handles names, it is very natural to specify row and column names as Dan enquired; see the examples in the help page.

(I realized after committing the changes that I had also deleted a lot of trailing whitespace. The only true changes to R/Triangles.R are in lines 91-111. Sorry for the noise.)

I also modified the help page for ?as.triangle to include details and examples on triangle. I took the liberty of removing comments in the file while I was at it.

If you accept the changes, I can also modify the documentation and demo.

mages commented 6 years ago

Thanks! My initial attempt to use the function (without reading the documentation) was

triangle(c(1,2,3,4,5,6))

rather than

triangle(c(1,2,3),c(4,5),6)

because with matrix I would have written

matrix(c(1,2,3,4,5,6), ncol=2)

Was there a specific motivation behind your version?

vigou3 commented 6 years ago

My version allows more than one full row at the top of the triangle, something that may happen in practice. Moreover, one may name the vectors of data (the arguments, in this case) and the names get transferred as row names, something I think is pretty nice. (See the help page for examples.)
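For readers following along, the idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the name triangle_sketch is hypothetical, and the sketch ignores the bycol option and the dimnames shown elsewhere in the thread.

```r
# Hypothetical sketch, NOT the ChainLadder implementation: pad each data
# vector with NA to a common length, then stack the padded vectors as rows.
triangle_sketch <- function(...) {
  rows <- list(...)
  n <- max(lengths(rows))
  # sapply() returns one column per argument and keeps the argument names
  # as column names, so transposing turns them into row names.
  t(sapply(rows, function(x) c(x, rep(NA, n - length(x)))))
}

# Named arguments become row names; shorter rows are filled with NA.
triangle_sketch("2015" = c(100, 150, 175),
                "2016" = c(119, 168),
                "2017" = 115)
```

Because the padding is driven by the longest vector, several full rows at the top are handled without any special case, e.g. triangle_sketch(c(1, 2, 3), c(4, 5, 6), c(7, 8), 9).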

Given that it doesn't make much sense to build a one-row triangle, I could add a provision to the function to use your approach if only one vector of data is provided as an argument. Naming the rows would not be supported in this case, but this would allow for a "quick and dirty" use of the function.

mages commented 6 years ago

That makes a lot of sense now. I think an additional argument nrow=NULL or nrow=NA would be helpful, i.e. if only one vector is provided, then it would be cut into a triangle with the given number of rows.

trinostics commented 6 years ago

The code is succinct, understandable, and well-commented. I can see how it helps as rbind does not work:

rbind(c(1,2,3), c(1,2), c(1))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    1
[3,]    1    1    1

I can't think of many use cases. But one that did occur to me is the entry of a Schedule-P-like triangle by evaluation date (ie, by column). Take this for example

     2015 2016 2017
2015    1    2    3
2016 xxxx    1    2
2017 xxxx xxxx    1

Granted, that's not a 'triangle' with "age" columns, but I tried entering the data by "as-of" date and got an error:

triangle(1, c(1, 2), c(1,2,3), bycol = TRUE)
Error in 1:nrow(Triangle) : argument of length 0

Perhaps that's not an issue b/c it's not a typical use case?

vigou3 commented 6 years ago

> That makes a lot of sense now. I think, an additional argument nrow=NULL or nrow=NA would be helpful, i.e. if only one vector is provided then it would be cut into a triangle by the nrow given.

We just need to agree on a user interface when a single vector is provided. Such usage could mean: "here is data for a triangle, put the first n values on the first line, the following n - 1 on the second line, and so on until there is only one value left". In that case, we don't need the number of rows (or columns, let's not forget this option) as we can infer it from the length of the data.

Another usage could be: "here is data for a triangle, arrange it on nrow rows such that there are as many full lines as needed at the top". This option is more flexible but somewhat cumbersome to document IMO. We would also need an argument ncol for when bycol = TRUE.

Personally, I'd stick to the first case only to keep things simpler.
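To make the first interpretation concrete, here is a rough sketch of how the number of rows can be inferred from the length of the data: a triangle with n rows holds n(n + 1)/2 values, so n = (sqrt(8 * length + 1) - 1) / 2. The function name split_triangle_sketch is hypothetical and this is not the code committed to the package; it only illustrates the arithmetic.

```r
# Hypothetical sketch, NOT the ChainLadder implementation: cut one vector
# into rows of length n, n - 1, ..., 1, with n inferred from its length.
split_triangle_sketch <- function(x) {
  # A triangle with n rows holds n * (n + 1) / 2 values; solve for n.
  n <- (sqrt(8 * length(x) + 1) - 1) / 2
  if (n != round(n))
    stop("length of 'x' is not a triangular number")
  out <- matrix(NA_real_, n, n)
  # Positions in x where each row ends and starts.
  end <- cumsum(n:1)
  start <- end - (n:1) + 1
  # Row i receives the next n - i + 1 values; the rest stays NA.
  for (i in seq_len(n))
    out[i, seq_len(n - i + 1)] <- x[start[i]:end[i]]
  out
}

# Six values give a 3 x 3 triangle: rows of 3, 2 and 1 values.
split_triangle_sketch(c(100, 150, 175, 119, 168, 115))
```

A length that is not a triangular number (4 values, say) is rejected outright here, which is one way of sidestepping the guesswork that the nrow/ncol variant would require.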

> Perhaps that not an issue b/c it's not a typical use case?

That's what I'd be inclined to answer.

vigou3 commented 6 years ago

I committed an implementation for the single data vector case for the simplest usage:

> triangle(c(100, 150, 175, 119, 168, 115))
      dev
origin   1   2   3
     1 100 150 175
     2 119 168  NA
     3 115  NA  NA

I worked for a day on the more convoluted implementation that used nrow and ncol arguments, but it proved a mess to try to guess the user's intent. We have an implementation where the intent is clear, so I'd steer users toward using this.