wenjie2wang / splines2

Regression Spline Functions and Classes
https://wwenjie.org/splines2
GNU General Public License v3.0
42 stars 3 forks source link

Implementation for naturalSpline() does not match journal article #17

Closed mclements closed 1 year ago

mclements commented 1 year ago

Great package.

As a possible bug report, I believe that splines2::naturalSpline() does not follow Equation (11) in the Appendix to https://doi.org/10.6339/21-JDS1020. Which is correct - the published formula or the implementation?

I wrote some quick R code to re-implement naturalSpline(), including a flag to use the published formula or, otherwise, to use the implementation in naturalSpline.h:

Hmat = function(B, published=FALSE, standardise=TRUE) {
    Boundary.knots = attr(B,"Boundary.knots")
    knots = attr(B,"knots")
    Aknots = sort(c(rep(Boundary.knots,each=4), knots))
    m = length(knots)
    if (m==0) {
        Ht = matrix(c(3:0,0:3)+0.0,2,4,TRUE)
    } else {
        C = t(splineDesign(x=Boundary.knots,knots=Aknots,derivs=c(2L,2L)))
        p = nrow(C)
        if (m==1)
            Ht = matrix(c(-C[2,1]/C[1,1], 1, 0,0,0,
                          0,-C[3,1]/C[2,1],1,-C[p-2,2]/C[p-1,2],0,
                          0,0,0,1,-C[p-1,2]/C[p,2]), 3,5,TRUE)
        if (m>1) {
            nrow = m+2 # for Ht; spline_df_ variable in NaturalSplines.h
            ncol = m+4
            Ht = matrix(0,nrow,ncol)
            if (published) {
                Ht[,2:(ncol-1)] = diag(nrow)
                Ht[1,1:3] <- Ht[2,2] <- Ht[nrow-1,ncol-1] <- Ht[nrow,ncol-(0:2)] <- 1
                Ht[2,3] = -C[2,1]/C[3,1]
                Ht[nrow-1,ncol-2] = -C[p-1,2]/C[p-2,2]
            } else { # as implemented
                Ht[1,1:3] <- Ht[3,2] <- Ht[4,nrow+1] <- Ht[2,nrow+2-(0:2)] <- 1
                Ht[3,3] = -C[2,1]/C[3,1]
                Ht[4,nrow] = -C[p-1,2]/C[p-2,2]
                if (m>2) {
                    for (index in 4:(nrow-1))
                        Ht[index+1,index] = 1
                }
            }
        }
    }
    if (standardise)
        Ht = Ht / rowSums(Ht)
    return(t(Ht))
}
naturalSpline2 = function(x,df, intercept=TRUE, ...) {
    stopifnot(intercept)
    B = bs(x, df=df+2, intercept=TRUE)
    B %*% Hmat(B, ...)
}

As a check, naturalSpline2 and naturalSpline give similar answers when published=FALSE:

x = seq(0,1,length=11)
for (df in 2:6)
     print(range(naturalSpline2(x,df=df,published=FALSE) - naturalSpline(x,df=df,intercept=TRUE)))
[1] 0 0
[1] 0 0
[1] 0 0
[1] 0 0
[1] 0.000000e+00 6.938894e-18

However, the H matrix is very different from the form of the published formula and from my implementation of the published formula:

t(Hmat(bs(x, df=7, intercept=TRUE), published=FALSE))
          [,1]      [,2]      [,3] [,4]      [,5]      [,6]      [,7]
[1,] 0.3333333 0.3333333 0.3333333    0 0.0000000 0.0000000 0.0000000
[2,] 0.0000000 0.0000000 0.0000000    0 0.3333333 0.3333333 0.3333333
[3,] 0.0000000 0.2500000 0.7500000    0 0.0000000 0.0000000 0.0000000
[4,] 0.0000000 0.0000000 0.0000000    0 0.7500000 0.2500000 0.0000000
[5,] 0.0000000 0.0000000 0.0000000    1 0.0000000 0.0000000 0.0000000
t(Hmat(bs(x, df=7, intercept=TRUE), published=TRUE,standardise=FALSE))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1    1    0    0    0    0
[2,]    0    1    3    0    0    0    0
[3,]    0    0    0    1    0    0    0
[4,]    0    0    0    0    3    1    0
[5,]    0    0    0    0    1    1    1
t(Hmat(bs(x, df=7, intercept=TRUE), published=TRUE))
          [,1]      [,2]      [,3] [,4]      [,5]      [,6]      [,7]
[1,] 0.3333333 0.3333333 0.3333333    0 0.0000000 0.0000000 0.0000000
[2,] 0.0000000 0.2500000 0.7500000    0 0.0000000 0.0000000 0.0000000
[3,] 0.0000000 0.0000000 0.0000000    1 0.0000000 0.0000000 0.0000000
[4,] 0.0000000 0.0000000 0.0000000    0 0.7500000 0.2500000 0.0000000
[5,] 0.0000000 0.0000000 0.0000000    0 0.3333333 0.3333333 0.3333333

Is this a bug or a feature?

Sincerely, Mark.

wenjie2wang commented 1 year ago

Thanks for bringing this to my attention! It is a great question.

I reviewed the paper and the code I wrote. In fact, I did the implementation before writing the paper. The arrangement of the H matrix was purely for ease of coding. The resulting spline basis matrices from the code and the paper are different only in terms of the column arrangement and thus should be equivalent for modeling purposes.

I will update the underlying implementation so that it matches the paper. Thanks again for this issue.

mclements commented 1 year ago

Nice! I perhaps should have seen the column re-arrangement from the example -- although it was less obvious from the code.

Thank you for the prompt reply. You can close the issue:).