wenjie2wang / splines2

Regression Spline Functions and Classes
https://wwenjie.org/splines2
GNU General Public License v3.0
42 stars 3 forks source link

Maybe unexpected number of columns for `bernsteinPoly(.., deriv = 1)`? #11

Closed hhau closed 3 years ago

hhau commented 3 years ago

Hi, thanks for the splines2 package -- I've used it a lot and found it very helpful.

I was hoping you could clarify something for me. I am reading https://www.tandfonline.com/doi/full/10.1080/02664761003692423, particularly the equations on page 963:

The expression for the derivative there seems to be a function of degree + is.integer(intercept) - 1 terms, so I would expect as many columns from the corresponding call to bernsteinPoly(..., derivs = 1). On the other hand, I see that each of the degree + is.integer(intercept) terms in the original BP is differentiable, so I can see why there are degree + is.integer(intercept) columns in the bernsteinPoly(..., derivs = 1) call.

If we look at the basis for a given set of points and knots, then when derivs = 0 I get back what I would expect from a crude basis function generator:

x_seq <- seq(from = 0, to = 1, length.out = 12)
bp_degree <- 4

splines2::bernsteinPoly(
  x = x_seq,
  degree = bp_degree,
  intercept = TRUE,
  Boundary.knots = c(0, 1),
  derivs = 0
)
#>                  1           2          3           4            5
#>  [1,] 1.000000e+00 0.000000000 0.00000000 0.000000000 0.000000e+00
#>  [2,] 6.830135e-01 0.273205382 0.04098081 0.002732054 6.830135e-05
#>  [3,] 4.481251e-01 0.398333447 0.13277782 0.019670788 1.092822e-03
#>  [4,] 2.797623e-01 0.419643467 0.23604945 0.059012363 5.532409e-03
#>  [5,] 1.639915e-01 0.374837784 0.32128953 0.122396011 1.748514e-02
#>  [6,] 8.851854e-02 0.295061813 0.36882727 0.204904037 4.268834e-02
#>  [7,] 4.268834e-02 0.204904037 0.36882727 0.295061813 8.851854e-02
#>  [8,] 1.748514e-02 0.122396011 0.32128953 0.374837784 1.639915e-01
#>  [9,] 5.532409e-03 0.059012363 0.23604945 0.419643467 2.797623e-01
#> [10,] 1.092822e-03 0.019670788 0.13277782 0.398333447 4.481251e-01
#> [11,] 6.830135e-05 0.002732054 0.04098081 0.273205382 6.830135e-01
#> [12,] 0.000000e+00 0.000000000 0.00000000 0.000000000 1.000000e+00

make_col <- function(x, k, M) {
  if (k < 0) return(rep(0, length(x)))
  if (k > M) return(rep(0, length(x)))
  choose(M, k) * (x^(k)) * (1 - x)^(M - k)
}

do.call(cbind, lapply(0 : bp_degree, function(k) make_col(x_seq, k, bp_degree)))
#>               [,1]        [,2]       [,3]        [,4]         [,5]
#>  [1,] 1.000000e+00 0.000000000 0.00000000 0.000000000 0.000000e+00
#>  [2,] 6.830135e-01 0.273205382 0.04098081 0.002732054 6.830135e-05
#>  [3,] 4.481251e-01 0.398333447 0.13277782 0.019670788 1.092822e-03
#>  [4,] 2.797623e-01 0.419643467 0.23604945 0.059012363 5.532409e-03
#>  [5,] 1.639915e-01 0.374837784 0.32128953 0.122396011 1.748514e-02
#>  [6,] 8.851854e-02 0.295061813 0.36882727 0.204904037 4.268834e-02
#>  [7,] 4.268834e-02 0.204904037 0.36882727 0.295061813 8.851854e-02
#>  [8,] 1.748514e-02 0.122396011 0.32128953 0.374837784 1.639915e-01
#>  [9,] 5.532409e-03 0.059012363 0.23604945 0.419643467 2.797623e-01
#> [10,] 1.092822e-03 0.019670788 0.13277782 0.398333447 4.481251e-01
#> [11,] 6.830135e-05 0.002732054 0.04098081 0.273205382 6.830135e-01
#> [12,] 0.000000e+00 0.000000000 0.00000000 0.000000000 1.000000e+00

Now when I compare the result I get using bernsteinPoly(..., derivs = 1) to what I get using the equation for $B'_{M}(x)$ in the paper, I see something quite different.

splines2::bernsteinPoly(
  x = x_seq,
  degree = bp_degree,
  intercept = TRUE,
  Boundary.knots = c(0, 1),
  derivs = 1
)
#>                  1           2          3           4           5
#>  [1,] -4.000000000  4.00000000  0.0000000  0.00000000 0.000000000
#>  [2,] -3.005259204  2.10368144  0.8114200  0.08715252 0.003005259
#>  [3,] -2.190833959  0.73027799  1.1359880  0.30052592 0.024042074
#>  [4,] -1.538692712 -0.19233659  1.0818933  0.56799399 0.081141998
#>  [5,] -1.030803907 -0.73628850  0.7573253  0.81743050 0.192336589
#>  [6,] -0.649135988 -0.97370398  0.2704733  0.97670924 0.375657400
#>  [7,] -0.375657400 -0.97670924 -0.2704733  0.97370398 0.649135988
#>  [8,] -0.192336589 -0.81743050 -0.7573253  0.73628850 1.030803907
#>  [9,] -0.081141998 -0.56799399 -1.0818933  0.19233659 1.538692712
#> [10,] -0.024042074 -0.30052592 -1.1359880 -0.73027799 2.190833959
#> [11,] -0.003005259 -0.08715252 -0.8114200 -2.10368144 3.005259204
#> [12,]  0.000000000  0.00000000  0.0000000 -4.00000000 4.000000000

bp_degree * do.call(cbind, lapply(0 : (bp_degree - 1), function(k) make_col(x_seq, k, bp_degree - 1L)))

#>              [,1]       [,2]       [,3]        [,4]
#>  [1,] 4.000000000 0.00000000 0.00000000 0.000000000
#>  [2,] 3.005259204 0.90157776 0.09015778 0.003005259
#>  [3,] 2.190833959 1.46055597 0.32456799 0.024042074
#>  [4,] 1.538692712 1.73102930 0.64913599 0.081141998
#>  [5,] 1.030803907 1.76709241 1.00976709 0.192336589
#>  [6,] 0.649135988 1.62283997 1.35236664 0.375657400
#>  [7,] 0.375657400 1.35236664 1.62283997 0.649135988
#>  [8,] 0.192336589 1.00976709 1.76709241 1.030803907
#>  [9,] 0.081141998 0.64913599 1.73102930 1.538692712
#> [10,] 0.024042074 0.32456799 1.46055597 2.190833959
#> [11,] 0.003005259 0.09015778 0.90157776 3.005259204
#> [12,] 0.000000000 0.00000000 0.00000000 4.000000000

If I follow the formula for the derivative in the wikipedia article, then I get back the same results as splines2:

bp_degree * do.call(cbind, lapply(0 : (bp_degree), function(k) {
  make_col(x_seq, k - 1, bp_degree - 1L) - make_col(x_seq, k, bp_degree - 1L)
}))

#>               [,1]        [,2]       [,3]        [,4]        [,5]
#>  [1,] -4.000000000  4.00000000  0.0000000  0.00000000 0.000000000
#>  [2,] -3.005259204  2.10368144  0.8114200  0.08715252 0.003005259
#>  [3,] -2.190833959  0.73027799  1.1359880  0.30052592 0.024042074
#>  [4,] -1.538692712 -0.19233659  1.0818933  0.56799399 0.081141998
#>  [5,] -1.030803907 -0.73628850  0.7573253  0.81743050 0.192336589
#>  [6,] -0.649135988 -0.97370398  0.2704733  0.97670924 0.375657400
#>  [7,] -0.375657400 -0.97670924 -0.2704733  0.97370398 0.649135988
#>  [8,] -0.192336589 -0.81743050 -0.7573253  0.73628850 1.030803907
#>  [9,] -0.081141998 -0.56799399 -1.0818933  0.19233659 1.538692712
#> [10,] -0.024042074 -0.30052592 -1.1359880 -0.73027799 2.190833959
#> [11,] -0.003005259 -0.08715252 -0.8114200 -2.10368144 3.005259204
#> [12,]  0.000000000  0.00000000  0.0000000 -4.00000000 4.000000000

Created on 2021-09-21 by the reprex package (v2.0.1)

I expect this is a misunderstanding on my part (or an incorrect formula in the paper?), but how many terms/columns should the derivative of a BP have? The appeal of the expression in the aforementioned paper is that the expressions for the 0th/1st/2nd derivative are all written in terms of the same coefficients. This is very useful when one observes the derivative at the same time as the actual function (but with different noise), and one wishes to constrain the derivative to be monotonically increasing.

wenjie2wang commented 3 years ago

Hi @hhau

Thanks for the detailed question.

I expect this is a misunderstanding on my part (or an incorrect formula in the paper?), but how many terms/columns should the derivative of a BP have?

The derivatives of the Bernstein polynomial basis functions have a one-to-one correspondence to the original basis functions: the first derivative of the jth basis function (given in the jth column of the basis matrix) is given in the jth column of the output matrix when derivs = 1. Therefore, the number of basis functions should be the same regardless of the order of derivatives.

The appeal of the expression in the aforementioned paper is that the expressions for the 0th/1st/2nd derivative are all written in terms of the same coefficients. This is very useful when one observes the derivative at the same time as the actual function (but with different noise), and one wishes to constrain the derivative to be monotonically increasing.

Due to the one-to-one correspondence, the same set of coefficients should be applied to the basis functions and their derivatives.

The key to your question is that the formula in Curtis and Ghosh (2011) is for the polynomial function instead of the basis functions. See the following example for illustration.

x_seq <- seq(from = 0, to = 1, length.out = 12)
bp_degree <- 4

set.seed(123)
(beta_vec <- runif(bp_degree + 1))
#> [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673

make_col <- function(x, k, M) {
  if (k < 0) return(rep(0, length(x)))
  if (k > M) return(rep(0, length(x)))
  choose(M, k) * (x^(k)) * (1 - x)^(M - k)
}

## method 1: following the formula in Curtis and Ghosh (2011)
d1 <- rowSums(bp_degree * sapply(0 : (bp_degree - 1), function(k) {
    diff(beta_vec)[k + 1] * make_col(x_seq, k, bp_degree - 1L)
}))

## method 2: use splines2
dmat <- splines2::bernsteinPoly(
  x = x_seq,
  degree = bp_degree,
  intercept = TRUE,
  Boundary.knots = c(0, 1),
  derivs = 1
)
(d2 <- c(dmat %*% beta_vec))
#>  [1] 2.0029105 1.2057335 0.6982206 0.4262160 0.3355642 0.3721093 0.4816958
#>  [8] 0.6101680 0.7033701 0.7071465 0.5673416 0.2297995

## verify the equivalence
all.equal(d1, d2)
#> [1] TRUE

Created on 2021-09-21 by the reprex package (v2.0.1)

The Bernstein polynomial basis functions are equivalent to the B-splines without internal knots. For shape-restricted regression, using I-splines or C-splines is easier and more flexible. You may be interested in the paper that we recently wrote: https://doi.org/10.6339/21-JDS1020.

hhau commented 3 years ago

Hi @wenjie2wang -- a big thank you for the in-depth response. I now see the correspondence between the basis and its derivatives. Thank you for the link to the paper, I'm sure I will consult it (particularly section 4.1) numerous times in the future.

You have definitely answered my question (so we could close this issue), but I have two more questions that are entirely for my own interest:

  1. My initial thought was to parameterise the problem in terms of a different set of coefficients u = diff(beta_vec), estimate the values of u (subject to the appropriate shape constraint) by utilising the observations of the derivative, and then somehow reverse engineer the values of beta_vec (adding in an additional component to ensure that beta_vec had the correct number of components) when fitting the original data (observations of the 0th derivative, for lack of better terminology). Clearly this strategy is flawed, but I can't see where the logical inconsistency arises.
  2. There are quite a number of configurations of x values that lead to second derivatives with poor numerical properties (excuse the ugly code):

    library(magrittr)
    library(splines2)
    
    replicate(
      n = 5e4,
      expr = {
        det(crossprod(bernsteinPoly(
          x = runif(n = 12), 
          degree = 5,
          intercept = FALSE,
          Boundary.knots = c(0, 1),
          derivs = 2
        )))
      }
    ) %>% 
      magrittr::equals(0.0) %>% 
      sum() %>% 
      divide_by(5e4)
    #> [1] 0.06062

    Created on 2021-09-22 by the reprex package (v2.0.1)

    I guess this is to do with the Bernstein polynomial basis being more mathematically interesting than optimised for numerical computing, but ~6% of basis matrices being numerically singular strikes me as a lot? I don't really have a question here, I am just wondering if this has ever caused you an issue in 'real world' applications.

wenjie2wang commented 3 years ago

a big thank you for the in-depth response. I now see the correspondence between the basis and its derivatives. Thank you for the link to the paper, I'm sure I will consult it (particularly section 4.1) numerous times in the future.

My pleasure.

  1. My initial thought was to parameterise the problem in terms of a different set of coefficients u = diff(beta_vec), estimate the values of u (subject to the appropriate shape constraint) by utilising the observations of the derivative, and then somehow reverse engineer the values of beta_vec (adding in an additional component to ensure that beta_vec had the correct number of components) when fitting the original data (observations of the 0th derivative, for lack of better terminology). Clearly this strategy is flawed, but I can't see where the logical inconsistency arises.

At least one beta estimate is required to reverse engineer the values of beta_vec from u, right? If one beta parameter is considered in addition to u, it will probably be an equivalent parameterization.

  1. There are quite a number of configurations of x values that lead to second derivatives with poor numerical properties

I was not aware of that. I do not have real-world applications where the function is directly estimated by the second derivatives. But thanks for pointing it out.