wenjie2wang / splines2

Regression Spline Functions and Classes
https://wwenjie.org/splines2
GNU General Public License v3.0

issues when internal knot values equal boundary knot values #5

Closed KerollosWanis closed 3 years ago

KerollosWanis commented 3 years ago

When an internal knot has the same value as a boundary knot and degree is greater than 0, the bSpline() function produces NAs for data points at that boundary. For example, for a uniform random variable with support [0,1], if we choose an internal knot at 1 and boundary knots at c(0,1), bSpline() (v0.4.3) will produce NAs when evaluating the basis functions at any data point with the value 1. This is not consistent with the behavior of the bs() function or with older versions of splines2 (I tried v0.2.8).

mihaiconstantin commented 3 years ago

I am curious about this as well because in the past I opened an issue that might be related (see https://github.com/wenjie2wang/splines2/issues/1#issuecomment-661886691).

As you indicated, in earlier versions (e.g., v0.4.1), whenever an internal knot lay on the boundary, bSpline() threw an error. For example, the following code would stop with an error saying that the internal knots must not be on the boundary.

bSpline(1:10, knots = c(1, 5, 10), degree = 3, intercept = TRUE)

The error message was also thrown when degree = 0.

In v0.4.3, this no longer seems to be the case. @KerollosWanis, can you please confirm whether the following code correctly reproduces the inconsistent behavior you mentioned?

library(splines2)

# Data.
x <- seq(10, 100, length.out = 10)

# Creating basis with internal knots that lie on the boundary.
b <- bSpline(x, knots = c(10, 50, 100), degree = 3, intercept = TRUE)

# Predicting for boundary knots.
predict(b, c(10, 100))

# Which gives:

#      1 2 3   4   5   6   7
# [1,] 0 1 0   0   0   0   0
# [2,] 0 0 0 NaN NaN NaN NaN

And the attributes of the basis matrix used by splines2:::predict.bSpline2 are:

attributes(b)

# $x
# [1]  10  20  30  40  50  60  70  80  90 100
# 
# $degree
# [1] 3
#
# $knots
# [1]  10  50 100
#
# $Boundary.knots
# [1]  10 100
#
# $intercept
# [1] TRUE
#
# $class
# [1] "matrix"   "bSpline2"

wenjie2wang commented 3 years ago

Thanks for reporting this issue.

I do not think setting an internal knot at the boundary can be practically useful. Although splines::bs() does not produce NaN, the first and last columns consist of all zeros for the following example:

library(splines)
bs(1:10, knots = c(1, 5, 10), degree = 3, intercept = TRUE)
      1        2           3          4          5     6 7
 [1,] 0 1.000000 0.000000000 0.00000000 0.00000000 0.000 0
 [2,] 0 0.421875 0.504822531 0.07021605 0.00308642 0.000 0
 [3,] 0 0.125000 0.621913580 0.22839506 0.02469136 0.000 0
 [4,] 0 0.015625 0.505208333 0.39583333 0.08333333 0.000 0
 [5,] 0 0.000000 0.308641975 0.49382716 0.19753086 0.000 0
 [6,] 0 0.000000 0.158024691 0.46617284 0.36780247 0.008 0
 [7,] 0 0.000000 0.066666667 0.34666667 0.52266667 0.064 0
 [8,] 0 0.000000 0.019753086 0.19160494 0.57264198 0.216 0
 [9,] 0 0.000000 0.002469136 0.05728395 0.42824691 0.512 0
[10,] 0 0.000000 0.000000000 0.00000000 0.00000000 1.000 0
attr(,"degree")
[1] 3
attr(,"knots")
[1]  1  5 10
attr(,"Boundary.knots")
[1]  1 10
attr(,"intercept")
[1] TRUE
attr(,"class")
[1] "bs"     "basis"  "matrix"

Internal knots should be placed inside the boundary. A check has been added via 3d0dc97052b004eb596550ff591633381de8c38a.
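
The all-zero columns can also be checked programmatically. The following is a minimal sketch that uses only splines::bs(); the name zero_cols is introduced here for illustration, and the expected result is taken from the output printed above rather than from any documented guarantee.

```r
library(splines)

# Same call as above: internal knots at 1 and 10 coincide with the boundary knots.
b <- bs(1:10, knots = c(1, 5, 10), degree = 3, intercept = TRUE)

# Indices of basis columns whose entries are all exactly zero.
zero_cols <- which(colSums(abs(unclass(b))) == 0)

# Based on the printed matrix above, this should report the first and last columns.
print(unname(zero_cols))
```

Columns that are identically zero carry no information, so a design matrix that contains them is rank deficient.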

KerollosWanis commented 3 years ago

Thanks for the prompt response. I also do not think that intentionally setting an internal knot at the boundary can be practically useful. However, it might occur for some data generating processes when a user selects knot locations using quantiles of an observed data distribution.

wenjie2wang commented 3 years ago

> however it might occur for some data generating processes when a user selects knot locations using quantiles of an observed data distribution.

It makes sense to me.
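
The quantile scenario can be illustrated with a small sketch in base R. The data and the names candidates, on_boundary, inside, and knots are hypothetical, and dropping knots that fall on the boundary is only one possible workaround, not something the package does on the user's behalf.

```r
# Hypothetical data: half of the observations sit exactly at the upper bound.
x <- c(0, 0.1, 0.2, 0.3, 0.4, 1, 1, 1, 1, 1)
boundary <- range(x)

# Candidate internal knots at the quartiles of the observed data.
candidates <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)

# Flag candidates that coincide with a boundary knot (here, the upper quartile).
on_boundary <- candidates %in% boundary

# Keep only distinct knots strictly inside the boundary.
inside <- candidates > boundary[1] & candidates < boundary[2]
knots <- unique(candidates[inside])
```

The filtered knots could then be passed on, for example as bSpline(x, knots = knots, Boundary.knots = boundary), so that the check on internal knots is not triggered by ties in the data.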