mclements / Splines2.jl

Spline bases for regression modelling in Julia
MIT License
12 stars 5 forks source link

Splines2.is starts with an all zeros row #2

Closed joeynelson closed 3 years ago

joeynelson commented 4 years ago

Thanks for developing this package, it was very helpful in developing a monotone spline fitting function. One thing I ran into is that the matrix formed starts with a row of all zeros. It isn't a big issue, but it does seem odd.

Because my data doesn't start at zero, I added an all ones column as a constant basis. I'm not sure if there is a better way to handle this. If this is the right approach, it would be nice if the Splines.is function had an option to include a constant basis

mclements commented 4 years ago

Joey,

Would you be able to show some code to demonstrate the problem, please?

The row of zeroes is typical at the left boundary for some of the bases without an intercept. I also tried to be consistent with the splines2 package from R - which may use unexpected defaults.

Kindly, Mark.

joeynelson commented 4 years ago

Mark,

Thanks for the quick response.

I realize now that after I add the ones column, that row is no longer all zeros, and makes good sense.

I also forgot to mention another issue with is, where there is an undefined variable der. I assumed this to be a typo and changed it to ders.

With intercept=true it still has the zeros row, but does look like it shifts the basis' leftwards (upwards in the matrix), but still maintains the zero row. This doesn't make sense to me with the ispline basis.

With the der fix, intercept=false, and the added ones column, it is working well for me. Here is the code I'm testing with.

using Splines2, NonNegLeastSquares, Random, Plots
x = collect(range(0.0, length=101, stop=10.0))
y = 1.0 .- 1.0./(x.+2.0) + randn(size(x)) * 0.005
is1 = Splines2.is_(x,df=32,intercept=false)
X = is1(x)
X1 = hcat(ones(size(x)),X)
b = nonneg_lsq(X1,y)
newx = collect(range(0.0, length=50, stop=10.0))
newX1 = hcat(ones(size(newx)),is1(newx))
yfit = newX1*b
plt = plot(x,y)
plot!(plt,newx,yfit)

Best Regards, Joey

mclements commented 4 years ago

Thank you for the pull request - now accepted.

The implementation for Splines2.is largely followed the implementation for the splines2 package in R, where intercept=FALSE drops the first spline term, and the default intercept=TRUE keeps all of the spline terms. I agree that this does not seem sensible (when would dropping a spline term make sense, and why say there is an intercept term when there is not one?). Note that I had deviated from the splines2 implementation by using intercept=false as the default.

I have now changed the semantics in Spline2, so that the default intercept=false keeps all of the spline terms, while intercept=true includes a column of ones and all of the spline terms. One consequence is that the default splines from the R and Julia implementations will now be the same.

There is one potential issue with this approach, which may explain the splines2 implementation: when would you want to use I-splines with an intercept term? Typically, we use I-splines to model for [0,1] or [0,inf), while an intercept term would shift that domain.

Sincerely, Mark.