nignatiadis / SmoothingSplines.jl

Cubic smoothing splines in Julia.
Other
34 stars 14 forks source link

Interpretation of Lambda #26

Closed ayushpatnaikgit closed 2 years ago

ayushpatnaikgit commented 2 years ago

The predicted values for the example seem to be different from the predicted values for the same input to the smooth.spline function in R. Here's how I've done the comparison:

using SmoothingSplines
using RDatasets

cars = dataset("datasets","cars")
X = map(Float64,convert(Array,cars[!, :Speed]))
Y = map(Float64,convert(Array,cars[!, :Dist]))

spl = fit(SmoothingSpline, X, Y, 250.0) # λ=250.0
Ypred = predict(spl) #

using RCall 
R"""
RSmoothingSpline <- function(x, y, lambda) 
{
    m      <- smooth.spline(x = x, y = y, lambda = lambda)
    return <- predict(m, x)$y
}
"""

YpredR = convert(Array{Float64}, rcall(:RSmoothingSpline,x = X, y = Y, lambda = 250))
# The answer is totally different from YPred. The answers are close if I pass lambda = 1/50 into the RSmoothingSpline

It seems the Lambda of the R function isn't the same as that of the Julia function. Is there a way to convert from one to another?

I am building the cross-validation feature for this package for the automatic selection of Lambda. I want to confirm the results with R using RCall. That's why it's important that for a given Lamda, the results are the same.

mauro3 commented 2 years ago

Some lambda discussion happened here https://github.com/nignatiadis/SmoothingSplines.jl/issues/2, maybe of use?

nignatiadis commented 2 years ago

I think the difference is that in the R smooth.spline function the predictor is first transformed to be in the unit interval. Thus, to get same results you could first standardize the predictor:

X = (X .- minimum(X))/(maximum(X)-minimum(X))

If you then repeat your code from above with this new X, then the same choice of lambda (say, 1/50) should yield the same answer using both this package and R's smooth.spline.

Alternatively, if you want to avoid the above transformation, the formula matching the lambdas to each other would be:

λR = λJ / ( maximum(X) - minimum(X) )^3

So e.g., in the example from the Readme, λ equal to 250 / 21^3 ≈ 0.027 for the R function should give the same result.

ayushpatnaikgit commented 2 years ago

Thank you for the answer. I'll close the issue.