statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH: splines with specific boundary values (predicted at boundaries) #6436

Open josef-pkt opened 4 years ago

josef-pkt commented 4 years ago

I would like to have a spline that starts at a predicted value of 0 and ends at a predicted value of 1 (at boundaries outside of the x-range), with slope (or asymptotic slope) equal to zero at both ends.

Example: spline fit for the RESET test and calibration curve, see #6435

e.g.

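from statsmodels.discrete.discrete_model import Logit
# bs is patsy's B-spline basis; endog and linpred are assumed to be defined already
res_spline = Logit.from_formula('y ~ bs(linpred, df=4)', {'y': endog, 'linpred': linpred}).fit()
fitted_spline = res_spline.predict()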

AFAIR from GAM, patsy cannot do this, but our splines in gam are more flexible in choosing boundary knots, although I don't remember how to do this. In GAM, I added a case where the spline is linear at the boundary knots, outside of the support of the x data.

I think patsy has an option to set the boundary slope to zero.

I'm not sure how we can impose specific y-values (especially when we already have a linear transformation for constant removal).
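One way to impose both boundary values and zero boundary slopes, sketched here under assumptions that are not part of the issue: with a clamped B-spline basis (boundary knots repeated degree + 1 times), the spline value at each boundary equals the first or last coefficient, and the boundary slope is proportional to the difference of the two outermost coefficients. Fixing the first two coefficients at 0 and the last two at 1 therefore gives f(lower) = 0, f(upper) = 1 and zero slope at both ends, and only the interior coefficients are estimated. The sketch uses scipy's `BSpline` and plain least squares on the probability scale rather than patsy or the Logit link; the helper names, knot choices, boundaries and simulated data are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline


def clamped_basis(x, lower, upper, inner_knots, degree=3):
    """Return the B-spline design matrix and knot vector on [lower, upper]."""
    t = np.concatenate([
        np.repeat(lower, degree + 1),
        np.asarray(inner_knots, dtype=float),
        np.repeat(upper, degree + 1),
    ])
    n_basis = len(t) - degree - 1
    # Identity coefficients give one column per basis function.
    basis = BSpline(t, np.eye(n_basis), degree)(np.asarray(x, dtype=float))
    return basis, t


def fit_boundary_constrained(x, y, lower, upper, inner_knots, degree=3):
    """Least-squares spline with f(lower)=0, f(upper)=1 and zero end slopes."""
    basis, t = clamped_basis(x, lower, upper, inner_knots, degree)
    n_basis = basis.shape[1]
    coef = np.zeros(n_basis)   # coef[:2] stay 0: value 0 and slope 0 at lower
    coef[-2:] = 1.0            # value 1 and slope 0 at upper
    free = np.arange(2, n_basis - 2)
    # The fixed coefficients enter the fit as a known offset.
    offset = basis[:, -2:].sum(axis=1)
    coef[free] = np.linalg.lstsq(basis[:, free], y - offset, rcond=None)[0]
    return BSpline(t, coef, degree)


# Hypothetical data standing in for linpred and endog from the issue.
rng = np.random.default_rng(0)
linpred = rng.normal(scale=1.5, size=200)
endog = rng.binomial(1, 1 / (1 + np.exp(-linpred)))
knots = np.quantile(linpred, [0.25, 0.5, 0.75])

# Boundaries at -8 and 8 are chosen to lie outside the range of linpred.
spline = fit_boundary_constrained(linpred, endog, -8.0, 8.0, knots)
print(spline(-8.0), spline(8.0))                            # boundary values
print(spline.derivative()(-8.0), spline.derivative()(8.0))  # boundary slopes
```

This only pins down the two ends. It does not by itself keep the curve monotone or inside [0, 1] between the boundaries, and it does not address how the constraint would interact with the constant removal mentioned above.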

josef-pkt commented 4 years ago

Example: simulated Logit data, nobs=200

In the plot:
black: Logit sigmoid
red: lowess with large frac, so the first part of the plot does not get pulled up
blue: spline, df=4

We need some option to avoid the initial decreasing part in the spline curve. This part is based on very few observations and strongly violates monotonicity.

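import matplotlib.pyplot as plt
from statsmodels.graphics.regressionplots import add_lowess

# sort_idx (e.g. np.argsort(linpred)) and link (the Logit link) are assumed to be defined already
fig = plt.Figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)

linpred_s = linpred[sort_idx]
ax.plot(linpred_s, endog[sort_idx], '|')  # raw 0/1 observations
ax.plot(linpred_s, link.inverse(linpred_s), 'k-')  # black: Logit sigmoid
ax.plot(linpred_s, fitted_spline[sort_idx], 'b-')  # blue: spline fit
fig = add_lowess(ax, frac=0.5)  # red: lowess fit to the first plotted line
fig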

[Figure: logit_reset_plot_200]
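A possible way to rule out the decreasing segment described above, given only as a sketch and not as something statsmodels or patsy provides: a B-spline with nondecreasing coefficients is itself nondecreasing, so the coefficients can be written as cumulative sums of nonnegative increments and estimated by nonnegative least squares. The example uses `scipy.optimize.nnls` on the probability scale instead of the Logit likelihood; the helper name, knots and simulated data are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import nnls


def fit_monotone_spline(x, y, lower, upper, inner_knots, degree=3):
    """Least-squares B-spline constrained to be nondecreasing on [lower, upper]."""
    t = np.concatenate([
        np.repeat(lower, degree + 1),
        np.asarray(inner_knots, dtype=float),
        np.repeat(upper, degree + 1),
    ])
    n_basis = len(t) - degree - 1
    basis = BSpline(t, np.eye(n_basis), degree)(np.asarray(x, dtype=float))
    # With coef = cumsum(increments), column j of the new design is the sum of
    # basis columns j, j+1, ..., so nonnegative increments give ordered coefs.
    # The first increment is the first coefficient, so it is also kept >= 0.
    design = np.cumsum(basis[:, ::-1], axis=1)[:, ::-1]
    increments, _ = nnls(design, np.asarray(y, dtype=float))
    return BSpline(t, np.cumsum(increments), degree)


# Hypothetical data standing in for linpred and endog from the issue.
rng = np.random.default_rng(0)
linpred = rng.normal(scale=1.5, size=200)
endog = rng.binomial(1, 1 / (1 + np.exp(-linpred)))
knots = np.quantile(linpred, [0.25, 0.5, 0.75])

spline = fit_monotone_spline(linpred, endog, linpred.min(), linpred.max(), knots)
grid = np.linspace(linpred.min(), linpred.max(), 101)
print(np.all(np.diff(spline(grid)) >= -1e-12))  # monotonicity check on a grid
```

Ordered coefficients are a sufficient condition for monotonicity, not a necessary one, so this restricts the fit somewhat more than strictly required. Combining it with fixed boundary coefficients would need the corresponding columns moved into an offset.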