Open josef-pkt opened 6 years ago
(random thoughts)
using interaction effect with dummy directly, i.e. we create two variables
z1 = dummy * x
and z0 = (1 - dummy) * x
This means part of each new variable will be zeros. We cannot include those zeros when we create the spline basis, i.e. we would need an ignore_zeros option.
We can do it if we use the spline basis for x; in that case we just compute a usual interaction variable on parts of exog.
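A minimal sketch of the second approach, assuming a hand-written linear truncated power basis as a stand-in for the real spline basis (the knots, sample size and variable names are made up for illustration). The point is only that the dummy multiplies the basis columns, so the zeros appear after the basis has been built:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = np.sort(rng.uniform(0.0, 1.0, size=n))
dummy = (np.arange(n) % 2).astype(float)  # hypothetical 0/1 group indicator

# Stand-in spline basis for x: linear truncated power (hinge) functions.
# This is an assumption for illustration, any basis with one row per
# observation would be interacted in the same way.
knots = np.array([0.25, 0.5, 0.75])
basis = np.column_stack([x] + [np.clip(x - k, 0.0, None) for k in knots])

# Interact the basis columns with the dummy instead of interacting x.
# Rows of the other group are zeroed only after the basis exists,
# so the zeros never enter knot placement or basis evaluation.
basis_1 = basis * dummy[:, None]
basis_0 = basis * (1.0 - dummy)[:, None]
exog = np.column_stack([basis_0, basis_1])
```

Each group gets its own set of spline coefficients, and `basis_0 + basis_1` recovers the original basis.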
The two cases above would need 2 options for knot selection when creating the spline basis by categorical subsets. The second two might be applicable to regression discontinuity design and structural change with jump points.
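One possible reading of the knot selection options, as a rough sketch: knots from the pooled x versus knots computed within each categorical subset. Quantile knots, the `group` variable and the probabilities are assumptions for illustration. The last lines show why the zeros in z1 have to be ignored when placing knots:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=200)
group = rng.integers(0, 2, size=200)  # hypothetical 0/1 categorical variable

probs = np.array([0.25, 0.5, 0.75])  # assumed quantile levels for the knots

# Option 1: common knots from the pooled x.
knots_common = np.quantile(x, probs)

# Option 2: separate knots for each categorical subset.
knots_by_group = {
    int(g): np.quantile(x[group == g], probs) for g in np.unique(group)
}

# Knots computed on z1 = dummy * x are pulled toward the artificial zeros.
z1 = group * x
knots_naive = np.quantile(z1, probs)
# Ignoring the zero rows gives the subset knots of option 2.
knots_ignore_zeros = np.quantile(z1[group == 1], probs)
```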
A corresponding option is on penalization or smoothing parameter choice
mgcv allows the common alpha case in the options for GAM, i.e. add an index that maps from alpha list to the list of smooth terms.
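A sketch of the index idea, not of mgcv's or our actual API: `alpha_index` maps each smooth term to an entry in a shorter `alphas` list, so terms that share an index share a smoothing parameter. The term sizes, the values and the second-difference penalty are all assumptions for illustration:

```python
import numpy as np


def second_diff_penalty(k):
    """Second-difference penalty matrix D'D for k coefficients (illustrative)."""
    d = np.diff(np.eye(k), n=2, axis=0)  # shape (k - 2, k)
    return d.T @ d


term_sizes = [5, 5, 4]   # number of basis columns per smooth term (made up)
alphas = [0.5, 10.0]     # distinct smoothing parameters
alpha_index = [0, 0, 1]  # term -> position in alphas; terms 0 and 1 share one

# Expand the short alpha list to one value per smooth term.
alpha_per_term = [alphas[i] for i in alpha_index]

# Assemble the block-diagonal penalty for the stacked coefficient vector.
k_total = sum(term_sizes)
penalty = np.zeros((k_total, k_total))
start = 0
for k, alpha in zip(term_sizes, alpha_per_term):
    penalty[start:start + k, start:start + k] = alpha * second_diff_penalty(k)
    start += k
```

Smoothing parameter selection would then search over `alphas` only, which has fewer entries than there are smooth terms.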
related: varying coefficient models which are essentially also just interaction effects. I have not looked at those yet.
two possible examples:
one alternative in the structural change, regression discontinuity case: We could add only a mean shift dummy. If we use a local spline basis like B-splines and drop one basis column to remove the intercept, then the shape of the curves (including the mean shift variable) in the two segments would be mostly independent of each other, i.e. the dependence would be restricted to common penalization and spillover across knots in the transition segment.
just another example: cyclical time series, e.g.
El Nino dataset with annual seasonal pattern, each year is a spline
within day, e.g. hourly, data with weekly cyclical pattern, each day is a spline
pooled model: every cycle is the same, i.e. every year or every day has the same pattern. This is currently possible.
separate model: e.g. estimate each year separately, or estimate data for all weeks for each weekday separately. This is currently possible, but there is no commonality across the separate regressions.
interaction model: each year or each weekday is a separate spline, with e.g. a parametric intercept or trend common to all sub-splines. This has problems with zeros in the dummy/period and spline interaction: we need spline bases that don't include the zero (not in interval) as a value, i.e. the spline basis at zero should be zero.
Our current implementation cannot construct spline bases ignoring some of the observations, e.g. the zero observations. Our current implementation also cannot reuse and shift splines, e.g. with equal time intervals the time variable is the same for all subunits and we could just shift and transpose the basis instead of constructing it for each period separately (scipy.signal did and maybe still does that).
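A rough sketch of reusing one basis across cycles with equal time intervals, again with a linear truncated power basis as a stand-in (cycle counts, knots and names are made up). The basis is built once for a single cycle; stacking it gives the pooled model, and a Kronecker product with an identity matrix gives the interaction model without ever evaluating a spline at an artificial zero:

```python
import numpy as np

n_cycles, n_per_cycle = 3, 6                 # e.g. 3 years with 6 periods each
t = np.arange(n_per_cycle) / n_per_cycle     # within-cycle time, same every cycle

# Basis for one cycle, built once (hinge functions as an illustrative stand-in).
knots = np.array([1 / 3, 2 / 3])
basis_cycle = np.column_stack([t] + [np.clip(t - k, 0.0, None) for k in knots])

# Pooled model: every cycle uses the same spline coefficients.
exog_pooled = np.tile(basis_cycle, (n_cycles, 1))            # (18, 3)

# Interaction model: block-diagonal design, one spline per cycle.
exog_interaction = np.kron(np.eye(n_cycles), basis_cycle)    # (18, 9)

# A parametric part common to all sub-splines, here just an intercept.
n_obs = n_cycles * n_per_cycle
exog_full = np.column_stack([np.ones(n_obs), exog_interaction])
```

The off-diagonal blocks are exact zeros that come from the dummy structure, not from evaluating the basis at zero.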
ideas
R mgcv has a `by` option for the interaction effect of penalized splines and a categorical variable. I don't know yet how we will support this. patsy can create the interaction term using its splines.
two cases (that I can think of)
I think the former should be easy using standard interaction (like kronecker product for unbalanced dummies). The latter will be more difficult because the spline cannot be estimated in the unobserved part. I have a simple example without penalization in the causal impact, synthetic control PR #3647, or in one of the related experimental notebooks.