Closed seanv507 closed 7 years ago
You can get interaction terms by setting `use.model.frame=TRUE`. Is this sufficient?
No - for precisely the reasons you outline why use.model.frame doesn't work well. That's what I want to work on.
I think it would be a pretty complicated task, since you'd have to parse the formula and figure out what each combination of terms means. Eg how would you handle `~ x1 + x2:(x3 + x4) + x5*x6*x7^2`?
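As a point of reference for the parsing question, base R's `terms()` already expands formula operators into individual term labels. The sketch below only shows that expansion for the formula quoted above; it is not the package's code, and the variable names are just those from the example.

```r
# terms() (stats package, attached by default) expands :, * and ^ into term labels.
f <- ~ x1 + x2:(x3 + x4) + x5*x6*x7^2
labels <- attr(terms(f), "term.labels")
print(labels)

# Note: inside a formula, ^ is a crossing operator, not arithmetic,
# so x7^2 is treated as plain x7. A squared column needs I(x7^2).
```

Each label could then be handled separately, which is roughly the "simple products" case discussed in this thread.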
Bear in mind as well that `:` and `*` can mean different things when variables are factor vs numeric.
Happy to accept a pull request -- I just think that it might be a bigger task than it looks at first.
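To illustrate the factor vs numeric point, here is a small sketch using base R's `model.matrix()` on made-up data (the column names `n1`, `n2`, `f1`, `f2` are hypothetical). With numeric inputs an interaction is a single product column; with factors it expands into indicator columns for combinations of levels.

```r
# Toy data: two numeric columns and two 3-level factors (illustrative only).
df <- data.frame(
  n1 = c(1, 2, 3, 4, 5, 6),
  n2 = c(2, 1, 4, 3, 6, 5),
  f1 = factor(c("a", "b", "c", "a", "b", "c")),
  f2 = factor(c("u", "u", "v", "v", "w", "w"))
)

# numeric * numeric: intercept, two main effects, one product column.
mm_num <- model.matrix(~ n1 * n2, data = df)
print(colnames(mm_num))

# factor * factor: under the default treatment contrasts each 3-level factor
# contributes 2 columns, and the interaction contributes 2 * 2 = 4 more.
mm_fac <- model.matrix(~ f1 * f2, data = df)
print(colnames(mm_fac))
```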
I was going to limit it to just simple products, ie "x1 + x2:x3 + x3:x4:x5" (I am predominantly working with factor variables).
Thanks for the heads up about factor vs numeric - useful to bear in mind.
So I believe your great code makes it easier than you think :) .
rhsTerms <- split(deparse(rhs),' + ')
rhsVars <- all.vars(rhs)
....
matrs <- sapply(rhsTerms, function(x) {
ie since you are already using formulas to create the model variables, any formula of the form f(x) + f(y) will work.
Two issues I have come across: I now need to handle '~ . + x:y', and I don't know what the terms object is or what it is for.
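On the question of the terms object: in base R it is the parsed form of a formula, carrying attributes that describe each term. A minimal sketch, using a made-up formula with hypothetical variables `y`, `a` and `b`:

```r
# A terms object is a formula plus attributes describing its structure.
tt <- terms(y ~ a + b + a:b)

print(attr(tt, "term.labels"))  # one label per term on the right-hand side
print(attr(tt, "order"))        # interaction order of each term
print(attr(tt, "factors"))      # variables-by-terms matrix: which variable is in which term
print(attr(tt, "intercept"))    # 1 if the model has an intercept, 0 otherwise
print(attr(tt, "response"))     # index of the response variable, 0 if none
```

Functions such as `model.frame()` and `model.matrix()` use these attributes to decide which columns to build.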
draft implementation: 221ed1b6733d79a07cec63c714b50780890eafa2
There is a question over handling of the dot in formulas like `~ . + a:b` or `~ . + sin(x)`. Should `.` expand to include main effects that are present in other terms?
How `model.matrix`/`model.frame` handles it:
- `model.matrix(~ . + a:b)` includes columns for `a` and `b`, but removes columns in `a:b` that are aliased with the main effects (so the overall model matrix is not singular)
- `model.matrix(~ . + sin(x))` includes a column for `x`
Current implementation will include main effects, but not handle aliasing. This means that `~ . + a:b` will output more columns than `model.matrix` does.
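The difference in column counts can be seen with a small sketch on made-up data (factors `a` and `b` with two levels each, numeric `x`). The second call builds the `a:b` term in isolation, as a stand-in for any term-by-term approach that does not remove aliased columns; it is an assumption for illustration, not the package's actual code path.

```r
# Toy data: two 2-level factors and one numeric column (illustrative only).
df <- data.frame(
  a = factor(c("p", "q", "p", "q")),
  b = factor(c("s", "s", "t", "t")),
  x = c(0.5, 1.0, 1.5, 2.0)
)

# Default behaviour: main effects for a, b, x plus only the non-aliased
# interaction column, alongside the intercept.
mm_default <- model.matrix(~ . + a:b, data = df)
print(colnames(mm_default))

# The interaction built on its own keeps one indicator column per
# combination of levels, so adding main effects on top would be redundant.
mm_ab_alone <- model.matrix(~ 0 + a:b, data = df)
print(colnames(mm_ab_alone))
```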
@seanv507 would you have any thoughts on this?
Unfortunately I am travelling tomorrow and will have infrequent internet access (and am in between jobs, so not working on this right now). I presume you mean a*b, not a:b? If so, I think that is fine [..for now]. I would be happy putting interaction terms in explicitly (also because the aim is to deal with the memory issues of using model.matrix). Aliasing would be a problem! I personally would not use a*b because of that (ie knowing I have to merge the coefficient for a from ~ and a from a:b).
No, I mean `a:b`. I can either make it so that `.` expands to include all variables that are in the formula, or exclude them. Assuming a, b and x are the only variables in the data, the former would be:
~ . + a:b + sin(x) --> a + b + x + a:b + sin(x)
Excluding would be:
~ . + a:b + sin(x) --> a:b + sin(x)
The former seems to be more consistent with the R default. The main issue is that you have to do more work when interpreting interaction coefficients, especially since aliased columns aren't removed. The latter is probably closer to what you want, but less convenient when doing, eg, polynomial regression.
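The two options can be sketched with base R on made-up data. The "include" case is what `terms()` does when given `data`; the "exclude" case here is a hypothetical label-based filter (the names `incl`, `excl` and `used` are illustrative), and it would also drop a main effect that the user wrote out explicitly, so it is only a rough approximation.

```r
# Toy data in which a, b and x are the only variables (illustrative only).
df <- data.frame(
  a = factor(c("p", "q", "p", "q")),
  b = factor(c("s", "s", "t", "t")),
  x = c(0.1, 0.2, 0.3, 0.4)
)
f <- ~ . + a:b + sin(x)

# Option 1: . expands to every variable in the data (the R default).
incl <- attr(terms(f, data = df), "term.labels")
print(incl)

# Option 2 (sketch): drop main effects for variables that already appear
# elsewhere in the formula, so . only stands for otherwise unused variables.
used <- setdiff(all.vars(f), ".")
excl <- setdiff(incl, used)
print(excl)
```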
Best of luck with job hunting, if you haven't got one already!
Personally I would prefer the former. Sean
It would be great to add interaction terms of the form ..+length:width ...
Are there any problems with implementing this? I would be happy to do it myself.