TheChymera opened this issue 7 years ago
Once you have the model and the parameter names, you can also specify the constraints/contrasts for f_test or t_test with strings, either a single comma-separated list of equality conditions or a list of strings:
"param1 = param2, param2 = param3"
or ["param1 = param2", "param2 = param3"]
where param1, param2 should be the names as they show up, for example, in params.
This is easier if there are just a few constraints; otherwise I usually use np.eye or np.diag or similar to build up a large constraint matrix with -1, +1, and zeros in each row.
params in MixedLM will or might have additional parameters for the random effects. The constraints in f_test, wald_test, and t_test need to have as many columns as the length of params and cov_params.
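A minimal sketch of the two constraint styles side by side, on a plain OLS fit with invented column names (x1, x2, x3), since the f_test API is the same across models:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["y", "x1", "x2", "x3"])
res = smf.ols("y ~ x1 + x2 + x3", df).fit()

# String form: a comma-separated list of equality conditions.
ft_str = res.f_test("x1 = x2, x2 = x3")

# Matrix form: one row per constraint, one column per entry of res.params
# (here: Intercept, x1, x2, x3); +1/-1 encode the same differences.
R = np.array([[0, 1, -1, 0],
              [0, 0, 1, -1]])
ft_mat = res.f_test(R)
```

Both calls should produce the same F statistic; the matrix form scales better when you build many rows with np.eye or np.diag.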
anova_lm is currently only implemented for OLS. I don't know whether MixedLM has a likelihood ratio test or an ssr-based f_test equivalent.
OK, so based on what you said, the following should test whether all my regressors (except the random effects) are equal (can you confirm this?):
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("~/MixedLM_data.csv")
lm = smf.mixedlm("t ~ session", df, groups=df["subject"]).fit()
print(lm.params)
anova = lm.f_test([[1, -1, 0, 0, 0, 0],
                   [1, 0, -1, 0, 0, 0],
                   [1, 0, 0, -1, 0, 0],
                   [1, 0, 0, 0, -1, 0]])
print(anova)
```
This gives me:

```
Intercept          1.260392
session[T.post]   -0.027365
session[T.t1]     -0.593442
session[T.t2]     -1.243611
session[T.t3]     -0.174372
Intercept RE       0.000003
dtype: float64
<F test: F=array([[ 1.86775433]]), p=0.1556140289064265, df_denom=20, df_num=4>
```
This looks like it could be right, except that I'm worried about how the intercept is compared to the other regressors. As I understand it, the value of the intercept is relative to zero, whereas the values of the other regressors are relative to the intercept. Is f_test() aware of the fact that the intercept is special in this regard?
f_test doesn't know anything about what the columns mean; there is no special handling of a constant column.
[[1,-1,0,0,0,0],[1,0,-1,0,0,0],[1,0,0,-1,0,0],[1,0,0,0,-1,0]] tests that the differences between the first coefficient and each of the second through fifth coefficients are zero. That would be correct if you had all levels of the categorical variable in the design matrix.
However, AFAICS you use dummy coding where the first level "t0" (?) is the reference level where the effect is captured by the intercept. In this case, the hypothesis that all levels have the same effect as the reference level would be that session[T.post] to session[T.t3] all have zero coefficients, i.e. just a one at the position of the coefficient and zeros everywhere else in the row. (This would be the same hypothesis as dropping session from the regression.)
IIRC (it's late here):

```python
lm.f_test("session[T.post], session[T.t1], session[T.t2], session[T.t3]")
```

should be equivalent to the 1s at each position; "= 0" is the default if an equality is not specified.
I think a sequential comparison should be equivalent in this case, i.e. [[0,1,0,0,0,0],[0,1,-1,0,0,0],[0,0,1,-1,0,0],[0,0,0,1,-1,0]]
So you mean this should be equivalent to what ANOVA does?

```python
anova = lm.f_test([[1, 0, 0, 0, 0, 0]])
```
No, test all the session params, i.e. that the middle 4 coefficients are zero:

```python
anova = lm.f_test([[0, 1, 0, 0, 0, 0],
                   [0, 0, 1, 0, 0, 0],
                   [0, 0, 0, 1, 0, 0],
                   [0, 0, 0, 0, 1, 0]])
```
and you can check with the string version which should produce the same result.
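The equivalence of the matrix and string forms is easy to check on a plain OLS fit with the same kind of dummy-coded categorical; the session levels here are invented (with "base" as the reference level), but the patsy-style names work the same way:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"session": ["base", "post", "t1", "t2", "t3"] * 8})
df["t"] = rng.normal(size=len(df))
res = smf.ols("t ~ session", df).fit()

# params: Intercept, session[T.post], session[T.t1], session[T.t2], session[T.t3]
f_mat = res.f_test(np.eye(5)[1:])  # rows zeroing out each of the four dummies
f_str = res.f_test("session[T.post] = 0, session[T.t1] = 0, "
                   "session[T.t2] = 0, session[T.t3] = 0")
```

Both are the hypothesis that all non-reference levels equal the reference level, i.e. the same as dropping session from the regression.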
Thank you for the info. I think it would help greatly to include this info in the documentation and/or examples here.
Is that website generated from this repo, or is it managed elsewhere?
@josef-pkt I would be happy to contribute a relevant example if you can tell me where the source of the documentation is
@josef-pkt why am I receiving this error?
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-2b50b83a8566> in <module>()
     17 md = smf.mixedlm("value ~ nback", data, groups=data["subject"])
     18 mdf = md.fit()
---> 19 anova = md.f_test([[0,1,0,0],[0,0,1,0]])
     20 print(anova)
     21

AttributeError: 'MixedLM' object has no attribute 'f_test'
```

statsmodels version: '0.8.0.dev0+5797248'
There is no f_test implemented for MixedLM. A PR would be welcome, but note there is a lot of debate about how to do this properly, especially in terms of setting the degrees of freedom.
Kerby
@yl565 You used the model instance instead of the results instance. f_test and wald_test are inherited in the results instance, AFAICS from the code, but they most likely won't work; they should raise a shape mismatch or some other exception. wald_test has a use_f option to switch between the F and the chi-squared distribution, with the corresponding defaults; otherwise it's the same as f_test. However, wald_test requires df_resid, AFAIR, even though it is not used if use_f=False. MixedLMResults defines df_resid, I've just seen.
So wald_test (instead of f_test) could be made to work for MixedLM by overriding the inherited method in the same way as t_test.
All models except the linear regression models like OLS and WLS default to the asymptotic normal distribution (the default use_t=False is inherited unless overwritten).
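A small sketch of the use_f switch on wald_test, again on an OLS fit with invented columns (OLS defaults to the F distribution; most other models default to the asymptotic chi-squared/normal):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(60, 3)), columns=["y", "x1", "x2"])
res = smf.ols("y ~ x1 + x2", df).fit()

# Same joint hypothesis, two reference distributions:
w_f = res.wald_test("x1 = 0, x2 = 0", use_f=True)     # F distribution
w_chi2 = res.wald_test("x1 = 0, x2 = 0", use_f=False)  # chi-squared
```

The p-values differ slightly in small samples and converge as the residual degrees of freedom grow.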
I'm adding missing labels, but I'm not sure what the status is.
Just an update and a gist (while triaging unlabeled issues).
Both f_test and wald_test work with the full params length, e.g.

```python
rslt2.f_test(np.eye(len(rslt2.params)))
```

However, t_test uses only the fixed-effects params (the results in t_test look to be the same as in summary, which is the summary2 version), and it raises/breaks with the full-length constraint np.eye(len(rslt2.params)):

```python
rslt2.k_fe < len(rslt2.params)
rslt2.t_test(np.eye(rslt2.k_fe))
```
Gist, using a copy-paste of a test case and trying out the test methods: https://gist.github.com/josef-pkt/74d76966c7770598bfdb491569808939
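The shape difference described above can be sketched end to end on synthetic random-intercept data (the data and variable names here are invented, not from the gist):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_subj, n_obs = 8, 10
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_obs),
    "x": rng.normal(size=n_subj * n_obs),
})
# Random intercept per subject plus observation noise.
df["y"] = (0.5 * df["x"]
           + np.repeat(rng.normal(size=n_subj), n_obs)
           + rng.normal(size=len(df)))

rslt = smf.mixedlm("y ~ x", df, groups=df["subject"]).fit()

# f_test takes constraints over the full params vector
# (fixed effects plus the random-effects variance component):
ft = rslt.f_test(np.eye(len(rslt.params)))

# t_test only covers the k_fe fixed-effects params:
tt = rslt.t_test(np.eye(rslt.k_fe))
```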
Is there a roadmap for adding a likelihood ratio test for MixedLM models?
I wanted to verify whether, among different measurement sessions, any are significantly different from the others. One way to do this would be a repeated-measures ANOVA, but I see that the ANOVA implementation in statsmodels does not support this.
I thought another way would be to use MixedLM, which can deal with repeated measures, and then run an ANOVA on that model. I tried two approaches, using this data:
This fails with:

```
AttributeError: 'MixedLMResults' object has no attribute 'ssr'
```

This succeeds, but I am unsure whether I understand the f_test() syntax correctly. I would like to test the hypothesis that all sessions are equal. I notice that an array of length 6 is required, though I have only 5 sessions. How come? Can you help me out @kshedden?