@svteichman What does lm() do by default here? I'd like rigr to mimic lm's behaviour 🙏
From a quick toy example, it seems that lm() and glm() will treat the variable as a factor if it is input that way, even if the original variable appears to be continuous. Here is an example:
> df <- data.frame(y = rbinom(20, 1, 0.7), x = 1:20)
> mod <- glm(y ~ x, data = df, family = "binomial")
> coef(mod)
(Intercept) x
0.471398179 -0.006266904
> mod_factor <- glm(y ~ as.factor(x), data = df, family = "binomial")
> coef(mod_factor)
(Intercept) as.factor(x)2 as.factor(x)3 as.factor(x)4 as.factor(x)5 as.factor(x)6
2.556607e+01 -5.113214e+01 2.202667e-06 -5.113214e+01 2.205239e-06 2.201228e-06
as.factor(x)7 as.factor(x)8 as.factor(x)9 as.factor(x)10 as.factor(x)11 as.factor(x)12
-5.113214e+01 2.304767e-06 -5.113214e+01 2.201405e-06 2.201662e-06 -5.113214e+01
as.factor(x)13 as.factor(x)14 as.factor(x)15 as.factor(x)16 as.factor(x)17 as.factor(x)18
-5.113214e+01 2.206395e-06 2.202988e-06 2.307846e-06 -6.726091e-09 -5.113214e+01
as.factor(x)19 as.factor(x)20
2.202157e-06 -5.113214e+01
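The same pattern can be checked for lm() directly. The following is a minimal sketch, assuming a simulated Gaussian outcome and an arbitrary seed (neither is from the thread); it only illustrates that wrapping a variable with 20 distinct values in as.factor() yields one coefficient per level and leaves no residual degrees of freedom.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
df <- data.frame(y = rnorm(20), x = 1:20)

# Numeric x: a single slope is estimated.
mod <- lm(y ~ x, data = df)
n_coef_numeric <- length(coef(mod))        # intercept + slope
df_resid_numeric <- df.residual(mod)       # 20 - 2

# Factor x: one indicator per level beyond the reference level.
mod_factor <- lm(y ~ as.factor(x), data = df)
n_coef_factor <- length(coef(mod_factor))  # intercept + 19 indicators
df_resid_factor <- df.residual(mod_factor) # 20 - 20

c(n_coef_numeric, df_resid_numeric, n_coef_factor, df_resid_factor)
```

With one observation per level the factor model is saturated, which is why the estimates are returned but carry no usable information about uncertainty.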
Thanks @svteichman! So confirming that regress and lm are aligned, even if different to STATA?
PS. just confirming the above behaviour still applies even for non-integer x?
Yes!
Here we have a non-integer x. I compare directly to rigr::regress(), which gives the same results as glm() when given the non-integer variable x coerced into a factor.
> df <- data.frame(y = rbinom(20, 1, 0.7), x = seq(0.1, 2, 0.1))
> mod <- glm(y ~ x, data = df, family = "binomial")
> coef(mod)
(Intercept) x
1.0100427 -0.3660353
> mod_factor <- glm(y ~ as.factor(x), data = df, family = "binomial")
> coef(mod_factor)
(Intercept) as.factor(x)0.2 as.factor(x)0.3 as.factor(x)0.4 as.factor(x)0.5
2.556607e+01 2.307129e-06 2.304752e-06 2.307313e-06 -1.717587e-09
as.factor(x)0.6 as.factor(x)0.7 as.factor(x)0.8 as.factor(x)0.9 as.factor(x)1
-5.113214e+01 -5.113214e+01 -5.113214e+01 -1.399749e-09 2.303245e-06
as.factor(x)1.1 as.factor(x)1.2 as.factor(x)1.3 as.factor(x)1.4 as.factor(x)1.5
-5.113214e+01 -5.113214e+01 9.584600e-08 -1.992438e-09 2.306903e-06
as.factor(x)1.6 as.factor(x)1.7 as.factor(x)1.8 as.factor(x)1.9 as.factor(x)2
9.507097e-08 -5.113214e+01 -5.113214e+01 2.303539e-06 2.303539e-06
> regress_mod_factor <- regress("odds", y ~ as.factor(x), data = df)
Warning messages:
1: In stats::pf(LRStat, p - intercept, n - p) : NaNs produced
2: In stats::qt((1 - conf.level)/2, df = n - p) : NaNs produced
> regress_mod_factor$coefficients[,1]
(Intercept) as.factor(x)0.2 as.factor(x)0.3 as.factor(x)0.4 as.factor(x)0.5
2.556607e+01 2.307129e-06 2.304752e-06 2.307313e-06 -1.717587e-09
as.factor(x)0.6 as.factor(x)0.7 as.factor(x)0.8 as.factor(x)0.9 as.factor(x)1
-5.113214e+01 -5.113214e+01 -5.113214e+01 -1.399749e-09 2.303245e-06
as.factor(x)1.1 as.factor(x)1.2 as.factor(x)1.3 as.factor(x)1.4 as.factor(x)1.5
-5.113214e+01 -5.113214e+01 9.584600e-08 -1.992438e-09 2.306903e-06
as.factor(x)1.6 as.factor(x)1.7 as.factor(x)1.8 as.factor(x)1.9 as.factor(x)2
9.507097e-08 -5.113214e+01 -5.113214e+01 2.303539e-06 2.303539e-06
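The two warnings above are consistent with the model having zero residual degrees of freedom: there are 20 observations and 20 estimated coefficients, so n - p = 0. The sketch below reproduces the NaN values with the same stats functions named in the warning messages. The statistic value passed to pf() and the 0.95 confidence level are placeholders chosen for illustration, not values taken from rigr.

```r
n <- 20  # observations in the example data frame
p <- 20  # coefficients: intercept plus 19 factor indicators

# The t quantile is undefined for zero degrees of freedom.
t_crit <- suppressWarnings(stats::qt((1 - 0.95) / 2, df = n - p))

# The F distribution function is likewise undefined when df2 = 0.
# The statistic value 1 is an arbitrary placeholder.
f_prob <- suppressWarnings(stats::pf(1, p - 1, n - p))

c(is.nan(t_crit), is.nan(f_prob))
```

Without suppressWarnings(), each call emits the "NaNs produced" warning seen in the session.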
Thanks @svteichman.
@susannemay even though rigr does something different to STATA, it does the same thing as R's lm, which is its natural comparator. For this reason, I don't see this as an issue that needs fixing, so I am going to close.
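Since the behaviour matches lm and is staying as is, one option is to check on the user side before fitting. The helper below is a hypothetical sketch (check_factor_levels is not part of rigr or base R); it flags a variable whose factor coding would have one level per non-missing observation, which is the situation in the examples above.

```r
# Hypothetical helper, not part of rigr or base R.
# Returns TRUE (with a warning) when every non-missing observation
# would get its own factor level.
check_factor_levels <- function(x) {
  f <- droplevels(as.factor(x))
  n_obs <- sum(!is.na(f))
  one_per_obs <- nlevels(f) == n_obs
  if (one_per_obs) {
    warning("factor has one level per observation; the variable may be continuous")
  }
  invisible(one_per_obs)
}

# Continuous variable from the thread's second example: flagged.
saturated <- suppressWarnings(check_factor_levels(seq(0.1, 2, 0.1)))

# A genuinely categorical variable with repeated levels: not flagged.
grouped <- check_factor_levels(rep(c("a", "b"), 10))

c(saturated, grouped)
```

The threshold of exactly one level per observation is a deliberately narrow assumption; a looser rule may suit other data.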
When running a logistic regression model and mistakenly using factor(variable) where the variable is continuous (this happened as part of a student project), rigr returns estimates when it should not. Below is an example using rigr, with what Stata does by contrast.
Please let me know if you need any additional information.
Susanne (Susanne May, sjmay@uw.edu)
STATA output