markmfredrickson / RItools

Randomization inference tools for R
GNU General Public License v2.0
17 stars 11 forks source link

balanceTest scoping problem #86

Closed josherrickson closed 6 years ago

josherrickson commented 7 years ago

I'm a bit perplexed by this one...

> data(nuclearplants)
> library(optmatch)
> match <- fullmatch(pr ~ cost, data = nuclearplants)
> balanceTest(pr ~ cost + strata(match), data = nuclearplants)
 Error in strata(match) : all arguments must be vectors
> fm <- match
> balanceTest(pr ~ cost + strata(fm), data = nuclearplants)
                 strata       fm          Unstrat        
                 stat   std.diff       z std.diff       z
vars                                                     
cost                     -0.0479 -0.7310  -0.2126 -0.5588
(element weight)             NaN      NA      NaN      NA
> identical(match, fm)
[1] TRUE
benthestatistician commented 7 years ago

Maybe it's getting your match object confused w/ function base::match?

josherrickson commented 7 years ago

I'm assuming its a bug along those lines, but I didn't get a chance to test further yet. If it really is an issue with base::match, I suspect this might be a scoping bug in survival::strata instead of in our code. If so, maybe this is the motivation we need to move beyond using theirs and rolling our own design() or something?

benthestatistician commented 7 years ago

Before rolling our own we might ask Therneau (survival package maintainer) if he'd be willing to patch suvival::strata. My sense is that in other respects it serves our purposes well.

If this hypothesis about the nature of the problem is correct, that is. It'd be nice to have an example that doesn't rely on RItools-specific functions.

josherrickson commented 7 years ago

Looks like its not just in survival::strata.

> library(survival)
> data(ovarian)
> coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian) 
Call:
coxph(formula = Surv(futime, fustat) ~ age + strata(rx), data = ovarian)

      coef exp(coef) se(coef)   z      p
age 0.1374    1.1472   0.0474 2.9 0.0038

Likelihood ratio test=12.7  on 1 df, p=0.000368
n= 26, number of events= 12 
> names(ovarian)
[1] "futime"   "fustat"   "age"      "resid.ds" "rx"       "ecog.ps" 
> names(ovarian)[5] <- "match"
> names(ovarian)
[1] "futime"   "fustat"   "age"      "resid.ds" "match"    "ecog.ps" 
> coxph(Surv(futime, fustat) ~ age + strata(match), data=ovarian) 
Call:
coxph(formula = Surv(futime, fustat) ~ age + strata(match), data = ovarian)

      coef exp(coef) se(coef)   z      p
age 0.1374    1.1472   0.0474 2.9 0.0038

Likelihood ratio test=12.7  on 1 df, p=0.000368
n= 26, number of events= 12 
> 
> data(ovarian)
> names(ovarian)
[1] "futime"   "fustat"   "age"      "resid.ds" "rx"       "ecog.ps" 
> library(optmatch)
The optmatch package has an academic license. Enter relaxinfo() for more information.
> match <- fullmatch(rx-1 ~ ., data = ovarian)
> coxph(Surv(futime, fustat) ~ age + strata(match), data=ovarian)
Call:
coxph(formula = Surv(futime, fustat) ~ age + strata(match), data = ovarian)

     coef exp(coef) se(coef)    z    p
age 0.153     1.165    0.104 1.47 0.14

Likelihood ratio test=2.89  on 1 df, p=0.0889
n= 26, number of events= 12
benthestatistician commented 6 years ago

A few notes...

The problem doesn't require optmatch to be loaded:

library(RItools)
data(nuclearplants)
match <- nuclearplants$pt
balanceTest(pr ~ cost + strata(match), data=nuclearplants)
## Error in strata(match) : all arguments must be vectors

The issue comes about at this call to model.frame in the makeDesigns() code:

 str.fmla <- formula(paste0("factor(", treatment.name, ")", " ~ ", paste0(collapse = "+", c(1, vnames[str.idx]))))
str.tms  <- terms(str.fmla, data = data, specials = c("cluster", "strata"))
str.data <- model.frame(str.tms, data = data, na.action = na.pass, drop.unused.levels=TRUE)

the object str.tms has the environment of str.fmla, which is not the environment of the object passed to balanceTest as argument fmla; in this example that environment is "R_GlobalEnv". The environment of str.fmla has parent frame namespace:RItools. That's where model.frame() is supposed to look next if it can't find a variable in its data argument. I.e., the source of the problem is here.

No time to fix just now, but the fact that the following works shows that the problem isn't tied to using the variable name "match"; it's a broader scoping issue. (If I don't forget I'll rename this task accordingly.)

 nuclearplants$match <- match
 balanceTest(pr ~ cost + strata(match), data=nuclearplants)
##                 strata    match          Unstrat        
##                 stat   std.diff       z std.diff       z
## vars                                                     
## cost                      0.0233  0.0926  -0.2126 -0.5588
## (element weight)             NaN      NA      NaN      NA

I have to turn back to more pressing things now, but I think the next step is to look to see how scoping is being dealt with e.g. in the survival package, if not in base R. (Probably the answer is just to give "tmp.fmla" the environment of fmla, but I/we should check.)

benthestatistician commented 6 years ago

Apparently my fix introduced a new scoping bug of its own. [master 9b68420] addresses it, I'm pretty sure, if perhaps not in the most graceful way.