Closed benthestatistician closed 9 years ago
strata
isn't a function in base R. I see a version of it in the survival package, is that the one you're referring to?
Yep, that's the one I meant. As the survival package is recommended, I'd think we could assume in general that it's installed, if not necessarily loaded.
This has been implemented. See test.fullmatch.R#L409 for an example.
This is great stuff! But if I'm reading the example correctly, it demonstrates calls to strata
within match_on
's formula method, not the glm
. I.e., it solves the next problem I was going to ask someone to work on ;-)
It may be that I'm just missing the test for uses of strata
within a PS model, in which case please direct me to it...
My mistake, I thought the gesture towards glm
above was only in how it implemented strata
.
I'm not sure I follow how this should work in this case. If the user doesn't load the survival package, glm
won't take a strata
argument and I don't think we want to get into overloading glm
. If we do add a dependency then the model that is fit is inconsistent.
E.g., if I wanted to exactMatch on ne
and my propensity model is pr ~ t1
, then the following two lines should be identical
match_on(glm(pr ~ t1, data=nuclearplants), within=exactMatch(pr ~ ne, data=nuclearplants), data=nuclearplants)
match_on(glm(pr ~ t1 + strata(ne), data=nuclearplants), data=nuclearplants)
However, this will NOT be the case, as glm(pr ~ t1 + strata(ne))
is equivalent to glm(pr ~ t1 + ne)
, and NOT to glm(pr ~ t1)
.
> mod1 <- glm(pr ~ t1 + strata(ne), data=nuclearplants)
> mod2 <- glm(pr ~ t1 + ne, data=nuclearplants)
> mod3 <- glm(pr ~ t1, data=nuclearplants)
> identical(predict(mod1), predict(mod2))
[1] TRUE
> identical(predict(mod1), predict(mod3))
[1] FALSE
@josherrickson, I think we should be OK with the non-equivalence in your example. It's a best practice, if not necessarily a well-documented one, to add fixed effects to the PS model for whatever variables you're going to exact match on. (You tend to wind up with better balance that way.) The user really ought to be fitting mod1
or mod2
if she intends eventially to stratify on ne
.
I also agree about not overloading glm
. As the survival package is a part of the base R distribution, I can't think of much downside in adding a dependency to it. (Well, I can see a cost: we ought to add tests into our package that strata
is behaving as we expect it to, so that we'll know if the survival people decide to change it.)
For ease of reference, and in case it's helpful, my example of pulling out the strata
terms from a glm
formula is here.
So, in match_on.glm
, these two should be equivalent:
match_on(glm(pr ~ t1 + ne, data=nuclearplants), within=exactMatch(pr ~ ne, data=nuclearplants), data=nuclearplants)
match_on(glm(pr ~ t1 + strata(ne), data=nuclearplants), data=nuclearplants)
Should the same hold true in match_on.formula
? E.g., for the following:
match_on(pr ~ t1 + strata(ne), data=nuclearplants)
Which of these should it be equivalent to?
mod_form1 <- match_on(pr ~ t1, within=exactMatch(pr ~ ne, data=nuclearplants), data=nuclearplants)
mod_form2 <- match_on(pr ~ t1 + ne, within=exactMatch(pr ~ ne, data=nuclearplants), data=nuclearplants)
I believe it should be mod_form1
since this isn't a PS match.
I agree: in this instance and others like it the mod_form1
interpretation
is best.
On Wed, Jun 17, 2015 at 9:36 AM, josherrickson notifications@github.com wrote:
So, in match_on.glm, these two should be equivalent:
match_on(glm(pr ~ t1 + ne, data=nuclearplants), within=exactMatch(pr ~ ne, data=nuclearplants), data=nuclearplants) match_on(glm(pr ~ t1 + strata(ne), data=nuclearplants), data=nuclearplants)
Should the same hold true in match_on.formula? E.g., for the following:
match_on(pr ~ t1 + strata(ne), data=nuclearplants)
Which of these should it be equivalent to?
mod_form1 <- match_on(pr ~ t1, within=exactMatch(pr ~ ne, data=nuclearplants), data=nuclearplants) mod_form2 <- match_on(pr ~ t1 + ne, within=exactMatch(pr ~ ne, data=nuclearplants), data=nuclearplants)
I believe it should be mod_form1 since this isn't a PS match.
— Reply to this email directly or view it on GitHub https://github.com/markmfredrickson/optmatch/issues/101#issuecomment-112801562 .
@benthestatistician Take a look at test.fullmatch.R#L442.
Sorry this took so long; this actual change was fairly trivial given what I had already for match_on.formula
, but it exposed a bug in scores
(it wasn't properly handling a*b
terms) that took a while to overcome - but on the plus-side, I came up with a greatly simplified version of scores.
Nice work! I added a minor change to the error handling when model.frame
with na.action=na.pass
doesn't work, intended to avoid a spurious warning in the event that model.frame
won't go even without the special na.action
argument. Also reworded that warning a little. Change appears not to break anything, but feel free to take a look [master a5f20a7].
In the meantime I'll go ahead and close this out; reopen if needed.
I'm getting an error on the master branch on the tests Josh links to above. Reopening.
Oops. My mistake. I refactored some code that caused this error. Fixed.
In glm formulas one can wrap a nominal independent variable in
strata
, without changing the result:As I understand it, the purpose of the
strata
command is purely to leave signals for methods and functions that will operate on the fitted glm. Let's use it as a signal that the user wanted her propensity score matching to incorporate exact matching on the argument ofstrata
.(Taken together with the feature requested in #86, this would relieve nearly all end users of having to call
exactMatch
directly, or having to muck around directly with ISMs, in order to subdivide matching problems.)