Closed adamssv closed 1 year ago
Hi Scott,
Very interesting observation! There are subtle reasons for this, having to do with how `bal.tab()` finds the original dataset in the `matchit` output object. When a propensity score is estimated, the dataset is stored in the propensity score model fit, which is stored in the `matchit` object. Otherwise (for example, when you supply the distance yourself), the dataset is not stored in the `matchit` object; only the variables that were used in the matching are stored, so they are all `bal.tab()` has access to. Basically, it doesn't know `c` lives in `df` because `df` is not contained in the `matchit` object anywhere. To tell `bal.tab()` where `c` is, you need to supply it with the original dataset using the `data` argument, i.e.,
bal.tab(mo_user, cluster="c", data = df)
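For anyone who wants to try this end to end, here is a minimal sketch of the situation described above. Only `mo_user`, `df`, and `c` come from this thread; the simulated data, the covariates `x1` and `x2`, the treatment column `treat`, and the stand-in score `ps` are assumptions made up for illustration, and the exact behavior may differ across MatchIt and cobalt versions.

```r
library(MatchIt)
library(cobalt)

set.seed(1)
n <- 300

# Hypothetical toy data: two covariates and a cluster variable named c
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  c  = factor(sample(c("a", "b"), n, replace = TRUE))
)

# Stand-in for a previously estimated propensity score (an assumption)
df$ps    <- plogis(-1 + 0.5 * df$x1)
df$treat <- rbinom(n, 1, df$ps)

# Match on the user-supplied score instead of estimating one, so no
# propensity score model (and hence no copy of df) is stored in the output
mo_user <- matchit(treat ~ x1 + x2, data = df, distance = df$ps)

# c was not used in the matching, so tell bal.tab() where to find it
bal.tab(mo_user, cluster = "c", data = df)
```

Passing `data = df` explicitly means `bal.tab()` does not have to locate the dataset on its own.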
You may wonder how `match.data()` knows where the original dataset is. It uses a hack that is not always accurate and is less likely to be accurate when using `cobalt` rather than `MatchIt` alone. But the hack can fail even for `match.data()`, and for that reason we recommend supplying the `data` argument to `match.data()`, too.
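The same recommendation can be sketched for `match.data()`. As before, the simulated data and the names `x1`, `x2`, `treat`, and `ps` are hypothetical and only there to make the example self-contained; it is an illustration of passing `data` explicitly, not a description of how `match.data()` searches for the dataset internally.

```r
library(MatchIt)

set.seed(2)
n <- 300

# Hypothetical toy data with a cluster variable that is not used in matching
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  c  = factor(sample(c("a", "b"), n, replace = TRUE))
)
df$ps    <- plogis(-1 + 0.5 * df$x1)   # stand-in for a supplied score
df$treat <- rbinom(n, 1, df$ps)

mo_user <- matchit(treat ~ x1 + x2, data = df, distance = df$ps)

# Supply the original dataset explicitly rather than relying on the lookup
md <- match.data(mo_user, data = df)

# The matched data keeps columns from df, including the cluster variable
table(md$c, md$treat)
```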
Noah
Hi Noah,
Thanks for your quick and kind response. That makes sense. Sorry if "use the data argument" was kind of obvious.
Indeed, using the data argument explicitly works with the toy example at least.
Hopefully this might help another user as well.
Thanks again,
Scott
Hi, I have been given a very large cohort matched on a previously estimated propensity score (PS) and cannot feasibly re-estimate and rematch. I have the estimated PS and the matched pairs identified.
I would like to be able to examine the balance of various variables according to other categorical variables ("cluster") in the data. (For reasons related to stratifying analysis by a non-treatment variable.)
So, I was hoping to make use of the cobalt tools after calling -matchit-, using the distance parameter to force a user-defined distance.
However, I receive an error with cobalt if I try to look at the balance according to any factor variable (i.e., a "cluster"). I can, however, get -matchit- and -bal.tab- to work fine with a cluster if I generate a dummy propensity score on the same data.
Note that the cluster is not included in the distance formula in either case, and including it makes no difference to the error.
The error message is, "Error: The argument to 'cluster' must be a vector of cluster membership or the (quoted) name of a variable in 'data' that contains cluster membership."
Here is example code that generates the error to show what I mean.
I would greatly appreciate any help. Sorry if I am missing an obvious way to use -cobalt- in this situation.
Thanks