Documentation for how `by = ` handles weighting within categories

mattsecrest commented 7 months ago

Hey @ngreifer,

First off, kudos on an excellent suite of packages. I had a discussion with my colleagues the other day, and we could not figure out what `weightit(by = )` was doing. Some of us thought it fit a separate propensity score (PS) model in each stratum, while others (including me) thought the weights were standardized in some way within each stratum. Do you have a reference or some insight you could share? I'm happy to propose a light PR with changes to the roxygen description of `by =`.

ngreifer commented 7 months ago

Hi Matt, thank you for the kind words, and sorry for the confusion. `by` is documented in the `weightit()` documentation. I'll reproduce the relevant text below:

> a string containing the name of the variable in `data` for which weighting is to be done within categories or a one-sided formula with the stratifying variable on the right-hand side. For example, if `by = "gender"` or `by = ~gender`, weights will be generated separately within each level of the variable "gender".

According to this line in the documentation, weighting is done (i.e., weights are generated) separately within each level of the `by` variable. Nowhere does it say that the weights are standardized. That means that if weighting uses a PS model, a PS model is fit separately within each level of the `by` variable, and if weighting is done by optimization, the optimization is performed separately within each level of the `by` variable.
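To make the distinction discussed in this thread concrete, here is a minimal Python sketch contrasting one pooled PS model with separate PS models fit within each level of a stratifying variable. It is a conceptual illustration only, not WeightIt's R code or its actual implementation. It assumes a logistic-regression PS model and ATE-style inverse probability weights; the function names `fit_logistic_ps`, `ps_pooled`, `ps_by_stratum`, and `ate_weights`, and the simulated data, are all hypothetical.

```python
import numpy as np

def fit_logistic_ps(X, t, n_iter=25):
    """Hypothetical helper: logistic regression of t on X plus an intercept,
    fit by Newton-Raphson; returns the fitted propensity scores."""
    Xd = np.column_stack([np.ones(len(t)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (t - p)                            # score vector
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])    # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-(Xd @ beta)))

def ps_pooled(X, t):
    """One PS model for the whole sample (the stratum is not in the model)."""
    return fit_logistic_ps(X, t)

def ps_by_stratum(X, t, strata):
    """A separate PS model fit using only the units in each stratum level."""
    ps = np.empty(len(t))
    for level in np.unique(strata):
        idx = strata == level
        ps[idx] = fit_logistic_ps(X[idx], t[idx])
    return ps

def ate_weights(t, ps):
    """Inverse probability weights targeting the ATE (an assumed estimand)."""
    return np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))

# Simulated data: treatment prevalence differs sharply by stratum g.
rng = np.random.default_rng(0)
n = 2000
g = rng.integers(0, 2, n)
x = rng.normal(size=n)
true_ps = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * g + 0.5 * x)))
t = (rng.random(n) < true_ps).astype(float)

w_by = ate_weights(t, ps_by_stratum(x, t, g))
w_pooled = ate_weights(t, ps_pooled(x, t))
```

Because each stratum-specific model has its own intercept, its fitted scores sum to the number of treated units in that stratum, whereas the pooled model only guarantees this over the whole sample. Each stratum-specific fit also uses only that stratum's units, so small subgroups can yield unstable models.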

To me this is as clear as it can be, but obviously that comes from my bias as the author. Do you see ambiguity here, or did you just miss this line in the documentation? Given my description, how would you change the documentation?

mattsecrest commented 7 months ago

@ngreifer Super helpful, thank you for the quick response. For me, the added information that "a PS model is fit separately within each level of the `by` variable" clears things up perfectly (obviously I have my own bias as well :D). Even that one-sentence addition could be valuable for investigators, since it would prompt them to avoid small sample sizes in subgroups. Also, since some MatchIt methods that have an `exact` argument do not fit new PS models, it would be nice to draw the distinction in how the different subgroup-balancing methods are applied. Again, I'm happy to take a stab at some text for your review. In the meantime, this definitely answered our question.

ngreifer commented 7 months ago

Glad to help. I kind of wanted to avoid that because PS models are not fit for all weighting methods, so I used the generic language "weights are generated". But I understand how adding that line would reduce confusion, so I'll do so in an upcoming version.

ngreifer/WeightIt #56