ngreifer / cobalt

Covariate Balance Tables and Plots - An R package for assessing covariate balance
https://ngreifer.github.io/cobalt/

Need help to cite a specific paragraph from the `{cobalt}` documentation. #88

Closed: shafayetShafee closed this issue 3 weeks ago

shafayetShafee commented 3 weeks ago

Hello Dr. Greifer,

Thank you for creating and maintaining the package {cobalt}, and for providing such detailed package documentation.

I am using cobalt::bal.tab() to calculate mean differences for a set of covariates. Among these, only one (age) is continuous, while the rest are binary or categorical. For continuous covariates, cobalt::bal.tab() provides standardized mean differences, whereas for binary and categorical covariates, it outputs raw (unstandardized) differences in proportion.

According to the function documentation:

Balance statistics can be requested with the stats argument. The default balance statistic for mean differences for continuous variables is the standardized mean difference, which is the difference in the means divided by a measure of spread (i.e., a d-type effect size measure). This is the default because it puts the mean differences on the same scale for comparison with each other and with a given threshold. For binary variables, the default balance statistic is the raw difference in proportion. Although standardized differences in proportion can be computed, raw differences in proportion for binary variables are already on the same scale, and computing the standardized difference in proportion can obscure the true difference in proportion by dividing the difference in proportion by a number that is itself a function of the observed proportions.

I would like to include this information in an article I am currently writing and cite it properly. Several articles I have read (Austin 2009, 2011; Austin and Stuart 2015; Zhang et al. 2018) use standardized mean differences (SMDs) for binary covariates as well. That is why I feel I need a source to cite when reporting raw differences in proportion for binary covariates.

Could you kindly guide me on how to cite the relevant portion from the package documentation? Would the use of the following be sufficient?

> citation("cobalt")
To cite package ‘cobalt’ in publications use:

  Greifer N (2024). _cobalt: Covariate Balance Tables and Plots_. R package version 4.5.5,
  https://github.com/ngreifer/cobalt, <https://ngreifer.github.io/cobalt/>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {cobalt: Covariate Balance Tables and Plots},
    author = {Noah Greifer},
    year = {2024},
    note = {R package version 4.5.5, 
https://github.com/ngreifer/cobalt},
    url = {https://ngreifer.github.io/cobalt/},
  }

Additionally, is there any research article that I can cite?

References

Austin, Peter C. 2009. “Balance Diagnostics for Comparing the Distribution of Baseline Covariates Between Treatment Groups in Propensity-Score Matched Samples.” Statistics in Medicine 28 (25): 3083–3107. https://doi.org/10.1002/sim.3697.

———. 2011. “An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies.” Multivariate Behavioral Research 46 (3): 399–424. https://doi.org/10.1080/00273171.2011.568786.

Austin, Peter C., and Elizabeth A. Stuart. 2015. “Moving Towards Best Practice When Using Inverse Probability of Treatment Weighting (IPTW) Using the Propensity Score to Estimate Causal Treatment Effects in Observational Studies.” Statistics in Medicine 34 (28): 3661–79. https://doi.org/10.1002/sim.6607.

Zhang, Zhongheng, Hwa Jung Kim, Guillaume Lonjon, and Yibing Zhu, on behalf of AME Big-Data Clinical Trial Collaborative Group. 2018. “Balance Diagnostics After Propensity Score Matching.” Annals of Translational Medicine 7 (1). https://atm.amegroups.org/article/view/22865.

ngreifer commented 3 weeks ago

Thanks for the kind words. I don't think this really requires a citation. If you are citing cobalt in your paper, that is attribution enough. Some people cite the vignette; see here for some examples. This isn't such a critical point that you would receive pushback from reviewers about it and should feel the need to justify it with a citation. If you do, you can just report SMDs for binary variables anyway; they should be larger, thereby encouraging you to seek better balance. There is a mention of the option to use raw differences in proportion in section 4.1 of Stuart (2010). Stuart cites Austin (2009) but Austin only recommends SMDs. Also note the KS statistic is equal to the raw difference in proportion for binary variables.
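
For readers following along, the two numerical points in the reply above can be checked with a rough base-R sketch. It does not call cobalt; the data are made up, and the pooled-variance denominator is the form Austin (2009) gives for dichotomous variables, which is an assumption here (the denominator cobalt actually uses depends on its settings).

```r
# Made-up binary covariate: 30% ones among treated units, 20% among controls
x_treated <- rep(c(1, 0), times = c(30, 70))
x_control <- rep(c(1, 0), times = c(20, 80))

p_treated <- mean(x_treated)
p_control <- mean(x_control)

# Raw difference in proportion
raw_diff <- p_treated - p_control

# Standardized difference: divide by a pooled SD built from the
# Bernoulli variances p(1 - p). Assumed denominator, see the note above.
pooled_sd <- sqrt((p_treated * (1 - p_treated) +
                   p_control * (1 - p_control)) / 2)
smd <- raw_diff / pooled_sd

# KS statistic: largest gap between the two empirical CDFs.
# A binary variable only takes the values 0 and 1, so checking those suffices.
grid <- c(0, 1)
ks <- max(abs(ecdf(x_treated)(grid) - ecdf(x_control)(grid)))

print(c(raw_diff = raw_diff, smd = smd, ks = ks))
```

Because p(1 - p) never exceeds 0.25, the pooled SD is at most 0.5, so under this denominator the SMD is at least twice the raw difference in magnitude. The KS statistic matches the absolute raw difference because the two empirical CDFs differ only at 0, where the gap is |(1 - p_treated) - (1 - p_control)|. To have cobalt itself report standardized differences for binary covariates, `bal.tab()` accepts `binary = "std"`.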

shafayetShafee commented 3 weeks ago

Thank you for your detailed reply. I really appreciate the time and effort you put into answering my question.