tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

Compute standardized mean differences ref #78 #97

tompollard closed this 4 years ago

tompollard commented 4 years ago

This change adds an smd argument to TableOne that computes pairwise standardized mean differences (SMDs) for continuous and categorical variables. Comparisons with the R tableone package are shown below.
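To make the two kinds of SMD concrete, here is a minimal sketch. It is not tableone's own implementation, and the function names are hypothetical. For continuous variables it uses the common pooled-SD definition (mean difference divided by the square root of the average of the two sample variances). For categorical variables it uses the multinomial definition of Yang and Dalton (2012), a Mahalanobis-type distance between the level proportions. "Pairwise" is read here as one SMD for every pair of groups, with the first group of each pair treated as the control.

```python
import itertools

import numpy as np
import pandas as pd


def smd_continuous(x_control, x_treated):
    """Mean difference divided by the pooled SD (average of the two sample variances)."""
    mean_diff = np.mean(x_treated) - np.mean(x_control)
    pooled_var = (np.var(x_control, ddof=1) + np.var(x_treated, ddof=1)) / 2
    return float(mean_diff / np.sqrt(pooled_var))


def smd_categorical(x_control, x_treated):
    """Multinomial SMD: a Mahalanobis-type distance between level proportions.

    One level is dropped as the reference, so for a binary variable this reduces to
    |p1 - p0| / sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / 2). The result is never negative.
    """
    a0, a1 = np.asarray(x_control), np.asarray(x_treated)
    levels = sorted(set(a0) | set(a1))[1:]  # drop the first level as the reference
    p0 = np.array([np.mean(a0 == lv) for lv in levels])
    p1 = np.array([np.mean(a1 == lv) for lv in levels])
    # Average of the two multinomial covariance matrices, diag(p) - p p^T.
    cov = ((np.diag(p0) - np.outer(p0, p0)) + (np.diag(p1) - np.outer(p1, p1))) / 2
    diff = p1 - p0
    # np.linalg.inv raises LinAlgError if cov is singular (e.g. a degenerate split).
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))


def pairwise_smd(df, column, groupby, categorical=False):
    """SMD for every pair of groups (g0, g1), treating g0 as the control."""
    smd = smd_categorical if categorical else smd_continuous
    groups = sorted(df[groupby].dropna().unique())
    results = {}
    for g0, g1 in itertools.combinations(groups, 2):
        # Compare the non-missing values of `column` in each group.
        x0 = df.loc[df[groupby] == g0, column].dropna()
        x1 = df.loc[df[groupby] == g1, column].dropna()
        results[(g0, g1)] = smd(x0, x1)
    return results
```

With a two-level grouping such as MechVent (0/1), pairwise_smd(df, "Age", "MechVent") returns a single entry keyed (0, 1); with three or more groups every pair gets its own entry. Details such as missing-data handling may differ from what tableone actually does.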

PhysioNet 2012 demo data

R

library(tidyverse)
library(tableone)

path = "https://raw.githubusercontent.com/tompollard/tableone/master/data/pn2012_demo.csv"

pn = read.csv(path)

cols = c("Age","SysABP","Height","Weight","ICU","MechVent","LOS","death")
strata = "MechVent"
tabUnmatched <- CreateTableOne(vars = cols, strata = strata, data = pn, test = FALSE)

print(tabUnmatched, smd = TRUE)

[Screenshot: output of the R code above]

Python

import pandas as pd
from tableone import TableOne
path = "https://raw.githubusercontent.com/tompollard/tableone/master/data/pn2012_demo.csv"

df = pd.read_csv(path)

cols = ["Age", "SysABP", "Height", "Weight", "ICU", "MechVent", "LOS", "death"]
categorical=["ICU", "MechVent", "death"]
strata = "MechVent"

t = TableOne(df, columns=cols, categorical=categorical, label_suffix=True,
             groupby=strata, pval=True, pval_test_name=False, smd=True)
print(t)

[Screenshot: output of the Python code above]

Right Heart Catheterization Dataset

R

library(tableone)

path = "http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/rhc.csv"
rhc = read.csv(path)

vars <- c("age","sex","race","edu","income","ninsclas","cat1","das2d3pc",
          "dnr1","ca","surv2md1","aps1","scoma1","wtkilo1","temp1","meanbp1",
          "resp1","hrt1","pafi1","paco21","ph1","wblc1","hema1","sod1","pot1",
          "crea1","bili1","alb1","resp","card","neuro","gastr","renal",
          "meta","hema","seps","trauma","ortho","cardiohx","chfhx",
          "dementhx","psychhx","chrpulhx","renalhx","liverhx","gibledhx",
          "malighx","immunhx", "transhx","amihx")

tabUnmatched <- CreateTableOne(vars = vars, strata = "swang1", data = rhc, test = FALSE)

print(tabUnmatched, smd = TRUE)

[Screenshot: output of the R code above]

Python

import pandas as pd
from tableone import TableOne
path = "http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/rhc.csv"
df = pd.read_csv(path)

cols = ["age","sex","race","edu","income","ninsclas","cat1","das2d3pc","dnr1",
          "ca","surv2md1","aps1","scoma1","wtkilo1","temp1","meanbp1","resp1",
          "hrt1","pafi1","paco21","ph1","wblc1","hema1","sod1","pot1","crea1",
          "bili1","alb1","resp","card","neuro","gastr","renal","meta","hema",
          "seps","trauma","ortho","cardiohx","chfhx","dementhx","psychhx",
          "chrpulhx","renalhx","liverhx","gibledhx","malighx","immunhx",
          "transhx","amihx"]

cats = ["race", "income", "ninsclas", "cat1", "dnr1", "ca", "resp",
        "card", "neuro", "gastr", "renal", "meta", "hema", "seps", "trauma",
        "ortho", "sex"]

t = TableOne(df, columns=cols, categorical=cats, label_suffix=True, groupby="swang1", 
             pval=False, pval_test_name=False, smd=True)
t

[Screenshot: output of the Python code above]

jraffa commented 4 years ago

Looks good. I'll take a closer look later. Any thoughts on the +/- compared to R?

tompollard commented 4 years ago

Quoting jraffa: "Any thoughts on the +/- compared to R?"

Good point. I wasn't sure how best to handle this, so it would be interesting to hear your thoughts. The sign seems to carry useful information, so I'd prefer not to remove it unless it is confusing.

I've included the names of the two groups in the column header (unlike R), which makes the sign more meaningful, but it may not be clear to users that the left group is the control and the right group is the treatment (i.e. SMD (0,1) means that 0 is the control and 1 is the treatment).
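A minimal sketch of what the sign means under the convention described here, assuming SMD (0,1) is computed as (mean of group 1 - mean of group 0) divided by the pooled SD. The function name and the values are hypothetical and only illustrate the arithmetic. Swapping which group is treated as the control flips the sign but not the magnitude, so taking abs() gives an unsigned, R-style value.

```python
import numpy as np


def signed_smd(control, treated):
    """Positive when the treated (right-hand) group has the larger mean."""
    pooled_sd = np.sqrt((np.var(control, ddof=1) + np.var(treated, ddof=1)) / 2)
    return float((np.mean(treated) - np.mean(control)) / pooled_sd)


group0 = [10.0, 12.0, 14.0]  # hypothetical values for the left group, "0"
group1 = [13.0, 15.0, 17.0]  # hypothetical values for the right group, "1"

smd_01 = signed_smd(group0, group1)  # header "SMD (0,1)": 0 is the control
smd_10 = signed_smd(group1, group0)  # same pair with the roles swapped
unsigned = abs(smd_01)               # the sign information is lost here
```

Here both groups have a sample variance of 4 (pooled SD 2) and the means differ by 3, so smd_01 is 1.5 and smd_10 is -1.5.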

tompollard commented 4 years ago

While I think of it, I wasn't sure whether Hedges' correction is valid for the categorical SMD. It doesn't really matter, because there is no option to apply the correction, but it would be good to know.
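For a continuous SMD, Hedges' correction multiplies d by a small-sample factor J. The sketch below uses the common approximation J = 1 - 3 / (4(n0 + n1) - 9). The function names are illustrative and are not part of tableone. It does not settle whether the same factor is appropriate for the proportion-based categorical SMD, which is the open question in the comment.

```python
def hedges_correction(n0, n1):
    """Approximate small-sample correction factor J for a continuous SMD."""
    return 1 - 3 / (4 * (n0 + n1) - 9)


def hedges_g(d, n0, n1):
    """Apply the correction to an uncorrected SMD d computed from groups of size n0 and n1."""
    return d * hedges_correction(n0, n1)
```

With 10 observations per group, J = 1 - 3/71 (about 0.958), so the correction shrinks d by roughly 4%. J approaches 1 as the groups grow, which is why the correction matters mainly for small samples.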

raheems commented 4 years ago

Thanks for the updates. Is it available on PyPI yet?

tompollard commented 4 years ago

@raheems not yet, but it should be shortly (after a little more testing, and hopefully feedback from @jraffa)

raheems commented 4 years ago

I was able to use the smd feature. Thanks!!