spgarbet / tangram

Table Grammar package for R

Matched Cohort Transform #50

Open spgarbet opened 4 years ago

spgarbet commented 4 years ago

kylerove has provided an example transform for Matched Cohort studies that follows the template layout of the Hmisc statistics (#48). It copies a lot of boilerplate code to accomplish its goal. This ticket is to include it as a default transform, available with an example, and possibly to modify the hmisc transform so that requests like this can be handled with minimal coding.

spgarbet commented 4 years ago

Here is the specification:

1:1 matching
variables          statistical test
-----------------  ----------------
Numeric X Cat      not applicable
Numeric X Numeric  not applicable
Cat X Numeric      paired Student's t-test [stats::t.test(x=covariate, y=arm, paired=TRUE)]
                   Wilcoxon signed rank test [stats::wilcox.test(x=covariate, y=arm, paired=TRUE)]
                      - preferred
                   Cox proportional hazards models stratifying on matched groups
                      [survival::coxph(outcome ~ covariate + strata(block), data = m1.final)]
                      - useful for time to event analysis
Cat X Cat          McNemar's test [stats::mcnemar.test(x=covariate, y=arm)]
                      - this is for 2 x 2 cases only
                      - expects factors
                   Stuart Maxwell chi-squared test [DescTools::StuartMaxwellTest(x=covariate, y=arm)]
                      - this is for 2 x k polytomous covariates, where k ≥ 2
                      - expects factors

many:1 matching
variables          statistical test
-----------------  ----------------
Numeric X Cat      not applicable
Numeric X Numeric  not applicable
Cat X Numeric      logistic regression with generalized estimating equations
                      [geepack::geeglm(formula = outcome ~ covariate, family = binomial("logit"), data = m2.final, id = block, corstr = "independence", zcor = "zcor")]
                         - outcome must be binary numeric (not a factor)
                         - covariate must be numeric
                         - block must be numeric (not a factor)
                   conditional logistic regression 
                      [survival::clogit(outcome ~ covariate + strata(block), data = m2.final)]
Cat X Cat          Cochran-Mantel-Haenszel chi-squared test
                      [stats::mantelhaen.test(x=covariate, y=arm, z=block)]
                         - strata with only 1 occurrence cause errors, so there should be a check for this
                         - covers 2 x 2 and 2 x >2 polytomous covariates
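
As a rough illustration of the 1:1 rows in the specification above, the sketch below runs the paired tests on simulated data using only the stats package. The variable names and values are made up for the example. It assumes the data have already been arranged so that element i of each vector belongs to the same matched pair, which is how the paired versions of `t.test()` and `wilcox.test()` interpret their `x` and `y` arguments.

```r
# Simulated 1:1 matched data: element i of each vector belongs to matched pair i.
# All names and values here are illustrative, not taken from the issue.
set.seed(42)
n_pairs <- 30
los_control <- rnorm(n_pairs, mean = 30, sd = 8)               # numeric covariate, controls
los_treated <- los_control + rnorm(n_pairs, mean = 3, sd = 5)  # same covariate, matched treated

# Numeric covariate: paired Student's t-test and Wilcoxon signed rank test
p_t      <- t.test(x = los_treated, y = los_control, paired = TRUE)$p.value
p_wilcox <- wilcox.test(x = los_treated, y = los_control, paired = TRUE)$p.value

# Binary covariate: McNemar's test on the pair-level 2 x 2 table (expects factors).
# Fixing the levels keeps the table 2 x 2 even if one value never occurs in an arm.
nsaid_control <- factor(rbinom(n_pairs, 1, 0.5), levels = c(0, 1))
nsaid_treated <- factor(rbinom(n_pairs, 1, 0.8), levels = c(0, 1))
p_mcnemar <- mcnemar.test(x = nsaid_treated, y = nsaid_control)$p.value

print(c(t = p_t, wilcoxon = p_wilcox, mcnemar = p_mcnemar))
```

In a real transform the pairing would come from the block variable rather than from vector position, so a reordering step by block would be needed first.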

I favor the following format:
================================================================
          N          0                 1          Test Statistic
                30 patients       30 patients                   
----------------------------------------------------------------
age      60   10.4 (8.8–11.7)   10.1 (8.9–11.2)        0.570     
sex      60       2 (1–2)           2 (1–2)            1.000     
lang     60       2 (1–3)           2 (1–3)            0.470     
opioids  60    2.4 (1.3–3.8)     2.8 (1.6–4.5)         0.640     
los      100  27.3 (18.4–54.6)  32.7 (24.6–45.4)       0.750     
nsaid    60       2 (2–2)           2 (2–2)           < 0.001    
neuro    60       3 (2–4)           2 (1–4)            0.280     
================================================================

spgarbet commented 4 years ago

Recommended design:

Modify the test parameter of the hmisc transform to accept a function, which is passed the specified data along with any additional arguments. That function is then responsible for the test statistic. Some example pseudocode:

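# Pseudocode: is.categorical() stands for whatever type check is in use,
# and mytest.catxnum() would be defined like mytest.catxcat().
mytest.catxcat <- function(cdata, rdata, block) { ... }

mytest <- function(row, col, ...)
{
  cdata <- col$data; rdata <- row$data

  # Return from the first matching case so stop() is only reached when nothing matched
  if(is.categorical(cdata) && is.categorical(rdata)) return(mytest.catxcat(cdata, rdata, ...))

  if(is.categorical(cdata) && is.numeric(rdata)) return(mytest.catxnum(cdata, rdata, ...))

  stop(paste("Unhandled case for", col$name, "X", row$name))
}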
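
To make the dispatch idea above concrete, here is a minimal, self-contained sketch that works on plain vectors instead of tangram's row and column objects. Every helper name in it (`is_cat`, `split_pairs`, `test_catxnum`, `test_catxcat`, `matched_test`) is hypothetical and not part of tangram. It assumes 1:1 matching with a two-level arm and exactly one observation per arm in each block, and it shows how an extra argument such as `block` can travel through `...` to the type-specific test.

```r
# Hypothetical stand-in for a type check; tangram's own checks may differ.
is_cat <- function(x) is.factor(x) || is.character(x) || is.logical(x)

# Reorder both arms by block so that position i is the same matched pair in each.
# Assumes a two-level arm and exactly one row per arm in every block (1:1 matching).
split_pairs <- function(x, arm, block) {
  arm <- factor(arm)
  stopifnot(nlevels(arm) == 2, all(table(block, arm) == 1))
  lev <- levels(arm)
  ord <- order(block)
  x <- x[ord]; arm <- arm[ord]
  list(a = x[arm == lev[1]], b = x[arm == lev[2]])
}

# Categorical arm x numeric covariate: Wilcoxon signed rank test
test_catxnum <- function(arm, x, block) {
  p <- split_pairs(x, arm, block)
  wilcox.test(p$a, p$b, paired = TRUE)$p.value
}

# Categorical arm x binary covariate: McNemar's test (2 x 2 case only)
test_catxcat <- function(arm, x, block) {
  x <- factor(x)                      # one common level set for both arms
  p <- split_pairs(x, arm, block)
  mcnemar.test(p$a, p$b)$p.value
}

# Dispatcher: return from the first matching case, otherwise fail loudly
matched_test <- function(cdata, rdata, ...) {
  if (is_cat(cdata) && is_cat(rdata))     return(test_catxcat(cdata, rdata, ...))
  if (is_cat(cdata) && is.numeric(rdata)) return(test_catxnum(cdata, rdata, ...))
  stop("Unhandled combination of column and row types")
}

# Illustrative data: 20 matched pairs, rows ordered as (arm 0, arm 1) within each block
set.seed(1)
block <- rep(1:20, each = 2)
arm   <- factor(rep(c("0", "1"), times = 20))
age   <- rnorm(40, mean = 10, sd = 2)
sex   <- factor(rep(c("F", "M", "M", "M", "F", "F", "M", "M"), times = 5))

p_age <- matched_test(arm, age, block = block)
p_sex <- matched_test(arm, sex, block = block)
print(c(age = p_age, sex = p_sex))
```

Returning early from each branch matters: without it, every call would fall through to `stop()` even when a case matched.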

Secondly, the content displayed is IQR when the default is proportion. This is already overridable, but it will be part of the example.
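
For the median (IQR) cell content mentioned above, a formatting helper might look like the following sketch. `median_iqr` is a hypothetical name, not a tangram function, and the sketch does not show how tangram's override mechanism is wired up. It only shows the summary string itself, using R's default quantile definition.

```r
# Hypothetical helper (not part of tangram): format a numeric vector as "median (Q1-Q3)".
median_iqr <- function(x, digits = 1) {
  # Lower quartile, median, upper quartile with R's default (type 7) quantiles
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  q <- formatC(q, format = "f", digits = digits)
  sprintf("%s (%s-%s)", q[2], q[1], q[3])
}

los <- c(18, 20, 25, 27, 33, 41, 55)   # illustrative length-of-stay values
print(median_iqr(los))                 # "27.0 (22.5-37.0)"
```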

kylerove commented 4 years ago

Looks great. I updated the specification as follows with refs:

# There are several references that cover this:
# [1] Austin PC. A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Statist Med 2008; 27: 2037–49. doi:10.1002/sim.3150.
# [2] Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions (3rd edn). Wiley: New York, NY, 2003.
#
# 1:1 matching
#
# variables          statistical test
# -----------------  ----------------
# Numeric x Cat      paired Student's t-test [stats::t.test(x=covariate, y=arm, paired=TRUE)]
#                    Wilcoxon signed rank test [stats::wilcox.test(x=covariate, y=arm, paired=TRUE)]
#                       - preferred
#                    Cox proportional hazards models stratifying on matched groups
#                       [survival::coxph(outcome ~ covariate + strata(block), data = m1.final)]
#                       - useful for time to event analysis
# Cat X Cat          McNemar's test [stats::mcnemar.test(x=covariate, y=arm)]
#                       - this is for 2 x 2 cases only
#                       - expects factors
#                    Stuart Maxwell chi-squared test [DescTools::StuartMaxwellTest(x=covariate, y=arm)]
#                       - this is for 2 x k polytomous covariates, where k ≥ 2
#                       - expects factors
#
#
#
# 1:many matching
#
# variables          statistical test
# -----------------  ----------------
# Numeric x Cat      logistic regression with generalized estimating equations
#                       [geepack::geeglm(formula = outcome ~ covariate, family = binomial("logit"), data = m2.final, id = block, corstr = "independence", zcor = "zcor")]
#                          - outcome must be binary numeric (not a factor)
#                          - covariate must be numeric
#                          - block must be numeric (not a factor)
#                    conditional logistic regression 
#                       [survival::clogit(outcome ~ covariate + strata(block), data = m2.final)]
# Cat x Cat          Cochran-Mantel-Haenszel chi-squared test
#                       [stats::mantelhaen.test(x=covariate, y=arm, z=block)]
#                          - strata with only 1 occurrence cause errors, so there should be a check for this
#                          - covers 2 x 2 and 2 x >2 polytomous covariates
#
# I favor the following format:
# ==========================================================
#        N        0                  1            Statistic
#               (N=30)             (N=30)                    
# ----------------------------------------------------------
# sex:F  60    22 (73.3%)        22 (73.333%)         —       
# los    60  28.4 (20.4-27.6)  45.3 (17.4-29.8)    < 0.001    
# lang   60                                         0.630     
#    1        12 (40.0%)          8 (26.6%)                 
#    2         6 (20.0%)          7 (23.3%)                 
#    3         6 (20.0%)         10 (33.3%)                 
#    4         6 (20.0%)          5 (16.6%)                 
# nsaid  60   23 (76.7%)         30 (100.0%)        0.023     
# neuro  60                                         0.635     
#    1         7 (23.3%)         10 (33.3%)                 
#    2         7 (23.3%)          6 (20.0%)                 
#    3         2 (6.7%)           5 (16.6%)                 
#    4         9 (30.0%)          6 (20.0%)                 
#    5         5 (16.7%)          3 (10.0%)                 
# ==========================================================
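
The specification above notes that strata with only one occurrence make the Cochran-Mantel-Haenszel test fail and that a check is needed. One possible form of that check is sketched below on simulated 1:2 matched data. The data frame, its column names, and the choice to drop small strata (rather than stop or warn) are assumptions made for illustration.

```r
set.seed(7)
# Simulated 1:2 matched data; all names and values are illustrative.
n_blocks <- 40
d <- data.frame(
  block = rep(seq_len(n_blocks), each = 3),
  arm   = factor(rep(c(1, 0, 0), times = n_blocks)),
  nsaid = factor(rbinom(3 * n_blocks, 1, 0.6), levels = c(0, 1))
)
# Mimic an incomplete block: keep only the first row of block 1.
d <- d[!(d$block == 1 & duplicated(d$block)), ]

# Without a check, the singleton stratum makes mantelhaen.test() stop with an error.
unchecked <- tryCatch(
  mantelhaen.test(x = d$nsaid, y = d$arm, z = factor(d$block)),
  error = function(e) e
)

# Check: drop strata with fewer than two observations before testing.
block_size <- table(d$block)
keep <- as.character(d$block) %in% names(block_size)[block_size >= 2]
d_ok <- d[keep, ]

# factor() is re-applied so the dropped block does not linger as an empty level.
cmh <- mantelhaen.test(x = d_ok$nsaid, y = d_ok$arm, z = factor(d_ok$block))
print(cmh$p.value)
```

Dropping strata silently changes the analysed sample, so a transform built on this would probably want to report how many blocks were excluded.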

spgarbet commented 4 years ago

While this is still a work in progress, thought I'd show a really simple example.

> library(tangram)
> 
> mytest <- function(row, column, cell_style, ...)
+     cell(hmisc_p(t.test(row$data, column$data)$p.value))
> 
> tangram(bili ~ chol, data=pbc, test=mytest)
==================================
       N    bili    Test Statistic
           (N=418)                
----------------------------------
chol  284  ρ=0.40      P<0.001    
==================================
N is the number of non-missing value. ^1 Kruskal-Wallis. ^2 Pearson. ^3 Wilcoxon.

spgarbet commented 4 years ago

Here's the current working version for Hmisc. This is coming together nicely.

matched-cohort.Rmd.txt

spgarbet commented 4 years ago

The tangram-vignettes project contains the example now.