kosukeimai / mediation

R package mediation
58 stars 29 forks source link

Direction of the sign of ACME changes with non-default factor contrast coding #22

Open cbjrobertson opened 4 years ago

cbjrobertson commented 4 years ago

I believe there must be an error which is causing the sign of the reported ACME effect to change depending on the contrast coding matrix for X, when both X->M (a) and M->Y (b) are negative, regardless of how control.value and treat.value are specified in the call to mediate.

In the experimental data during the analysis of which I encountered this issue, X represents a two-contrast factor with english and dutch as the levels. I am interested in setting english as the control and testing for the effect of dutch compared with english. Note that this implies changing the factor contrast coding from the default, because E comes after D, and would usually default as the treatment level.

I simulate these data below:

simulate_data = function(x1){
  # x2, x3, and x4 in a matrix, these will be modified to meet the criteria
  x2 <- scale(matrix( rnorm(length(x1)), ncol=1))

  # put all into 1 matrix for simplicity
  x12 <- cbind(scale(x1),x2)

  # find the current correlation matrix
  c1 <- var(x12)

  # cholesky decomposition to get independence
  chol1 <- solve(chol(c1))

  newx <-  x12 %*% chol1 

  # create new correlation structure (zeros can be replaced with other rvals)
  newc <- matrix(c(1, -0.5,
                   -0.5, 1), 
                 ncol=2)

  chol2 <- chol(newc)

  finalx <- newx %*% chol2 * sd(x1) + mean(x1)

  finalx[,2]
  }

data = data.frame("X"=c(rep('B_english',300),rep('A_dutch',300)),
                  "M"=scale(c(rnorm(300,mean=150, sd = 50),
                                           rnorm(300,mean=100,sd=50))
                    ))
data$Y = simulate_data(data[,'M'])

#set contrasts
data$X_rev = factor(data$X)
contrasts(data$X_rev) = contr.treatment(2,base=2)
colnames(contrasts(data$X_rev)) = 'A_dutch'

data$X = factor(ifelse(data$X == 'B_english','A_english','B_dutch'))

> contrasts(data$X)
          B_dutch
A_english       0
B_dutch         1
> contrasts(data$X_rev)
          A_dutch
A_dutch         1
B_english       0

The only difference between data$X and data$X_rev is the letter they start with. In both of the contrast matrices, the level representing dutch is coded as the treatment contrast, so I would expect that this difference should not result in any substantive changes to model outputs. However this does not appear to be true.

When X = data$X, the sign for ACME is correct (i.e. positive since a and b are both negative):

### X = data$X
med.fit = lm(M ~  X,data = data)
out.fit = lm(Y ~ M + X,data = data)
med.out = mediate(med.fit,out.fit,treat='X',mediator='M',sims=100,
                  control.value = 'A_english', treat.value = 'B_dutch')
>summary(med.out)
Causal Mediation Analysis 
Quasi-Bayesian Confidence Intervals
               Estimate 95% CI Lower 95% CI Upper p-value    
ACME             0.4568       0.3632         0.55  <2e-16 ***
ADE             -0.0764      -0.2247         0.07    0.38    
Total Effect     0.3804       0.2186         0.54  <2e-16 ***
Prop. Mediated   1.2186       0.8517         1.98  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Sample Size Used: 600 
Simulations: 100 

However, in the case of X = data$X_rev, the ACME is reported as negative when it should be positive:

med.fit.rev = lm(M ~  X_rev,data = data)
out.fit.rev = lm(Y ~ M + X_rev,data = data)
med.out.rev = mediate(med.fit.rev,out.fit.rev,treat='X_rev',mediator='M',sims=100,
                  control.value = 'B_english', treat.value = 'A_dutch')
> summary(med.out.rev)
Causal Mediation Analysis 
Quasi-Bayesian Confidence Intervals
               Estimate 95% CI Lower 95% CI Upper p-value    
ACME            -0.4670      -0.5757        -0.35  <2e-16 ***
ADE              0.0787      -0.0637         0.19    0.24    
Total Effect    -0.3883      -0.5335        -0.22  <2e-16 ***
Prop. Mediated   1.2161       0.8620         1.83  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Sample Size Used: 600 
Simulations: 100 

This would seem to be undesirable and/or confusing behaviour. I'm sorry if I missed somewhere in the docs where it specifies that factor contrasts must be coded to the R default (i.e. in alphabetical order). I would suggest that this point either be made more salient in the vignette, or that this behaviour be corrected.

cbjrobertson commented 4 years ago

Bump! Is there going to be a response?