zhouhj1994 / LinDA


Inference on a linear combination of regression coefficients #5

Open koenvandenberge opened 1 year ago

koenvandenberge commented 1 year ago

Hi,

When running the LinDA workflow, the output focuses on inference for each individual regression coefficient separately. For example, if I have a design with two treatments x, each measured at two timepoints t, the standard LinDA output using a ~x*t design allows me to perform inference on x, t and x:t. However, say I am interested in assessing the treatment effect at the second timepoint; this amounts to testing the sum of regression coefficients x + x:t. How should one go about this when using LinDA?

zhouhj1994 commented 1 year ago

Hi,

Suppose the two treatments are called "g1" and "g2", and the two timepoints are "t1" and "t2". Denote by alpha1, alpha2, and alpha3 the regression coefficients of x, t and x:t, respectively. Let beta = alpha1 + alpha2 + alpha3.

Then the effect of (g2, t1) is the regression coefficient of x (alpha1), the effect of (g1, t2) is the regression coefficient of t (alpha2), and the effect of (g2, t2) would be the sum of the regression coefficients of x, t and x:t (beta).

Let the z-score be s = beta / sd(beta), where sd(beta) = sqrt(var(alpha1 + alpha2 + alpha3)) = sqrt(var(alpha1) + var(alpha2) + var(alpha3) + 2cov(alpha1, alpha2) + 2cov(alpha1, alpha3) + 2cov(alpha2, alpha3)). These variances and covariances can be obtained from the LinDA output. The significance of beta can then be evaluated by the p-value converted from the z-score: p = 2 P(Z <= -|s|), where Z is a standard normal random variable.
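
For concreteness, here is a minimal sketch of that calculation in R. The variable names (alp1, var1, cov12, etc.) are placeholders for the estimates, variances and covariances extracted from the LinDA output for a single taxon; a full worked example with the actual output fields appears later in this thread:

# placeholders assumed to be taken from the LinDA output for one taxon
beta <- alp1 + alp2 + alp3                    # estimated effect of (g2, t2)
sd.beta <- sqrt(var1 + var2 + var3 +
                2*cov12 + 2*cov13 + 2*cov23)  # sd(beta)
s <- beta / sd.beta                           # z-score
p <- 2 * pnorm(-abs(s))                       # p = 2 P(Z <= -|s|)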

Thanks.

koenvandenberge commented 1 year ago

Thank you for the swift response. I have a couple of clarification questions.

mrml500 commented 1 year ago

I am also interested in a function that compares (g2,t2) to (g1,t2).

zhouhj1994 commented 1 year ago

> Thank you for the swift response. I have a couple of clarification questions.
>
>   • The output from LinDA does not contain an intercept, so I was unsure of the regression coefficients' interpretation. If there were no intercept, one would expect alpha1 to correspond to the expected value for the treatment group at timepoint 1. If there were an intercept, it would be the difference (on the linear predictor scale) between treatment and control at timepoint 1, as seems to be suggested here. What design matrix is being used in the LinDA models? Treatment coding, sum coding, or something else?
>   • Assuming we have an intercept model (treatment coding), the contrast I am interested in is not H0: α1+α2+α3=0, as this would answer whether the expected value differs at (g2,t2) versus (g1,t1), using your notation. I am rather interested in H0: α1+α3=0, comparing (g2,t2) to (g1,t2). Let β=α1+α3; then it seems var(β)=var(α1)+var(α3)+2cov(α1,α3), and SD(β) is its square root. Correct?
>   • Assessing contrasts of linear combinations of the regression coefficients seems like a prominent use case that would often be of interest. It may be worthwhile to have a function that implements this for the user, with, e.g., a matrix denoting the linear combination(s) of interest as input. In our case it could be L = matrix(c(1,0,1), ncol=1).
zhouhj1994 commented 1 year ago

> I am also interested in a function that compares (g2,t2) to (g1,t2).

The hypothesis for comparing (g2,t2) to (g1,t2) is $H_0: \alpha_1+\alpha_3=0$, and the hypothesis for comparing (g2,t2) to (g2,t1) is $H_0: \alpha_2+\alpha_3=0$.
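
Following the contrast-matrix suggestion above, here is a minimal generic sketch of such a test (not part of LinDA). The arguments est, V and df stand for the coefficient estimates, their covariance matrix and the degrees of freedom for a single taxon, assembled by the user from the LinDA output (as in the worked example further down), and L encodes the linear combination, e.g. c(1, 0, 1) for alpha1 + alpha3:

# hedged sketch of a generic contrast test; est, V and df are assumed
# to be extracted from the LinDA output for a single taxon
test.contrast <- function(est, V, L, df = Inf) {
  beta <- sum(L * est)                  # L' alpha
  se <- sqrt(drop(t(L) %*% V %*% L))    # sqrt(L' V L)
  stat <- beta / se
  p <- 2 * pt(-abs(stat), df)           # df = Inf gives the normal approximation
  c(estimate = beta, se = se, stat = stat, p.value = p)
}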

biouser1992 commented 1 year ago

Hi, sorry that this is quite a basic question, but I am not very familiar with mixed models.

If you take the smokers example dataset and instead formulate the model to have an interaction between Smoke and Sex:

linda.obj <- linda(otu.tab, meta, formula = '~Smoke*Sex+(1|SubjectID)', alpha = 0.1, prev.cut = 0.1, lib.cut = 1000, winsor.quan = 0.97)

This is my understanding of what the various outputs are comparing (is it correct?):

With the example model you get all these differentially abundant OTUs from linda.obj$output$Smokey:

row.names(linda.obj$output$Smokey[linda.obj$output$Smokey$reject == "TRUE", ])
 [1] "3931"   "15555"  "70671"  "74391"  "86047"  "94166"  "149109" "185969" "237323" "239506" "428237"
[12] "469920" "470738" "518865" "529659" "570119" "573384"

With the interaction model, however, you get 0 differentially abundant OTUs both for Smokey and Smokey:Sexmale. Why is there this difference?

zhouhj1994 commented 1 year ago

In the interaction model, the regression coefficient of Smokey:Sexmale measures by how much the effect of smoking on the outcome is enhanced (or reduced) when Sex = male compared to when Sex = female.

zhouhj1994 commented 1 year ago

If we look at the joint effect of Smokey and Smokey:Sexmale (the difference between (smoker, male) and (non-smoker, male)), we can discover some differentially abundant taxa:

# fit the interaction model with a random intercept per subject
linda.obj <- linda(otu.tab, meta, formula = '~Smoke*Sex+(1|SubjectID)', alpha = 0.1,
                   prev.cut = 0.1, lib.cut = 1000, winsor.quan = 0.97)

# estimates, standard errors and degrees of freedom of the three fixed effects
alp1 <- linda.obj$output$Smokey$log2FoldChange
se1 <- linda.obj$output$Smokey$lfcSE
df1 <- linda.obj$output$Smokey$df

alp2 <- linda.obj$output$Sexmale$log2FoldChange
se2 <- linda.obj$output$Sexmale$lfcSE
df2 <- linda.obj$output$Sexmale$df

alp3 <- linda.obj$output$`Smokey:Sexmale`$log2FoldChange
se3 <- linda.obj$output$`Smokey:Sexmale`$lfcSE
df3 <- linda.obj$output$`Smokey:Sexmale`$df

# pairwise covariances between the coefficient estimates
cov12 <- linda.obj$covariance$Smokey$Sexmale
cov13 <- linda.obj$covariance$Smokey$`Smokey:Sexmale`
cov23 <- linda.obj$covariance$Sexmale$`Smokey:Sexmale`

# test H0: alpha1 + alpha3 = 0, i.e. (smoker, male) vs (non-smoker, male)
stat <- (alp1+alp3) / (sqrt(se1^2+se3^2+2*cov13))
pvalue <- 2 * pt(-abs(stat), (df1+df3)/2)
padj <- p.adjust(pvalue, method = 'BH')
rownames(linda.obj$output[[1]])[padj <= 0.1]
 [1] "74391"  "94166"  "149109" "237323" "239506" "428237" "518865"
 [8] "570119" "573384"
mrml500 commented 1 year ago

Thanks for the example code - very helpful!

linlin0026 commented 1 year ago
  1. Hi, thanks for this useful software. I'm new to mixed-effect models, so I'm still confused about the result of this code: "LinDA::linda(otu.tab[, ind1], meta[ind1, ], formula = '~Smoke+Sex+(1|SubjectID)', feature.dat.type = 'count', prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03, p.adj.method = "BH", alpha = 0.1)". I am not interested in the interaction "Smoke*Sex" but in the additive "Smoke+Sex", so are the OTUs from "linda.obj$output$Smokey" the differential OTUs while fixing the gender at male or female?

  2. And how can I get this information when I create new models like "linda(taxa, meta, formula = '~ Ethnicity (WHITE or BLACK) + Antibiotics Use (YES or NO) + Sex (MALE or FEMALE) + (1|Day)', zero.handling = 'pseudo-count', feature.dat.type = 'count', prev.filter = 0, is.winsor = TRUE, outlier.pct = 0.03, p.adj.method = "BH", alpha = 0.05)"?

Are the reject == TRUE OTUs from "linda.obj$output$Ethnicity" the differential OTUs obtained while fixing "Antibiotics Use + Sex" at a particular status, e.g. "YES & MALE"?
Or are they simply the OTUs that differ between Black and White participants, regardless of gender and whether they are taking antibiotics?

Thanks for your help ~

zhouhj1994 commented 1 year ago

The regression coefficient of a variable in a linear model means that, holding the other variables fixed, increasing this variable by one unit changes the response by the value of the coefficient. The "significance" or "rejection" result means that this coefficient is significantly nonzero; in other words, the corresponding variable is significantly associated with the response while the other variables are held fixed.
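
As a small illustration of this interpretation with an ordinary linear model (simulated data purely for the sake of the example, not LinDA output):

# the 'smoke' coefficient is the expected change in y per unit change
# in smoke while sex is held fixed
set.seed(1)
smoke <- rbinom(100, 1, 0.5)
sex <- rbinom(100, 1, 0.5)
y <- 1 + 2*smoke + 0.5*sex + rnorm(100)
coef(lm(y ~ smoke + sex))['smoke']   # close to the true value 2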