Open koenvandenberge opened 1 year ago
Hi,
Suppose the two treatments are called "g1" and "g2", and the two timepoints are "t1" and "t2". Denote by alpha1, alpha2, and alpha3 the regression coefficients of x, t, and x:t, respectively. Let beta = alpha1 + alpha2 + alpha3.
Then the effect of (g2, t1) is the regression coefficient of x (alpha1), the effect of (g1, t2) is the regression coefficient of t (alpha2), and the effect of (g2, t2) would be the sum of the regression coefficients of x, t and x:t (beta).
Let the z-score be s = beta/sd(beta), where sd(beta) = sqrt(var(alpha1+alpha2+alpha3)) = sqrt(var(alpha1) + var(alpha2) + var(alpha3) + 2cov(alpha1,alpha2) + 2cov(alpha1,alpha3) + 2cov(alpha2,alpha3)). These variances and covariances can be obtained from the LinDA output. The significance of beta can then be evaluated by the p-value converted from the z-score: p = 2P(Z <= -|s|), where Z follows the standard normal distribution.
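For a plain linear model fitted with `lm` (outside LinDA), this recipe can be sketched in base R; the data below are simulated purely for illustration:

```r
# Simulated 2x2 design: treatments g1/g2 crossed with timepoints t1/t2.
set.seed(1)
x  <- factor(rep(c("g1", "g2"), each = 20))
tp <- factor(rep(c("t1", "t2"), times = 20))
y  <- rnorm(40) + 0.5 * (x == "g2") + 0.3 * (tp == "t2") +
  0.2 * (x == "g2") * (tp == "t2")

fit <- lm(y ~ x * tp)
cf <- coef(fit)   # (Intercept), xg2, tpt2, xg2:tpt2
V  <- vcov(fit)   # variance-covariance matrix of the coefficients

# beta = alpha1 + alpha2 + alpha3: (g2, t2) versus (g1, t1)
idx     <- c("xg2", "tpt2", "xg2:tpt2")
beta    <- sum(cf[idx])
sd.beta <- sqrt(sum(V[idx, idx]))  # all variances plus twice each covariance
s <- beta / sd.beta
p <- 2 * pnorm(-abs(s))            # p = 2 P(Z <= -|s|)
```

Because the model is saturated for the 2x2 design, beta is exactly the difference between the (g2, t2) and (g1, t1) cell means.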
Thanks.
Thank you for the swift response. I have a couple of clarification questions.

- The output from LinDA does not contain an intercept, so I was unsure of the regression coefficients' interpretation. If there were no intercept, one would expect alpha1 to correspond to the expected value for the treatment at timepoint 1. If there were an intercept, it would be the difference (on the linear predictor scale) between treatment and control at timepoint 1, as seems to be suggested here. What is the design matrix being used in the LinDA models? Treatment coding, sum coding, or something else?
- Assuming we have an intercept model (treatment coding), the contrast I am interested in is not H0: α1+α2+α3 = 0, as this would answer the question whether the expected value differs at (g2,t2) versus (g1,t1), using your notation. I am rather interested in H0: α1+α3 = 0, comparing (g2,t2) to (g1,t2). Let β = α1+α3; then it seems Var(β) = var(α1) + var(α3) + 2cov(α1,α3). Correct?
- Assessing contrasts of linear combinations of the regression coefficients seems like a prominent use case that would often be of interest. It may be worthwhile to have a function that implements this for the user, with, e.g., a matrix denoting the linear combination(s) of interest as input. In our case it could be L = matrix(c(1,0,1), ncol = 1).
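LinDA itself does not (as of this thread) ship such a contrast function, but for a plain `lm` fit a minimal sketch could look like the following; `test_lincom` is a made-up name:

```r
# Test H0: L' beta = 0 for a linear combination L of the coefficients of an lm fit.
test_lincom <- function(fit, L) {
  cf  <- coef(fit)
  V   <- vcov(fit)
  est <- sum(L * cf)                        # L' beta-hat
  se  <- sqrt(drop(crossprod(L, V %*% L)))  # sqrt(L' V L)
  z   <- est / se
  c(estimate = est, se = se, z = z, p = 2 * pnorm(-abs(z)))
}

# Example: H0: alpha1 + alpha3 = 0 (treatment effect at the second timepoint)
set.seed(2)
x  <- factor(rep(c("g1", "g2"), each = 20))
tp <- factor(rep(c("t1", "t2"), times = 20))
y  <- rnorm(40)
fit <- lm(y ~ x * tp)
res <- test_lincom(fit, c(0, 1, 0, 1))  # selects xg2 and xg2:tpt2
```

As a sanity check, the same estimate can be obtained by releveling the timepoint factor so that t2 is the reference; the coefficient of x in that reparametrized fit equals α1 + α3.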
The baseMean in the output dataframe represents 2 to the power of the intercept coefficients (normalized to one million), since LinDA fits the linear models on the log2-transformed data. Let v be the vector of the intercept coefficients; then baseMean = 2^v / sum(2^v) * 1e6.

LinDA uses lm or lmer (for mixed models) to fit the linear model. The design matrix of the model y ~ x*t includes a vector of all 1s denoting the intercept, a 0-1 vector denoting the status of the treatment, a 0-1 vector denoting the status of the timepoint, and the product of these two 0-1 vectors. We can obtain this design matrix by model.matrix(fit), where fit = lm(y ~ x*t).

> I am also interested in a function that compares (g2,t2) to (g1,t2).
The hypothesis for comparing (g2,t2) to (g1,t2) is $H_0: \alpha_1+\alpha_3=0$, and the hypothesis for comparing (g2,t2) to (g2,t1) is $H_0: \alpha_2+\alpha_3=0$.
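The treatment (dummy) coding behind these hypotheses can be inspected directly on a toy fit; the data here are made up:

```r
# Four observations covering each (treatment, timepoint) cell once.
x  <- factor(c("g1", "g1", "g2", "g2"))
tp <- factor(c("t1", "t2", "t1", "t2"))
y  <- c(1, 2, 3, 4)
fit <- lm(y ~ x * tp)
X <- model.matrix(fit)
colnames(X)  # "(Intercept)" "xg2" "tpt2" "xg2:tpt2"
# The row for (g2, t2) is (1, 1, 1, 1), so its mean is
# alpha0 + alpha1 + alpha2 + alpha3; subtracting the (g1, t2) row
# (1, 0, 1, 0) leaves alpha1 + alpha3, and subtracting the (g2, t1) row
# (1, 1, 0, 0) leaves alpha2 + alpha3.
```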
Hi, sorry that this is quite a basic question, but I am not very familiar with mixed models.
If you take the smokers example dataset and instead formulate the model to have an interaction between Smoke and Sex:

```r
linda.obj <- linda(otu.tab, meta, formula = '~Smoke*Sex+(1|SubjectID)', alpha = 0.1,
                   prev.cut = 0.1, lib.cut = 1000, winsor.quan = 0.97)
```
This is my understanding of what the various outputs are comparing (is that correct?):
With the example model you get all these differentially abundant OTUs from linda.obj$output$Smokey:

```r
row.names(linda.obj$output$Smokey[linda.obj$output$Smokey$reject == TRUE, ])
#  [1] "3931"   "15555"  "70671"  "74391"  "86047"  "94166"  "149109" "185969" "237323" "239506" "428237"
# [12] "469920" "470738" "518865" "529659" "570119" "573384"
```

but with the interaction model, you get 0 differentially abundant OTUs for both Smokey and Smokey:Sexmale. Why is there this difference?
The effect of smoking on the outcome is enhanced (or reduced) when Sex=male compared to when Sex=female by the amount of the regression coefficient of Smokey:Sexmale.
If we look at the joint effect of Smokey and Smokey:Sexmale (the difference between (smoker, male) and (non-smoker, male)), we can discover some differential taxa:
```r
linda.obj <- linda(otu.tab, meta, formula = '~Smoke*Sex+(1|SubjectID)', alpha = 0.1,
                   prev.cut = 0.1, lib.cut = 1000, winsor.quan = 0.97)

alp1 <- linda.obj$output$Smokey$log2FoldChange
se1  <- linda.obj$output$Smokey$lfcSE
df1  <- linda.obj$output$Smokey$df
alp2 <- linda.obj$output$Sexmale$log2FoldChange
se2  <- linda.obj$output$Sexmale$lfcSE
df2  <- linda.obj$output$Sexmale$df
alp3 <- linda.obj$output$`Smokey:Sexmale`$log2FoldChange
se3  <- linda.obj$output$`Smokey:Sexmale`$lfcSE
df3  <- linda.obj$output$`Smokey:Sexmale`$df

cov12 <- linda.obj$covariance$Smokey$Sexmale
cov13 <- linda.obj$covariance$Smokey$`Smokey:Sexmale`
cov23 <- linda.obj$covariance$Sexmale$`Smokey:Sexmale`

stat   <- (alp1 + alp3) / sqrt(se1^2 + se3^2 + 2 * cov13)
pvalue <- 2 * pt(-abs(stat), (df1 + df3) / 2)
padj   <- p.adjust(pvalue, method = 'BH')
rownames(linda.obj$output[[1]])[padj <= 0.1]
#  [1] "74391"  "94166"  "149109" "237323" "239506" "428237" "518865"
#  [8] "570119" "573384"
```
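The computation above can be wrapped in a small helper that takes the extracted vectors directly; `joint_effect` is a made-up name, and the degrees of freedom (df_i + df_j)/2 simply mirror the snippet above:

```r
# Joint test of H0: alpha_i + alpha_j = 0 from per-coefficient estimates,
# standard errors, their covariance, and degrees of freedom.
joint_effect <- function(alp_i, alp_j, se_i, se_j, cov_ij, df_i, df_j,
                         alpha = 0.1) {
  stat   <- (alp_i + alp_j) / sqrt(se_i^2 + se_j^2 + 2 * cov_ij)
  pvalue <- 2 * pt(-abs(stat), (df_i + df_j) / 2)
  padj   <- p.adjust(pvalue, method = "BH")
  list(stat = stat, pvalue = pvalue, padj = padj, reject = padj <= alpha)
}

# Toy numbers (not real LinDA output):
res <- joint_effect(alp_i = c(1, 0.1), alp_j = c(1, -0.1),
                    se_i = c(1, 1), se_j = c(1, 1),
                    cov_ij = c(0, 0), df_i = c(30, 30), df_j = c(30, 30))
```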
Thanks for the example code - very helpful!
Hi, thanks for this useful software. I'm new to mixed-effects models, so I'm still confused about the result of this code: "LinDA::linda(otu.tab[, ind1], meta[ind1, ], formula = '~Smoke+Sex+(1|SubjectID)', feature.dat.type = 'count', prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03, p.adj.method = "BH", alpha = 0.1)". I am not interested in the interaction "Smoke*Sex" but in the additive model "Smoke+Sex", so are the OTUs from linda.obj$output$Smokey obtained while fixing the gender to male or to female?

And how can I get this information when I create new models like "linda(taxa, meta, formula = '~Ethnicity (WHITE or BLACK) + Antibiotics Use (YES or NO) + Sex (MALE or FEMALE) + (1|Day)', zero.handling = 'pseudo-count', feature.dat.type = 'count', prev.filter = 0, is.winsor = TRUE, outlier.pct = 0.03, p.adj.method = "BH", alpha = 0.05)"?

Are the reject == TRUE OTUs from linda.obj$output$Ethnicity the differential OTUs while fixing "Antibiotics Use + Sex" at a particular status, e.g. "YES & MALE"? Or are they the OTUs that differ between black and white people regardless of gender and antibiotics use?

Thanks for your help!
The regression coefficient of a variable in a linear model means that, holding the other variables fixed, increasing this variable by one unit increases the response by the value of the coefficient. The "significance" or "rejection" result means that this coefficient is significantly nonzero; in other words, the corresponding variable is significantly associated with the response after adjusting for the other variables.
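A quick base-R illustration with made-up, noise-free data, constructed so the coefficient of x is exactly 3:

```r
# y = 2 + 3*x + 1*z exactly, so the fitted coefficient of x recovers 3:
# increasing x by one unit while holding z fixed raises y by 3.
x <- c(0, 1, 2, 3, 0, 1, 2, 3)
z <- c(0, 0, 0, 0, 1, 1, 1, 1)
y <- 2 + 3 * x + 1 * z
fit <- lm(y ~ x + z)
coef(fit)["x"]  # 3
```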
Hi,
When running the LinDA workflow, output is provided that focuses on inference for each individual regression coefficient separately. For example, if I had a design with two treatments x, each at two timepoints t, the standard LinDA output using a ~x*t design allows me to perform inference on x, t and x:t. However, say I am interested in assessing the treatment effect at the second timepoint; this would amount to testing the sum of the regression coefficients x + x:t. How should one go about this when using LinDA?