zhouhj1994 / LinDA

35 stars 3 forks source link

LinDA for longitudinal microbiome analysis #7

Open EOMAK91 opened 1 year ago

EOMAK91 commented 1 year ago

Hi,

I am currently using MassLin2 to analyze my longitudinal microbiome dataset; however, I saw your recent paper describing LinDA and its strengths in comparison to MassLin2. Therefore, I would like to compare my current results to those I would get from LinDA. However, after browsing through your page, I do not see an example of how to use LinDA to analyze longitudinal microbiome data. Could you point me in the direction of a tutorial or webpage that demonstrates how to adapt LinDA for longitudinal analysis?

Thanks in advance.

zhouhj1994 commented 1 year ago

For longitudinal data, one may use the mixed-effect model to incorporate subject as a random effect, since one subject always has multiple observations causing correlations among data. The use of mixed-effect models in LinDA is the same as in linear regression (lm), such as 'formula = '~Smoke+Sex+(1|SubjectID)' in the example in the readme page.

PeterCx commented 1 year ago

Dear @zhouhj1994

I am using LinDA for longitudinal microbiome analysis. I have questions as a few things are not clear to me. This is also related to another issue #5

I have 3 groups (Control, Placebo, Treatment) and 3 timepoints (0, 22, 52). Thus in total there are 18 pairwise comparisons for which I want to know what taxa are differentially abundant.

I have used the following code:

linda.obj <- linda(Species_Filt, Metadata, formula = '~Treatment*Time+(1|Sample)')

alp1 <- linda.obj$output$TreatmentControl$log2FoldChange
se1 <- linda.obj$output$TreatmentControl$lfcSE
df1 <- linda.obj$output$TreatmentControl$df

alp2 <- linda.obj$output$Time22$log2FoldChange
se2 <- linda.obj$output$Time22$lfcSE
df2 <- linda.obj$output$Time22$df

alp3 <- linda.obj$output$`TreatmentControl:Time22`$log2FoldChange
se3 <- linda.obj$output$`TreatmentControl:Time22`$lfcSE
df3 <- linda.obj$output$`TreatmentControl:Time22`$df

cov12 <- linda.obj$covariance$TreatmentControl$Time22
cov13 <- linda.obj$covariance$TreatmentControl$`TreatmentControl:Time22`
cov23 <- linda.obj$covariance$Time22$`TreatmentControl:Time22`

stat <- (alp1+alp3) / (sqrt(se1^2+se3^2+2*cov13))
pvalue <- 2 * pt(-abs(stat), (df1+df3)/2)
padj <- p.adjust(pvalue, method = 'BH')
rownames(linda.obj$output[[1]])[padj <= 0.1]

This code should allow me to see if there is a difference between Control and Treatment at Day 22. linda.object$covariance is NULL so I cant complete the calculation.

However, even with this method its not possible to get all pairwise comparisons. For example - Treatment at Day 0 vs Treatment at Day 22 or to check if there is a difference between Control and Placebo at any time.

How should I proceed to get all pairwise comparisons?

Your help is greatly appreciated. Many thanks

Kind regards,

P

zhouhj1994 commented 1 year ago

Thanks for the use of our package.

I am not sure whether your codes will work. If the "Treatment" variable has three levels: Control, Placebo, Treatment, then "Control" will be treated as the reference level (alphabetizing rule), so there shouldn't be such results as "linda.obj$output$TreatmentControl" but "linda.obj$output$TreatmentPlacebo" and "linda.obj$output$TreatmentTreatment" instead.

To get all pairwise comparisons, for example,

  1. Treatment at Day 0 vs Treatment at Day 22: First please make sure that your "Time" variable is categorical (i.e., factor type in R syntax), and note that "0" will be treated as the reference level. Then the following codes would do the task.
    
    alp1 <- linda.obj$output$Time22$log2FoldChange
    se1 <- linda.obj$output$Time22$lfcSE
    df1 <- linda.obj$output$Time22$df

alp2 <- linda.obj$output$TreatmentTreatment:Time22$log2FoldChange se2 <- linda.obj$output$TreatmentTreatment:Time22$lfcSE df2 <- linda.obj$output$TreatmentTreatment:Time22$df

cov12 <- linda.obj$covariance$Time22$TreatmentTreatment:Time22

stat <- (alp1+alp2) / (sqrt(se1^2+se2^2+2cov12)) pvalue <- 2 pt(-abs(stat), (df1+df2)/2) padj <- p.adjust(pvalue, method = 'BH') rownames(linda.obj$output[[1]])[padj <= 0.1]


2. Control and Placebo at any time: If you mean Control and Placebo at the same day, then the differential analysis would be

```r
# Time 0
alp1 <- linda.obj$output$TreatmentPlacebo$log2FoldChange
se1 <- linda.obj$output$TreatmentPlacebo$lfcSE
df1 <- linda.obj$output$TreatmentPlacebo$df

stat <- alp1/se1
pvalue <- 2 * pt(-abs(stat), df1)
padj <- p.adjust(pvalue, method = 'BH')
rownames(linda.obj$output[[1]])[padj <= 0.1]

# Time 22
alp2 <- linda.obj$output$TreatmentPlacebo:Time22$log2FoldChange
se2 <- linda.obj$output$TreatmentPlacebo:Time22$lfcSE
df2 <- linda.obj$output$TreatmentPlacebo:Time22$df

cov12 <- linda.obj$covariance$TreatmentPlacebo$TreatmentPlacebo:Time22

stat <- (alp1+alp2) / (sqrt(se1^2+se2^2+2*cov12))
pvalue <- 2 * pt(-abs(stat), (df1+df2)/2)
padj <- p.adjust(pvalue, method = 'BH')
rownames(linda.obj$output[[1]])[padj <= 0.1]

# Time 52: similar to Time 22
  1. For other comparisons, the analyses would be similar, just be clear that there would be the following 8 outputs

    TreatmentPlacebo, TreatmentTreatment, Time22, Time52, TreatmentPlacebo:Time22, TreatmentPlacebo:Time52, TreatmentTreatment:Time22, TreatmentTreatment:Time52

    and

    TreatmentPlacebo (can be seen as the baseline Placebo effect, i.e., at time 0), TreatmentTreatment (can be seen as the baseline Treatment effect, i.e., at time 0), Time22 (can be seen as the baseline time 22 effect, i.e., in control group) , Time52 (can be seen as the baseline time 52 effect, i.e., in control group)

PeterCx commented 1 year ago

Thanks for the response. This is very very helpful. However, I cannot find the covariances in the output.

cov12 <- linda.obj$covariance returns NULL

There is no output called covariance in linda.obj. How are you calculating this or what am I missing?

Thanks very much. P

zhouhj1994 commented 1 year ago

Sorry for the late reply.

Are you using the latest version of the package? The old version didn't include the covariance in the output.

Have you tried this example? https://github.com/zhouhj1994/LinDA Can you see the covariances in the output?

Thanks.

PeterCx commented 1 year ago

No problem.

Yes I was using wrong version. I was using the linda function implemented in the microbiomeStat package. I have now installed the LinDA package correctly via devtools and its al working perfectly. Thanks again for all your help.

P

zhouhj1994 commented 1 year ago

Great.

Ivy-ops commented 11 months ago

Thank you for the insightful discussion. I have a question related to this. It would be highly appreciated if you could help me with the explanation of the LinDA results. @zhouhj1994
For example, in @EOMAK91 example, Time has 0,22 and 52. Should we consider the "Time" variable as a continuous variable or as a categorical factor? How does the interpretation differ when "Time" is viewed as continuous versus when it's treated as a factor?

zhouhj1994 commented 11 months ago

If you want to understanding the effect of time, in other words, how is the abundance changed by one unit change of time, you may treat it as continuous. If you are interest in the difference between time 0 and time22/time 52,you may treat it as categorical.

Ivy-ops commented 11 months ago

Hi @zhouhj1994, thank you for the answer.

I've executed the following command: fit <- linda(feature.dat=feature.dat, meta.dat=meta.dat, formula = "~ treatment * time"), where "time" includes the values 0, 22, and 52, and "treatment" includes two levels, "yes" and "no." Could you help me check whether my use of linda output is correct? Much appreciated!

If I consider "time" as a continuous variable: Does "fit$output$time$log2FoldChange" imply that a one-unit change in time results in an x log2FoldChange in each feature? Does "fit$output$treatmentyes$log2FoldChange" mean that, in comparison to the "no" treatment, the "yes" treatment yields an x log2FoldChange in feature.dat?

If I consider "time" as a categorical variable: Does "fit$output$time$time22$log2FoldChange" signify that time22 exhibits an x log2FoldChange in each feature compared to time 0? Does "fit$output$treatmentyes$log2FoldChange" mean that, relative to the "no" treatment, the "yes" treatment results in an x log2FoldChange in feature.dat?

zhouhj1994 commented 11 months ago

Your interpretation is not bad. But you may want to say "...while keeping other variables fixed" or something like that and be careful of the interaction term.

Your model is "treatment + time + treatment * time"

The effect of time: for treatment "no", the effect is "fit$output$time$log2FoldChange", while for treatment "yes", the effect is "fit$output$time$log2FoldChange" + "fit$output$treatmentyes:time$log2FoldChange"

The effect of treatment: while fixing time, the treatment "yes" yields "fit$output$treatmentyes$log2FoldChange" + "fit$output$treatmentyes:time$log2FoldChange" change in comparison to the treatment "no".

The effect of time22: for treatment "no", the effect is "fit$output$time22$log2FoldChange", while for treatment "yes", the effect is "fit$output$time22$log2FoldChange" + "fit$output$treatmentyes:time22$log2FoldChange"

The effect of treatment: for fixed time22, the treatment "yes" yields "fit$output$treatmentyes$log2FoldChange" + "fit$output$treatmentyes:time22$log2FoldChange" change in comparison to the treatment "no".

Ivy-ops commented 11 months ago

Thank you for the correction and the details! @zhouhj1994

PengfanZhang commented 4 months ago

Hi, I have a naive question. What does fit$output$time$log2FoldChange mean when time is a continuous variable? Which two groups are used to calculate the fold change? Thanks!

zhouhj1994 commented 4 months ago

When time is a continuous variable, fit$output$time$log2FoldChange means the effect of time on the response variable (if there are no interactions considered). We might say, a one unit increase in time corresponds to change of fit$output$time$log2FoldChange in response.

zhouhj1994 commented 4 months ago

Here, the response variable is the log2-transformed microbiome abundance.