nf-core / differentialabundance

Differential abundance analysis for feature/ observation matrices from platforms such as RNA-seq
https://nf-co.re/differentialabundance
MIT License
64 stars 37 forks source link

Mixed-model in Limma #325

Open KamilMaliszArdigen opened 4 weeks ago

KamilMaliszArdigen commented 4 weeks ago

Description of feature

We would like to extend this pipeline logic to allow processing of data and model described bellow. Can you suggest how we should approach this topic?

Consider the following experiment frame:


Patient
Condition Tissue
1 Diseased A
1 Diseased B
2 Diseased A
2 Diseased B
3 Diseased A
3 Diseased B
4 Normal A
4 Normal B
5 Normal A
5 Normal B
6 Normal A
6 Normal B


Mixed-model experiments involve comparisons both within and between subjects. The experiment has six subjects, three with a disease and three normal subjects. Two tissue types, A and B, are examined from each subject. The goal is to compare tissue types and disease states.

Key Concepts for Analyzing Mixed-model Experiments

Steps for Analyzing Mixed-model Experiments in Limma

  1. Combine Factors: Combine experimental factors into a single factor. For example, combine "Condition" (Diseased or Normal) and "Tissue" (A or B) into a combined factor.

  2. Create Design Matrix: Create a design matrix based on the combined factor using model.matrix.

  3. Apply voom Transformation: Use the voom function from limma to transform raw read counts to log2-cpm values.

  4. Estimate Inter-subject Correlation: Use duplicateCorrelation to estimate the correlation between measurements from the same subject.

  5. Fit Linear Mixed Model: Use lmFit to fit the linear model, incorporating the estimated correlation and specifying the "block" argument as the variable representing the subjects (e.g., "Patient").

  6. Define Contrasts: Define contrasts to compare the different experimental conditions, such as diseased versus normal for each tissue type and tissue A versus tissue B for each condition.

  7. Compute Contrasts and Moderated t-tests: Calculate the contrasts and perform moderated t-tests using contrasts.fit and eBayes.

  8. Identify Differentially Expressed Genes: Use topTable to identify differentially expressed genes for each contrast of interest.

Understanding Variability in Mixed-model Experiments

Mixed-model experiments have two levels of variability:

  1. Between-subject variability: The variation between individuals, accounting for inherent differences between subjects. 

  2. Within-subject variability: The variation between measurements within the same individual, adjusted for individual baseline differences.

The use of random effects and the duplicateCorrelation function in limma allows for the proper analysis of multi-level experiments, leading to more accurate and reliable results by accounting for both within- and between-subject variability.


References:

  1. https://bioconductor.org/packages/devel/bioc/vignettes/limma/inst/doc/usersguide.pdf

  2. https://pmc.ncbi.nlm.nih.gov/articles/PMC4402510/

  3. https://pmc.ncbi.nlm.nih.gov/articles/PMC4937821/

  4. https://pubmed.ncbi.nlm.nih.gov/34605806/



pinin4fjords commented 4 weeks ago

Just to be super clear, can you illustrate the differences in how you would construct the models for this please?

mzenczak commented 4 weeks ago

Just to be super clear, can you illustrate the differences in how you would construct the models for this please?

@pinin4fjords if I understood you correctly here is the answer to your question:

Following the steps for analyzing mixed models, we first combine the experimental factors into a single factor, as shown in the table below (serving as the sample.sheet). This newly created factor becomes our contrast variable, which we then use in the model construction. See the code snippet below for details:

Patient Condition Tissue Condition.Tissue
1 Diseased A Diseased.A
1 Diseased B Diseased.B
2 Diseased A Diseased.A
2 Diseased B Diseased.B
3 Diseased A Diseased.A
3 Diseased B Diseased.B
4 Normal A Normal.A
4 Normal B Normal.B
5 Normal A Normal.A
5 Normal B Normal.B
6 Normal A Normal.A
6 Normal B Normal.B
DGE <- DGEList(count.table)
DGE <- calcNormFactors(DGE)
contrast_variable <- "Condition.Tissue"
model <- paste("~ 0 +", contrast_variable)
design <- model.matrix(as.formula(model), data = sample.sheet)
voom <- voom(counts = DGE, design = design)
corfit <- duplicateCorrelation(object = voom, design = design, block = Patient)
fit <- lmFit(object = voom, design = design, block = Patient, correlation = corfit$consensus.correlation)

For example, to define a contrast that tests for deferentially expressed genes between diseased tissue A and normal tissue A, you could construct a contrast like this:

contrast <- "Condition.Tissue_Diseased.A_vs_Normal.A = Diseased.A - Normal.A"
contrast_matrix <- makeContrasts(contrasts = contrast, levels = design)
fit <- contrasts.fit(fit, contrast_matrix)
fit <- eBayes(fit)
results <- topTable(fit)

Hope this answers your question. :)

pinin4fjords commented 4 weeks ago

IIUC the contrast can be dealt with by supplying the workflow with a sample sheet containing the composite variable, and building contrasts on that variable in the normal way.

After that the only missing thing is the duplicateCorrelation treatment, which could be PR'd to the limma module as a new feature.

grst commented 3 weeks ago

There's also the DREAM method from the bioconductor package variancePartition which is an extension of limma for linear mixed effects models:

https://gabrielhoffman.github.io/variancePartition/articles/dream.html

It allows to specify random effects using an extension of Wilkinson formulas that is also adopted by other R packages and might be more convenient than what's described in the limma vignette:

form <- ~ Disease + (1 | Individual)