Mixed-model in Limma - GitHub Issues

KamilMaliszArdigen commented 4 weeks ago

Description of feature

We would like to extend the pipeline logic to support the data and model described below. Can you suggest how we should approach this?

Consider the following experiment frame:

Patient	Condition	Tissue
1	Diseased	A
1	Diseased	B
2	Diseased	A
2	Diseased	B
3	Diseased	A
3	Diseased	B
4	Normal	A
4	Normal	B
5	Normal	A
5	Normal	B
6	Normal	A
6	Normal	B

Mixed-model experiments involve comparisons both within and between subjects. The experiment has six subjects, three with a disease and three normal subjects. Two tissue types, A and B, are examined from each subject. The goal is to compare tissue types and disease states.

Within-subject comparison: Comparing tissue types A and B can be done within subjects since measurements for both tissues are available from each subject.
Between-subject comparison: Comparing diseased and normal subjects requires a between-subject comparison, as these measurements come from different individuals.
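
As a rough illustration, the layout above can be written down as a data frame and cross-tabulated to see which comparison is within subjects and which is between subjects. The object name `sample.sheet` matches the one used later in this thread; the helper variable names are made up for this sketch, and only base R is used.

```r
# Rebuild the experiment frame from the table above (base R only).
sample.sheet <- data.frame(
  Patient   = factor(rep(1:6, each = 2)),
  Condition = factor(rep(c("Diseased", "Normal"), each = 6)),
  Tissue    = factor(rep(c("A", "B"), times = 6))
)

# Every patient contributes both tissues, so A vs B is a within-subject comparison.
tissue_by_patient <- table(sample.sheet$Patient, sample.sheet$Tissue)
print(tissue_by_patient)

# Every patient has exactly one condition, so Diseased vs Normal is between-subject.
conditions_per_patient <- tapply(
  as.character(sample.sheet$Condition), sample.sheet$Patient,
  function(x) length(unique(x))
)
print(conditions_per_patient)
```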

Key Concepts for Analyzing Mixed-model Experiments

Random Effects: To account for the variability within and between subjects, the "Patient" variable is treated as a random effect. This means each patient has a unique baseline effect on the expression levels.
RNAseq data with voom: For RNAseq data, the voom function converts read counts to log2-counts per million (log2-cpm) with associated weights, preparing the data for linear modeling in limma. The voom function accounts for the mean-variance relationship in RNAseq data, making it suitable for linear modeling.
duplicateCorrelation Function: This function is used to estimate the correlation between measurements taken from the same subject.
Linear Mixed Model: A linear mixed model is fitted using lmFit, incorporating the estimated within-subject correlation (the correlation between measurements from the same patient). The fixed effects (tissue type and disease state) are estimated explicitly, while the patient-specific random effect enters the fit through that correlation.

Steps for Analyzing Mixed-model Experiments in Limma

Combine Factors: Combine experimental factors into a single factor. For example, combine "Condition" (Diseased or Normal) and "Tissue" (A or B) into a combined factor.
Create Design Matrix: Create a design matrix based on the combined factor using model.matrix.
Apply voom Transformation: Use the voom function from limma to transform raw read counts to log2-cpm values.
Estimate Within-subject Correlation: Use duplicateCorrelation to estimate the correlation between measurements from the same subject (the limma documentation also refers to this as the inter-subject correlation).
Fit Linear Mixed Model: Use lmFit to fit the linear model, incorporating the estimated correlation and specifying the "block" argument as the variable representing the subjects (e.g., "Patient").
Define Contrasts: Define contrasts to compare the different experimental conditions, such as diseased versus normal for each tissue type and tissue A versus tissue B for each condition.
Compute Contrasts and Moderated t-tests: Calculate the contrasts and perform moderated t-tests using contrasts.fit and eBayes.
Identify Differentially Expressed Genes: Use topTable to identify differentially expressed genes for each contrast of interest.
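
The contrast step can be sketched as follows, assuming the combined factor has the four levels Diseased.A, Diseased.B, Normal.A and Normal.B. The contrast names are illustrative, and the interaction contrast at the end is an addition that is not mentioned in the steps above. Only `makeContrasts` from limma is used; the design matrix comes from base R.

```r
library(limma)

# Combined factor for the 12 samples (patients 1-3 diseased, 4-6 normal; tissues A and B).
group <- factor(paste(
  rep(c("Diseased", "Normal"), each = 6),
  rep(c("A", "B"), times = 6),
  sep = "."
))

# One column per group; name the columns after the factor levels.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

contrast_matrix <- makeContrasts(
  DiseasedVsNormal_A = Diseased.A - Normal.A,   # between-subject comparison
  DiseasedVsNormal_B = Diseased.B - Normal.B,   # between-subject comparison
  AvsB_Diseased      = Diseased.A - Diseased.B, # within-subject comparison
  AvsB_Normal        = Normal.A - Normal.B,     # within-subject comparison
  # Hypothetical extra: does the tissue effect differ between conditions?
  Interaction        = (Diseased.A - Diseased.B) - (Normal.A - Normal.B),
  levels = design
)
print(contrast_matrix)
```

Each column of `contrast_matrix` can then be passed to contrasts.fit and inspected with topTable via its `coef` argument.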

Understanding Variability in Mixed-model Experiments

Mixed-model experiments have two levels of variability:

Between-subject variability: The variation between individuals, accounting for inherent differences between subjects.
Within-subject variability: The variation between measurements within the same individual, adjusted for individual baseline differences.

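Treating "Patient" as a random effect through duplicateCorrelation lets limma account for both levels. The two samples from one patient are no longer treated as independent, so within-subject comparisons (tissue A versus B) are judged against within-subject variability, while between-subject comparisons (diseased versus normal) also carry the between-subject variability.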
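
One way to write the two levels of variability down, as a sketch under the usual random-intercept assumptions. The symbols are notation introduced here for illustration: $g$ indexes genes, $i$ patients, $j \in \{A, B\}$ tissues, and $c(i)$ is the condition of patient $i$.

```latex
y_{gij} = \mu_{g,c(i),j} + b_{gi} + \varepsilon_{gij},
\qquad b_{gi} \sim N(0, \sigma_{b,g}^{2}),
\qquad \varepsilon_{gij} \sim N(0, \sigma_{w,g}^{2})

\operatorname{Var}(y_{gij}) = \sigma_{b,g}^{2} + \sigma_{w,g}^{2},
\qquad
\operatorname{Cor}(y_{giA}, y_{giB}) = \rho_g
  = \frac{\sigma_{b,g}^{2}}{\sigma_{b,g}^{2} + \sigma_{w,g}^{2}}

\operatorname{Var}(y_{giA} - y_{giB}) = 2\,\sigma_{w,g}^{2}
```

Here $\sigma_{b,g}^{2}$ is the between-subject variance and $\sigma_{w,g}^{2}$ the within-subject variance. The patient effect $b_{gi}$ cancels in the within-patient difference, which is why the tissue comparison only sees $\sigma_{w,g}^{2}$. duplicateCorrelation does not keep a separate $\rho_g$ per gene; it returns a single consensus correlation shared by all genes.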

pinin4fjords commented 4 weeks ago

Just to be super clear, can you illustrate the differences in how you would construct the models for this please?

mzenczak commented 4 weeks ago

> Just to be super clear, can you illustrate the differences in how you would construct the models for this please?

@pinin4fjords if I understood you correctly, here is the answer to your question:

Following the steps for analyzing mixed models, we first combine the experimental factors into a single factor, as shown in the table below (serving as the sample.sheet). This newly created factor becomes our contrast variable, which we then use in the model construction. See the code snippet below for details:

Patient	Condition	Tissue	Condition.Tissue
1	Diseased	A	Diseased.A
1	Diseased	B	Diseased.B
2	Diseased	A	Diseased.A
2	Diseased	B	Diseased.B
3	Diseased	A	Diseased.A
3	Diseased	B	Diseased.B
4	Normal	A	Normal.A
4	Normal	B	Normal.B
5	Normal	A	Normal.A
5	Normal	B	Normal.B
6	Normal	A	Normal.A
6	Normal	B	Normal.B

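library(edgeR)  # also attaches limma
# count.table: genes x samples matrix of raw counts, columns in the same order as the rows of sample.sheet
DGE <- DGEList(count.table)
DGE <- calcNormFactors(DGE)
contrast_variable <- "Condition.Tissue"
sample.sheet[[contrast_variable]] <- factor(sample.sheet[[contrast_variable]])
model <- paste("~ 0 +", contrast_variable)
design <- model.matrix(as.formula(model), data = sample.sheet)
# Name the columns after the factor levels so that contrasts can refer to e.g. Diseased.A
colnames(design) <- levels(sample.sheet[[contrast_variable]])
# Store the result as v so that the voom() function itself is not masked
v <- voom(counts = DGE, design = design)
# Patient is a column of sample.sheet, so it has to be referenced through the data frame
corfit <- duplicateCorrelation(object = v, design = design, block = sample.sheet$Patient)
fit <- lmFit(object = v, design = design, block = sample.sheet$Patient, correlation = corfit$consensus.correlation)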

For example, to define a contrast that tests for differentially expressed genes between diseased tissue A and normal tissue A, you could construct a contrast like this:

contrast <- "Condition.Tissue_Diseased.A_vs_Normal.A = Diseased.A - Normal.A"
contrast_matrix <- makeContrasts(contrasts = contrast, levels = design)
fit <- contrasts.fit(fit, contrast_matrix)
fit <- eBayes(fit)
results <- topTable(fit)

Hope this answers your question. :)

pinin4fjords commented 4 weeks ago

IIUC the contrast can be dealt with by supplying the workflow with a sample sheet containing the composite variable, and building contrasts on that variable in the normal way.

After that the only missing thing is the duplicateCorrelation treatment, which could be PR'd to the limma module as a new feature.
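
For reference, a sketch of what the duplicateCorrelation treatment could look like in isolation. The counts are simulated placeholders, and the second voom/duplicateCorrelation pass is a commonly recommended refinement that this issue does not specify. How any of this would be wired into the limma module is left open.

```r
library(edgeR)  # also attaches limma; duplicateCorrelation additionally needs statmod

set.seed(1)
# Placeholder inputs (assumption): 500 genes x 12 samples with a simulated patient effect.
sample.sheet <- data.frame(
  Patient = factor(rep(1:6, each = 2)),
  Condition.Tissue = factor(paste(
    rep(c("Diseased", "Normal"), each = 6), rep(c("A", "B"), times = 6), sep = "."
  ))
)
patient_effect <- matrix(rnorm(500 * 6, sd = 0.5), nrow = 500)[, rep(1:6, each = 2)]
count.table <- matrix(
  rnbinom(500 * 12, mu = 100 * exp(patient_effect), size = 10), nrow = 500,
  dimnames = list(paste0("gene", 1:500), paste0("s", 1:12))
)

DGE <- calcNormFactors(DGEList(count.table))
design <- model.matrix(~ 0 + Condition.Tissue, data = sample.sheet)
colnames(design) <- levels(sample.sheet$Condition.Tissue)

# First pass: voom weights without blocking, then estimate the consensus correlation.
v <- voom(DGE, design)
corfit <- duplicateCorrelation(v, design, block = sample.sheet$Patient)

# Second pass: recompute weights and correlation with the blocking taken into account.
v <- voom(DGE, design, block = sample.sheet$Patient,
          correlation = corfit$consensus.correlation)
corfit <- duplicateCorrelation(v, design, block = sample.sheet$Patient)

fit <- lmFit(v, design, block = sample.sheet$Patient,
             correlation = corfit$consensus.correlation)
fit <- contrasts.fit(fit, makeContrasts(Diseased.A - Normal.A, levels = design))
fit <- eBayes(fit)
results <- topTable(fit, number = Inf)
```

In a module, the blocking column (here `Patient`) would presumably have to be exposed as an optional parameter, with the plain lmFit path kept as the default when it is not set.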

grst commented 3 weeks ago

There's also the dream method from the Bioconductor package variancePartition, which extends limma to linear mixed-effects models:

https://gabrielhoffman.github.io/variancePartition/articles/dream.html

It lets you specify random effects with an extension of Wilkinson formulas that other R packages have also adopted, which might be more convenient than the approach described in the limma vignette:

form <- ~ Disease + (1 | Individual)
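
Adapted to the experiment in this issue, a dream fit might look roughly like the sketch below. The counts are simulated placeholders, the formula with `Condition`, `Tissue` and `Patient` is an assumption based on the sample sheet in this thread, and the calls follow the pattern shown in the variancePartition vignette.

```r
library(edgeR)
library(variancePartition)  # provides voomWithDreamWeights() and dream()

set.seed(1)
# Sample sheet from this issue; Normal is set as the reference level of Condition.
metadata <- data.frame(
  Patient   = factor(rep(1:6, each = 2)),
  Condition = factor(rep(c("Diseased", "Normal"), each = 6),
                     levels = c("Normal", "Diseased")),
  Tissue    = factor(rep(c("A", "B"), times = 6)),
  row.names = paste0("s", 1:12)
)

# Placeholder counts (assumption): 200 genes with a simulated per-patient effect.
patient_effect <- matrix(rnorm(200 * 6, sd = 0.5), nrow = 200)[, rep(1:6, each = 2)]
counts <- matrix(
  rnbinom(200 * 12, mu = 100 * exp(patient_effect), size = 10), nrow = 200,
  dimnames = list(paste0("gene", 1:200), rownames(metadata))
)
DGE <- calcNormFactors(DGEList(counts))

# Fixed effects for condition and tissue, random intercept per patient.
form <- ~ Condition + Tissue + (1 | Patient)

vobj <- voomWithDreamWeights(DGE, form, metadata)  # precision weights under the mixed model
fit <- dream(vobj, form, metadata)                 # per-gene linear mixed model
fit <- eBayes(fit)
results <- topTable(fit, coef = "ConditionDiseased", number = Inf)
```

Unlike duplicateCorrelation, this estimates the patient variance separately for each gene, at the cost of a slower fit.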

nf-core / differentialabundance

Mixed-model in Limma #325