theislab / single-cell-best-practices

https://www.sc-best-practices.org

Pseudo-replicates in DE analysis #261

Open emdann opened 8 months ago

emdann commented 8 months ago

The differential expression analysis tutorial recommends aggregating data from the same donor into pseudo-replicates when technical replicates from that donor are not available:

If there are several donors in the single-cell experiment and the user wants to account for the patient variability, we recommend creating 2 or 3 pseudo-replicates for each patient and including patient information into the design matrix

Should this be presented as best practice? Is there a reference or benchmark to support this approach? I think this advice gives the false impression that technical and inter-individual variability can be disentangled without using technical replicates. Here one could simply use donor as the "sample" for DE analysis, and patient variability would be accounted for when the variance/dispersion between samples is estimated. I am not sure that including the donor as a covariate in the model would completely solve the issues with variance estimation from pseudo-replicates.
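To illustrate the variance concern, here is a minimal simulation sketch. It is not the tutorial's code: the donor count, cell count, and the two standard deviations are made-up values, and a single Gaussian "gene" stands in for real count data. It compares the spread between donor-level pseudobulk means with the spread between two random pseudo-replicates cut from the same donor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings, chosen only for illustration.
n_donors, n_cells = 6, 1000
sigma_donor, sigma_cell = 1.0, 1.0

# One gene on a log-like scale: donor-level effect plus cell-level noise.
donor_effect = rng.normal(0.0, sigma_donor, size=n_donors)
cells = donor_effect[:, None] + rng.normal(0.0, sigma_cell, size=(n_donors, n_cells))

# Randomly split each donor's cells into 2 pseudo-replicates and average them.
shuffled = rng.permuted(cells, axis=1)
pseudo_reps = shuffled.reshape(n_donors, 2, n_cells // 2).mean(axis=2)

# Spread between donors (one pseudobulk value per donor).
donor_means = cells.mean(axis=1)
between_donor_var = donor_means.var(ddof=1)

# Spread between pseudo-replicates of the same donor, averaged over donors.
within_donor_var = pseudo_reps.var(axis=1, ddof=1).mean()

print(f"between-donor variance:        {between_donor_var:.4f}")
print(f"between-pseudo-replicate var.: {within_donor_var:.4f}")
```

Under these assumptions the pseudo-replicates of one donor differ only by cell sampling noise, so their variance carries no information about inter-individual variability. Whether this translates into inflated DE calls in a given count model would still need a proper benchmark.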

winglet0996 commented 8 months ago

I have the same doubt that @emdann raised here. I tried splitting the sample from one donor into 2 replicates, and DESeq2 detected many more DE genes than it did without the split. So I think pseudo-replicates may cause a false-positive discovery issue. As for including the donor as a covariate in the model: when the design is not paired, donor may be nested within the intervention, which leaves the design matrix not of full rank. So I'm also wondering how the design matrix can include donor as a covariate when working with data without replicates.
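The rank problem can be checked directly. The sketch below is an assumption-laden toy case, not output from DESeq2: it uses four hypothetical donors with two pseudo-replicates each, and a hand-written `design` helper that builds an intercept, a condition column, and donor dummy columns.

```python
import numpy as np

def design(donor, condition):
    """Intercept + condition + donor dummies (donor 0 is the reference level)."""
    intercept = np.ones(len(donor))
    n_donors = donor.max() + 1
    donor_dummies = (donor[:, None] == np.arange(1, n_donors)).astype(float)
    return np.column_stack([intercept, condition.astype(float), donor_dummies])

# Hypothetical layout: 4 donors, 2 pseudo-replicates per donor.
donor = np.array([0, 0, 1, 1, 2, 2, 3, 3])

# Unpaired design: donors 0 and 1 are controls, donors 2 and 3 are treated,
# so each donor appears in only one condition (donor nested in condition).
condition_nested = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_nested = design(donor, condition_nested)
rank_nested = np.linalg.matrix_rank(X_nested)

# Paired design for comparison: every donor appears in both conditions.
condition_paired = np.array([0, 1, 0, 1, 0, 1, 0, 1])
X_paired = design(donor, condition_paired)
rank_paired = np.linalg.matrix_rank(X_paired)

print("columns:", X_nested.shape[1])
print("rank, nested design:", rank_nested)
print("rank, paired design:", rank_paired)
```

In the nested layout the condition column equals the sum of the dummy columns for the treated donors, so the condition effect cannot be separated from the donor effects. In the paired layout that dependency disappears.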
