Open emdann opened 8 months ago
I have the same doubt that @emdann raised here. I tried splitting the sample from one donor into 2 pseudo-replicates, and DESeq2 detected many more DE genes than it did without the split. So I think pseudo-replicates may cause a false-positive problem. As for including the donor as a covariate in the model: when the study design is not paired, donor is nested within the intervention, so the design matrix is not full rank. So I'm also wondering how the design matrix can include donor as a covariate when working with data without replicates.
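To make the rank concern concrete, here is a minimal NumPy sketch. The donor and condition labels are made up for illustration, and `design_matrix` is a hypothetical helper, not DESeq2's own code; it only mimics an intercept + condition + donor model with the first donor as reference level.

```python
import numpy as np

def design_matrix(donor, condition):
    """Intercept + condition indicator + donor dummies (first donor is the reference)."""
    donor = np.asarray(donor)
    condition = np.asarray(condition, dtype=float)
    intercept = np.ones((donor.size, 1))
    donor_ids = np.unique(donor)
    # One indicator column per donor, skipping the reference donor.
    donor_cols = (donor[:, None] == donor_ids[None, 1:]).astype(float)
    return np.hstack([intercept, condition[:, None], donor_cols])

# Hypothetical data: 4 donors, each split into 2 pseudo-replicates (8 samples).
donor = [0, 0, 1, 1, 2, 2, 3, 3]

# Unpaired design: donors 0-1 are control, donors 2-3 are treated,
# so every donor appears in exactly one condition (donor nested in condition).
X_nested = design_matrix(donor, [0, 0, 0, 0, 1, 1, 1, 1])
rank_nested = np.linalg.matrix_rank(X_nested)

# Paired design: every donor contributes one sample to each condition.
X_paired = design_matrix(donor, [0, 1, 0, 1, 0, 1, 0, 1])
rank_paired = np.linalg.matrix_rank(X_paired)

print("nested:", X_nested.shape[1], "columns, rank", rank_nested)
print("paired:", X_paired.shape[1], "columns, rank", rank_paired)
```

In the nested case the condition column equals the sum of the dummy columns for the treated donors, so the rank is lower than the number of columns and the condition effect cannot be separated from the donor effects. In the paired case the same model is full rank.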
The differential expression analysis tutorial recommends aggregating data from the same donor into pseudo-replicates if technical replicates from the same donor are not available.
Should this be presented as best practice? Is there a reference/benchmark to support this approach? I think this advice gives the false impression that technical and inter-individual variability can be disentangled without using technical replicates. Here one could simply use donor as the "sample" for DE analysis, and patient variability would be accounted for when the variance/dispersion between samples is estimated. I am not sure that including the donor as a covariate in the model would completely solve the issues with variance estimation from pseudo-replicates.
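For illustration, using donor as the "sample" could look roughly like the sketch below: sum the counts of all cells from a donor into one pseudobulk profile, with no further split into pseudo-replicates. The count matrix and donor labels are simulated, and `pseudobulk_by_donor` is a hypothetical helper name, not a function from the tutorial or from DESeq2.

```python
import numpy as np

def pseudobulk_by_donor(counts, cell_donor):
    """Sum a cells x genes count matrix into one row per donor."""
    counts = np.asarray(counts)
    donors, idx = np.unique(np.asarray(cell_donor), return_inverse=True)
    out = np.zeros((donors.size, counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add: each cell's counts are added to its donor's row.
    np.add.at(out, idx, counts)
    return donors, out

# Simulated example: 12 cells, 3 genes, 3 donors (all values are made up).
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(12, 3))
cell_donor = np.array(["d1"] * 4 + ["d2"] * 3 + ["d3"] * 5)

donors, pb = pseudobulk_by_donor(counts, cell_donor)
print(donors)
print(pb)
```

The resulting donors x genes matrix (transposed to genes x samples where the tool expects that) would then be the input to the DE method, so that the dispersion is estimated between donors rather than between splits of the same donor.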