zhangyuqing / ComBat-seq

Batch effect adjustment based on negative binomial regression for RNA sequencing count data

ComBat-seq + DESeq2 / WGCNA Or DESeq2 (batch as covariate) / ComBat + WGCNA #18

Open sharvarinarendra opened 3 years ago

sharvarinarendra commented 3 years ago

Hi,

I am using ComBat-seq to remove batch effects from my dataset, and then running DESeq2 on the corrected counts. I was wondering if I could use the same data, after rlog transformation, for WGCNA?

Which pipeline would be better (to get both differentially expressed genes and WGCNA results)?

1) ComBat-seq -> DESeq2 -> rlog -> WGCNA
2) DESeq2 (batch as covariate) -> rlog -> ComBat -> WGCNA

Thank you!

Bithorax commented 3 years ago

Hi, I'm currently doing something similar. To answer your question, I would say that batch correction should be the first step, since ComBat-seq requires raw counts as input. So my suggestion is to follow workflow 1.

I do have a question as well. By running "ComBat_seq(Dataset,batch=my_batch)", is the output going to be the dataset corrected for batch effects?

zhangyuqing commented 3 years ago

@Bithorax thanks for answering the question! Yes, the output will be the dataset corrected for batch effects.

sharvarinarendra commented 3 years ago

Thank you for your answer, @Bithorax and @zhangyuqing !

Bithorax commented 3 years ago

One last question, if you can help. I'm not quite sure when I should specify the "group" and hence the "full_mod=TRUE" parameters. Do you have an explanation?

zhangyuqing commented 3 years ago

@Bithorax Both "group" and "covar_mod" refer to covariates whose signal you would like to keep in your data. So, in differential expression analysis for example, group would be the condition you are comparing. In addition, if you would like to retain information from any other variables, you can specify them in covar_mod. In contrast, "batch" is the variable whose signal you would like to remove from the data.
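To illustrate the distinction between "batch", "group" and "covar_mod", here is a minimal sketch on simulated counts. The object names (counts, batch, group, sex, covar_mat) and the simulated values are assumptions made up for illustration, not part of the original question; the calls assume the ComBat_seq function shipped in the sva package and are skipped if sva is not installed.

```r
# Sketch only: simulated data, hypothetical variable names.
set.seed(1)
n_genes <- 200
n_samples <- 12

# Raw (untransformed) count matrix: genes in rows, samples in columns.
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

batch <- rep(c(1, 2), each = 6)          # signal to remove
group <- rep(c(0, 1, 0, 1), each = 3)    # condition of interest, signal to keep
sex   <- rep(c(0, 1), times = 6)         # a second covariate to keep (assumed)

if (requireNamespace("sva", quietly = TRUE)) {
  # Keep the condition signal while removing the batch signal.
  adjusted <- sva::ComBat_seq(counts, batch = batch, group = group)

  # Keep several covariates by passing them together through covar_mod.
  covar_mat <- cbind(group, sex)
  adjusted_cov <- sva::ComBat_seq(counts, batch = batch, group = NULL,
                                  covar_mod = covar_mat)
}
```

In both calls the result should be a count matrix with the same dimensions as the input, which can then be passed to downstream tools that expect raw counts.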

Bithorax commented 3 years ago

Thanks for the explanation. Just one doubt: if specifying "batch" only removes the batch effect from the dataset, then the signal of my variables of interest is automatically kept. Am I wrong?

zhangyuqing commented 3 years ago

@Bithorax Unfortunately, in real data we can never be 100% sure that only the batch effect is removed, because we do not truly know how batch has affected the data. We can only guess, and we make that guess using linear models. In linear models, whether or not you include other signals in the model affects your guess of the batch effect.

If you are familiar with linear regression, perhaps you can think of it simply as the difference between estimating the parameters of the two models below:

data ~ batch
data ~ batch + other signals

The parameters for batch are what we are guessing, and they have different interpretations and values in the two models.
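A small toy example in base R may make the difference between the two models concrete. Everything here is an assumption for illustration: the data are noise-free and simulated with a true batch effect of 2 and a true group effect of 3, and batch and group are deliberately unbalanced so that they are partially confounded.

```r
# Toy data (hypothetical): 8 samples, batch and group partially confounded.
batch <- c(0, 0, 0, 0, 1, 1, 1, 1)
group <- c(0, 0, 0, 1, 0, 1, 1, 1)

# Simulated measurement: true batch effect = 2, true group effect = 3, no noise.
y <- 2 * batch + 3 * group

# Model 1: data ~ batch (the group signal is ignored)
fit_batch_only <- lm(y ~ batch)
b_batch_only <- unname(coef(fit_batch_only)["batch"])

# Model 2: data ~ batch + other signals (group is modeled explicitly)
fit_full <- lm(y ~ batch + group)
b_full <- unname(coef(fit_full)["batch"])

print(b_batch_only)  # 3.5: the batch estimate absorbs part of the group effect
print(b_full)        # 2: the simulated batch effect is recovered
```

In this sketch, removing the batch effect estimated by the first model would also remove part of the group signal, which is the reason for telling the model which signals to keep.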

Bithorax commented 3 years ago

Yes, I see your point and I agree. It would be interesting to compare the two models to see the difference in the signal. But I guess this also depends on the input dataset.

Thanks for the feedback!

ahdee commented 2 years ago

@zhangyuqing I'm a bit confused about this, since it looks like option 1 is recommended? My understanding is that the linear model should be run on uncorrected data with batch as a covariate. The statistical results can then be merged back with the ComBat-corrected and normalized counts. Can someone please confirm? Maybe I'm misunderstanding the question somewhere.