Use of batchCorrected data in DESeq2

Hi @rialc13 ,

Thanks for the question - I can see why this is a bit confusing. When I ran the initial code for the competition, I applied a batch correction to data that had already been batch corrected. That is of course not the correct way to do it (I have kept the code as-is for reproducibility).

As you point out, the correct way would be to run it on the raw data. Note, however, that `counts = dat$pbmc_gene_expression$raw_data` is not raw count data; it is TPM-normalised data, so you would not need to apply another transform such as `vst`, which expects raw counts.
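If it helps, here is a rough way to check what scale a matrix is on before deciding what to do with it. This is only a sketch on simulated data: `describe_expression_scale` is a hypothetical helper (not part of the repository), and the cut-off of 25 for "probably log2" is a rule of thumb I am assuming, based on log2(1e6) being roughly 20. Column sums close to 1e6 suggest linear TPM, but only if the matrix has not been filtered to a subset of genes.

```r
# Hypothetical helper: summarise the scale of an expression matrix.
describe_expression_scale <- function(x) {
  x <- as.matrix(x)
  list(
    # Raw counts are whole numbers; TPM values generally are not
    all_integers  = all(x == round(x), na.rm = TRUE),
    # Linear TPM sums to about 1e6 per sample (if no genes were dropped)
    col_sum_range = range(colSums(x, na.rm = TRUE)),
    max_value     = max(x, na.rm = TRUE),
    # Heuristic assumption: log2-scale expression rarely exceeds ~25
    likely_log    = max(x, na.rm = TRUE) < 25
  )
}

# Simulated stand-in data: 500 genes x 4 samples
set.seed(42)
raw     <- matrix(rexp(500 * 4, rate = 1 / 20), nrow = 500)
toy_tpm <- sweep(raw, 2, colSums(raw), "/") * 1e6  # linear, TPM-like
toy_log <- log2(toy_tpm + 1)                       # log2 scale

lin <- describe_expression_scale(toy_tpm)
lg  <- describe_expression_scale(toy_log)
str(lin)
str(lg)
```

On the real data you would call `describe_expression_scale(dat$pbmc_gene_expression$raw_data)` and look at the result rather than trusting the heuristic blindly.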

In summary, if you want to run your own batch correction instead of using the batch-corrected data provided, there are 2 possible ways to do this step properly:

1) Start with the TPM data, `tpm = dat$pbmc_gene_expression$raw_data` (make sure it is in log space; I can't remember if it is), then run the batch correction: `limma::removeBatchEffect(tpm, batch=as.vector(dds$dataset))`. This should give you results pretty similar to the `dat$pbmc_gene_expression$batchCorrected_data` that is already there.

2) Start with the original raw count data (this may still be available through the website), run a normalisation other than TPM normalisation, such as `vst`, then run the batch correction: `limma::removeBatchEffect(assay(vsd), batch=as.vector(dds$dataset))`.
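The two routes can be sketched end to end on simulated data, roughly as follows. Everything except the `limma::removeBatchEffect`, `vst` and `assay` calls is an assumption for illustration: the counts are simulated, `coldata`, `n_genes` and `batch_mean_spread` are hypothetical names, counts scaled to one million per sample stand in for real TPM (gene length is ignored), and `design = ~ 1` with `blind = TRUE` are choices I made for the sketch rather than what the competition code used.

```r
library(limma)
library(DESeq2)

set.seed(1)
n_genes <- 2000
# Hypothetical sample sheet: 12 samples from 3 datasets (the batch variable)
coldata <- data.frame(
  dataset   = factor(rep(c("d1", "d2", "d3"), each = 4)),
  row.names = paste0("s", 1:12)
)

# Simulated raw counts with a gene-wise multiplicative batch effect
mu       <- 2^runif(n_genes, 5, 12)
batch_fc <- matrix(2^rnorm(n_genes * 3, sd = 0.5), nrow = n_genes)
mu_mat   <- mu * batch_fc[, as.integer(coldata$dataset)]
counts   <- matrix(
  rnbinom(n_genes * 12, mu = mu_mat, size = 1 / (0.1 + 4 / mu_mat)),
  nrow = n_genes,
  dimnames = list(paste0("g", seq_len(n_genes)), rownames(coldata))
)
storage.mode(counts) <- "integer"

# Route 1: TPM-like data, moved to log space, then batch corrected
tpm     <- sweep(counts, 2, colSums(counts), "/") * 1e6  # stand-in for TPM
log_tpm <- log2(tpm + 1)
corrected_tpm <- limma::removeBatchEffect(
  log_tpm, batch = as.vector(coldata$dataset)
)

# Route 2: raw counts -> vst -> batch correction
dds <- DESeq2::DESeqDataSetFromMatrix(
  countData = counts, colData = coldata, design = ~ 1
)
vsd <- DESeq2::vst(dds, blind = TRUE)
corrected_vst <- limma::removeBatchEffect(
  SummarizedExperiment::assay(vsd), batch = as.vector(dds$dataset)
)

# Hypothetical check: largest per-gene gap between batch means
batch_mean_spread <- function(x, batch) {
  means <- sapply(split(seq_along(batch), batch),
                  function(i) rowMeans(x[, i, drop = FALSE]))
  max(apply(means, 1, function(m) diff(range(m))))
}
batch_mean_spread(log_tpm, coldata$dataset)        # before correction
batch_mean_spread(corrected_tpm, coldata$dataset)  # after, route 1
batch_mean_spread(corrected_vst, coldata$dataset)  # after, route 2
```

With no other covariates in the model, the per-gene batch means should coincide after correction, so the last two values should be essentially zero. If you have a biological variable you want to preserve, `removeBatchEffect` also accepts a `design` argument for that, which this sketch leaves at its default.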

I hope this is a bit clearer!

Best,

Nicky

nixstix / CMI-PB2
