zhangyuqing / ComBat-seq

Batch effect adjustment based on negative binomial regression for RNA sequencing count data

Implementation with DESeq2 > "invalid class “DESeqDataSet” object: the count data is not in integer mode" #19

Open samuellup opened 3 years ago

samuellup commented 3 years ago

Dear Yuqing,

I'm working on an RNA-seq meta-analysis and trying to use ComBat-seq to normalize read counts prior to DESeq2 analysis. I'm using the following command to normalize the counts within the dds object: assay(dds) <- ComBat_seq(assay(dds), batch=batch, group=NULL)

I then run the following command for the DESeq2 analysis: dds <- DESeq(dds)

I get the following error when running the DESeq() function on the normalized counts: "invalid class “DESeqDataSet” object: the count data is not in integer mode". On inspecting the count data, I don't appear to have any non-integer values, but I wonder whether I should be passing a specific argument to ComBat-seq before running DESeq2.

A second question I'd like to ask: I'm comparing datasets from vastly different genotypes. Should I specify the genotypes with the group variable?
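The wording of the error ("not in integer mode") suggests that the check concerns how the matrix is stored rather than whether its values are whole numbers. A minimal base-R sketch of that distinction, using a made-up toy matrix as a stand-in for the ComBat_seq() output (the values and names are assumptions for illustration only):

```r
# Toy matrix standing in for ComBat_seq() output (hypothetical values).
adjusted <- matrix(c(10, 0, 25, 7), nrow = 2,
                   dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# Every value is a whole number...
all_whole <- all(adjusted == round(adjusted))
# ...but the matrix is still stored as double, not integer.
stored_as_integer <- is.integer(adjusted)

print(all_whole)
print(stored_as_integer)
print(storage.mode(adjusted))
```

If the real adjusted matrix behaves the same way, that would explain why the counts look like integers on inspection while DESeq2 still rejects them.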

Thank you!

Samuel

zhangyuqing commented 3 years ago

I feel this is more of a question about DESeq2 than about ComBat-seq. Though I'm not the expert here, perhaps converting the values in the count matrix to integers using as.integer() might help.

For the second question, it depends on whether you would like to keep genotype differences in the data after batch adjustment. If you would like to keep them, yes, specify them with either group or covar_mod.

samuellup commented 3 years ago

Thank you for your quick reply Yuqing!

Forgive my lack of knowledge, I am very new to R and RNA-seq data analysis. The as.integer() method is not working on the ComBat_seq output, and I still can't get DESeq2 to accept the ComBat_seq-corrected counts.

I have been reading about batch effect adjustment for a couple of days now and I am still pretty lost. I have a further question, if you don't mind: does the ComBat_seq method work for unbalanced batches?
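One likely reason as.integer() does not help here is that it returns a plain vector and drops the matrix dimensions and the gene and sample names. A base-R sketch of the difference, again on a made-up toy matrix; setting storage.mode is an alternative that keeps the matrix shape, offered as a suggestion rather than a confirmed fix for this issue:

```r
# Toy matrix standing in for ComBat_seq() output (hypothetical values).
adjusted <- matrix(c(10, 0, 25, 7), nrow = 2,
                   dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# as.integer() strips attributes: the result is a vector, not a matrix.
flat <- as.integer(adjusted)
print(is.null(dim(flat)))

# Changing the storage mode converts the values but keeps dim and dimnames.
fixed <- adjusted
storage.mode(fixed) <- "integer"
print(is.integer(fixed))
print(identical(dimnames(fixed), dimnames(adjusted)))
```

Whether assigning such a matrix back with assay(dds) <- fixed satisfies the DESeq2 validity check has not been confirmed in this thread, so treat it as something to try.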

By unbalanced I mean that I am adjusting for different genotypes sequenced in different batches, most of which are only represented in one batch.
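A quick way to see how severe this kind of imbalance is: cross-tabulate genotype against batch and check the rank of the corresponding design matrix. The sketch below uses a hypothetical sample sheet in which every genotype appears in exactly one batch; the column names and values are assumptions, not the poster's data.

```r
# Hypothetical sample sheet: each genotype is sequenced in exactly one batch.
coldata <- data.frame(
  batch    = factor(c("b1", "b1", "b2", "b2", "b3", "b3")),
  genotype = factor(c("g1", "g1", "g2", "g2", "g3", "g3"))
)

# Cross-tabulation: non-zero cells only on the diagonal means full confounding.
print(table(coldata$genotype, coldata$batch))

# Design matrix with both terms; compare its rank to its number of columns.
design <- model.matrix(~ batch + genotype, data = coldata)
design_rank <- qr(design)$rank
rank_deficient <- design_rank < ncol(design)
print(rank_deficient)
```

When the design is rank deficient like this, batch and genotype effects cannot be estimated separately by a regression-based adjustment. If only some genotypes are shared across batches, the same check shows whether enough overlap remains.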

Thank you kindly! Samuel

GRT-coder commented 2 years ago

I had the same problem, and this worked for me: I rebuilt the DESeqDataSet object from the output matrix that ComBat_seq() gave me. I hope there is a better solution someday, but this is working.

my data

se <- DESeqDataSetFromTximport(txi = txi.sum, colData = coldata, design = ~ cell_type)

ComBat_seq

batch <- se$batch
adjusted <- sva::ComBat_seq(assay(se), batch, NULL)

fixing the output matrix

adjusted <- as.data.frame(adjusted)
genes <- rownames(adjusted)
rownames(adjusted) <- NULL
adjusted <- cbind(genes, adjusted)

re-build DESeqdataset object

dds <- DESeqDataSetFromMatrix(countData=adjusted, colData=coldata, design=~cell_type, tidy = TRUE)

Now you can use DESeq analysis:

dds <- DESeq(dds)
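The rebuild above probably works because the DESeqDataSet constructor converts whole-number counts to integer mode itself. If so, the data-frame round trip with tidy = TRUE may not be needed and the adjusted matrix could be passed directly. A sketch under that assumption, with a made-up toy matrix and sample sheet; the DESeq2 call only runs if the package is installed:

```r
# Toy stand-in for ComBat_seq() output: whole numbers stored as doubles.
adjusted <- matrix(c(10, 0, 25, 7, 3, 12, 8, 40), nrow = 2,
                   dimnames = list(c("geneA", "geneB"),
                                   c("s1", "s2", "s3", "s4")))

# Hypothetical sample sheet; row names must match the count matrix columns.
coldata <- data.frame(cell_type = factor(c("a", "a", "b", "b")),
                      row.names = colnames(adjusted))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  # Pass the matrix directly, without converting to a tidy data frame first.
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = adjusted,
                                        colData = coldata,
                                        design = ~ cell_type)
  # Check how the counts are stored after construction.
  print(is.integer(SummarizedExperiment::assay(dds)))
}
```

On real data, dds <- DESeq(dds) would follow as in the comment above; the toy matrix here is too small for a meaningful fit.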

schoo7 commented 1 year ago

I'm not 100% sure about this suggestion, but you can try it. DESeq2 requires unnormalized counts. You can pass your raw count matrix to DESeq2 and add "batch" to your design; the batch effect is then accounted for in the DE analysis.
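A sketch of what that could look like, assuming a raw integer count matrix and a sample sheet with batch and cell_type columns (both made up here). The variable of interest goes last in the design formula, and the DESeq2 call only runs if the package is installed:

```r
# Hypothetical raw (unadjusted) integer counts.
raw_counts <- matrix(as.integer(c(10, 0, 25, 7, 3, 12, 8, 40)), nrow = 2,
                     dimnames = list(c("geneA", "geneB"),
                                     c("s1", "s2", "s3", "s4")))

# Hypothetical sample sheet: every cell type appears in every batch.
coldata <- data.frame(batch     = factor(c("b1", "b2", "b1", "b2")),
                      cell_type = factor(c("a", "a", "b", "b")),
                      row.names = colnames(raw_counts))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  # Batch enters the model as a covariate instead of being removed beforehand.
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = raw_counts,
                                        colData = coldata,
                                        design = ~ batch + cell_type)
  # On real data, continue with: dds <- DESeq2::DESeq(dds)
}
```

This approach only separates the two effects when batch is not completely confounded with the variable of interest.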

Ilarius commented 1 year ago

I think the problem is that ComBat-seq gives you very large numeric values that become NAs when DESeq2 converts them to integers. There is an unresolved issue about that.
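This is easy to check in base R: values above .Machine$integer.max cannot be represented as R integers and turn into NA on conversion. A small sketch with a made-up matrix containing one oversized value (an assumption for illustration, not actual ComBat_seq output):

```r
# Largest value an R integer can hold.
int_max <- .Machine$integer.max

# Hypothetical adjusted counts with one value beyond the integer range.
adjusted <- matrix(c(10, 3e9, 25, 7), nrow = 2,
                   dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# Locate entries that would not survive conversion to integer.
too_big <- adjusted > int_max
print(which(too_big, arr.ind = TRUE))

# Conversion turns those entries into NA (with a coercion warning).
converted <- suppressWarnings(as.integer(adjusted))
print(anyNA(converted))
```

Running the which() line on the real ComBat_seq output shows which genes and samples are affected, if any.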