zhangyuqing / ComBat-seq

Batch effect adjustment based on negative binomial regression for RNA sequencing count data
154 stars 39 forks source link

bug in getting adjusted counts matrix #4

Closed davidroad closed 3 years ago

davidroad commented 4 years ago

Hi, I have a dat matrix (1000 X 12). when I am trying to obtain the adjusted_counts using the command "ComBat_seq(dat, batch=batch, group=group)", I will get a list with a length of (1000 X 12). Each list could be one number or a vector of numbers. Can you check this?

batch <- c(rep(1, 6), rep(2, 6)) group <- c(0,0,0,1,1,1,0,0,0,2,2,2) dat[10,] 1533 | 1503 | 1387 | 1258 | 1657 | 1139 | 989 | 935 | 902 | 829 | 913 | 1153 16 | 19 | 19 | 32 | 48 | 17 | 9 | 12 | 10 | 16 | 7 | 7 7 | 216 | 83 | 65 | 30 | 3 | 4 | 4 | 0 | 5 | 5 | 2 1 | 0 | 2 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 54963 | 47549 | 45226 | 39082 | 53703 | 47144 | 16344 | 18281 | 15508 | 16260 | 16299 | 21966 1177 | 1066 | 1079 | 843 | 933 | 930 | 785 | 667 | 805 | 650 | 704 | 710 63 | 65 | 56 | 38 | 45 | 39 | 17 | 12 | 15 | 10 | 20 | 20 165 | 179 | 276 | 233 | 351 | 81 | 78 | 51 | 46 | 57 | 60 | 50 162 | 135 | 163 | 216 | 138 | 136 | 90 | 79 | 78 | 65 | 93 | 79 1809 | 1613 | 1378 | 2227 | 1752 | 1561 | 408 | 482 | 297 | 394 | 311 | 368

zhangyuqing commented 4 years ago

Please use the combat-seq in the development version of sva on Bioconductor: https://www.bioconductor.org/packages/devel/bioc/html/sva.html

davidroad commented 4 years ago

Thanks for the instruction. I installed the sva on R4.0 and still had the problem.

mestato commented 4 years ago

I am also having this problem. My input matrix is 56,044 rows (genes) by 65 columns (read libraries). The libraries are split across 8 batches (experiments). I also have control/treatment labels, which I am putting in groups.

My output from adjusted_counts <- ComBat_seq(count_matrix, batch=batch, group=group)

is a list of 3,642,860 items, some of which are single doubles, some are lists of 15,931 ints. From R Studio:

Screen Shot 2020-05-12 at 9 51 10 AM

Installed sva-devel yesterday. Session info.

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] sva_3.35.2          BiocParallel_1.22.0 genefilter_1.70.0  
[4] mgcv_1.8-31         nlme_3.1-147       
mestato commented 4 years ago

Updated sva with

BiocManager::install(version='devel')
BiocManager::install("sva")

yielding sva_3.37.0 in sessionInfo. Still getting same output.

mestato commented 4 years ago

I think I've found the line in ComBat_seq that creates this problem, at least for my data:

adjust_counts_whole[rm, ] <- countsOri[rm, ]

Early in the function, the genes with all 0 counts across any batch are removed. As far as I can tell, this problem line is at the end and is trying to add those back into the matrix with all the adjusted count genes. Instead of adding them back with their original unadjusted counts, they end up with lists of values.

I'm not sure how to fix, but if you don't care about these genes and only want back the other genes, you can change the function to return adjust_counts instead of adjust_counts_whole. This seems sensible since they aren't being adjusted and are generally lowly expressed.

(Caveat emptor: I'm just trying to use the software, not a developer or sva expert.)

zhangyuqing commented 4 years ago

@mestato @davidroad Thank you for your feedback. Unfortunately, I do not see the issue you are describing from my end. I tried the example from @davidroad, with both the sva package installed on my laptop, and sourcing the combat-seq functions from the latest Bioconductor sva source package (https://www.bioconductor.org/packages/release/bioc/html/sva.html). Both ways, the output is a matrix as expected (see the attached PDF report generated with Rmarkdown).

@mestato I am not sure why removing the 0 count genes will result in a list of values. Running the function on data without genes to be filtered also works fine on my end. Since people use various preprocessing steps, some filter these genes while others don't. Keeping these genes will maintain the original dimension of the count matrix, so that people will not be confused why the dimension of the data changes after adjustment.

If either of you are free to share more information (share the entire dataset and scripts to reproduce the error), please feel free to email them to yuqingz@bu.edu. I'll see what I can do.

test_combatseq.pdf

ejdzi commented 4 years ago

Hi,

I had the same problem, but it turned out that my input was a data frame. as.matrix(input) worked for me.

mestato commented 4 years ago

This worked for me! Thanks ejdzi!