kdkorthauer / dmrseq

R package for Inference of differentially methylated regions (DMRs) from bisulfite sequencing
MIT License
54 stars 14 forks source link

Question about Error "diff # of rows..." when running dmrseq #58

Closed KristenKrolick closed 7 months ago

KristenKrolick commented 7 months ago

Hello,

Thank you so much for your tool! Can you please help me figure out why I am getting the error "Error in data.frame(g.fac = rep(pDat[, colnames(design)[coeff[1]]], each = length(ix)), : arguments imply differing number of rows: 672, 654" when trying to run dmrseq? I think it may be because the way I filtered out zeros from samples, I did not know the code for how to do it per group. Can you give me example code of how to do it per group?

Or do you think the error returned is for a different reason? I can not figure out where the numbers 672 and 654 are coming from that is being returned in the error... It does not seem like it had to actually do with my covariate data inside the colData object of my BSseq object...

Thank you, Kristen

My code:

library(bsseq)
library(dmrseq)
require(BiocGenerics)

#reading in the files

metadata <- read.csv("/data/path/same112_OPRM1_Annot.csv")
files <-list.files("/data/path/bismarkcovfiles")
setwd("/data/path/bismarkcovfiles")

bismarkBSseq <- read.bismark(files,
                             loci = NULL,
                             colData = metadata,
                             rmZeroCov = TRUE,
                             strandCollapse = TRUE,
                             verbose = TRUE)

#Selecting chr1-22 and M (in order to filter out X and Y and unknown/random chromosomes)
chr1_22_M_bismarkBSseq <- chrSelectBSseq(bismarkBSseq, seqnames = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrM"), order = TRUE)

#remove all CpGs that have no coverage in at least one sample, from dmrseq user guide
loci.idx <- which(DelayedMatrixStats::rowSums2(getCoverage(chr1_22_M_bismarkBSseq, type ="Cov")==0, useNames= TRUE) ==0)

chr1_22_M_bismarkBSseq.filtered <- chr1_22_M_bismarkBSseq[loci.idx, ]

testCovariate <- "UpdatesCPSP2023_VCfinal"
adjustCovariate <- c("sex.x","age.x","race.x.num", "Mono_3", "NK_3", "CD4_TOTAL", "CD8_TOTAL", "GRANULOCYTES_TOTAL", "B_TOTAL")
regions <- dmrseq(bs=chr1_22_M_bismarkBSseq.filtered,
                  cutoff = 0.05,
                  testCovariate= testCovariate,
                  adjustCovariate = adjustCovariate)
#Error returned:
Assuming the test covariate UpdatesCPSP2023_VCfinal is a factor.
Condition: 10 vs 01
Adjusting for covariate (s): sex.x, age.x, race.x.num, Mono_3, NK_3, CD4_TOTAL, CD8_TOTAL, GRANULOCYTES_TOTAL, B_TOTAL
Parallelizing using 54 workers/cores (backend: BiocParallel:MulticoreParam).
Computing on 1 chromosome(s) at a time.

Detecting candidate regions with coefficient larger than 0.05 in magnitude.
...Chromosome chr1: Smoothed (1 min). Error in (function (cond)  : 
                                                  error in evaluating the argument 'args' in selecting a method for function 'do.call': BiocParallel errors
                                                6 remote errors, element index: 1, 2, 3, 4, 5, 6
                                                0 unevaluated and other errors
                                                first remote error:
                                                  Error in data.frame(g.fac = rep(pDat[, colnames(design)[coeff[1]]], each = length(ix)), : arguments imply differing number of rows: 672, 654

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/local/R/4.2.1/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R/4.2.1/lib64/R/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
[6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
  [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] dmrseq_1.18.1               DSS_2.46.0                  BiocParallel_1.32.4         bsseq_1.34.0                SummarizedExperiment_1.28.0
[6] Biobase_2.58.0              MatrixGenerics_1.10.0       matrixStats_0.63.0          GenomicRanges_1.50.1        GenomeInfoDb_1.34.9        
[11] IRanges_2.32.0              S4Vectors_0.36.1            BiocGenerics_0.44.0        

loaded via a namespace (and not attached):
  [1] colorspace_2.1-0              rjson_0.2.21                  ellipsis_0.3.2                XVector_0.38.0                rstudioapi_0.14              
[6] bit64_4.0.5                   interactiveDisplayBase_1.36.0 AnnotationDbi_1.60.0          fansi_1.0.4                   xml2_1.3.3                   
[11] codetools_0.2-18              splines_4.2.1                 R.methodsS3_1.8.2             sparseMatrixStats_1.10.0      cachem_1.0.6                 
[16] Rsamtools_2.14.0              dbplyr_2.2.1                  png_0.1-8                     R.oo_1.25.0                   shiny_1.7.3                  
[21] HDF5Array_1.26.0              BiocManager_1.30.19           readr_2.1.3                   compiler_4.2.1                httr_1.4.4                   
[26] assertthat_0.2.1              Matrix_1.5-3                  fastmap_1.1.0                 limma_3.54.0                  cli_3.6.1                    
[31] later_1.3.0                   htmltools_0.5.4               prettyunits_1.1.1             tools_4.2.1                   gtable_0.3.3                 
[36] glue_1.6.2                    GenomeInfoDbData_1.2.9        annotatr_1.24.0               reshape2_1.4.4                dplyr_1.1.1                  
[41] doRNG_1.8.2                   rappdirs_0.3.3                Rcpp_1.0.10                   bumphunter_1.40.0             vctrs_0.6.1                  
[46] Biostrings_2.66.0             rhdf5filters_1.10.0           nlme_3.1-160                  rtracklayer_1.58.0            iterators_1.0.14             
[51] DelayedMatrixStats_1.20.0     stringr_1.5.0                 mime_0.12                     lifecycle_1.0.3               restfulr_0.0.15              
[56] rngtools_1.5.2                gtools_3.9.4                  XML_3.99-0.13                 AnnotationHub_3.6.0           zlibbioc_1.44.0              
[61] scales_1.2.1                  BSgenome_1.66.2               hms_1.1.2                     promises_1.2.0.1              rhdf5_2.42.0                 
[66] RColorBrewer_1.1-3            yaml_2.3.6                    curl_4.3.3                    memoise_2.0.1                 ggplot2_3.4.2                
[71] biomaRt_2.54.0                stringi_1.7.12                RSQLite_2.2.19                BiocVersion_3.16.0            BiocIO_1.8.0                 
[76] foreach_1.5.2                 permute_0.9-7                 GenomicFeatures_1.50.2        filelock_1.0.2                rlang_1.1.0                  
[81] pkgconfig_2.0.3               bitops_1.0-7                  lattice_0.20-45               Rhdf5lib_1.20.0               GenomicAlignments_1.34.0     
[86] bit_4.0.5                     tidyselect_1.2.0              plyr_1.8.8                    magrittr_2.0.3                R6_2.5.1                     
[91] generics_0.1.3                DelayedArray_0.24.0           DBI_1.1.3                     pillar_1.9.0                  KEGGREST_1.38.0              
[96] RCurl_1.98-1.9                tibble_3.2.1                  crayon_1.5.2                  utf8_1.2.3                    BiocFileCache_2.6.0          
[101] tzdb_0.3.0                    progress_1.2.2                locfit_1.5-9.6                grid_4.2.1                    data.table_1.14.8            
[106] blob_1.2.3                    digest_0.6.31                 xtable_1.8-4                  httpuv_1.6.6                  regioneR_1.30.0              
[111] outliers_0.15                 R.utils_2.12.2                munsell_0.5.0                
kdkorthauer commented 7 months ago

Hi! Thanks for reaching out. I'm afraid I can't tell from this information why the error is being thrown. I'm happy to help try and diagnose if you are able to send me a small subset of your chr1_22_M_bismarkBSseq.filtered object that still throws the error. Let me know if you'd like a dropbox link for upload.

KristenKrolick commented 7 months ago

Yes, for the dropbox link. That sounds awesome, thanks! I know how to easily subset my data by 1 chromosome, and then save the workspace. Did you want me to upload the .Rdata file (workspace) or a different way?

kdkorthauer commented 7 months ago

You can subset to 1 chromosome (or even just a small fraction e.g. 10% of the chromosome if that still throws the same error - you can use bsObjectSubset <- bsObject[1:X,] where bsObject is the original bsseq object, and X is the number of sites you'd like to keep in the subset). After verifying the error occurs on the bsObjectSubset, please use the saveRDS function to save the subsetted object (e.g. saveRDS(bsObjectSubset, "exData.rds").

Here is a dropbox file request link for easy upload of the .rds file: https://www.dropbox.com/request/PK9TVgZqRPLX6D1K1xr3

KristenKrolick commented 7 months ago

Oh nevermind it is working! It was because I had '0' and '1' for my testvariable instead of 'Yes' or 'No' category. Thank you!