kdkorthauer / dmrseq

R package for Inference of differentially methylated regions (DMRs) from bisulfite sequencing
MIT License
54 stars 14 forks source link

Error when the test covariate has more than two levels #22

Closed RoseString closed 5 years ago

RoseString commented 5 years ago

Dear @kdkorthauer ,

I ran into a strange problem when running dmrseq with 3 levels for the test covariate, which didn't occur when I removed samples from one of the 3 levels (that is, it runs OK with 2 levels).

Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match

Below is a small subset of my data that can reproduce this error: https://www.dropbox.com/s/99ztnarnujx4rfn/dat.rds?dl=0

The code to reproduce the error is:

dat <- readRDS("dat.rds")

dmrseq(dat,
 cutoff = 0.025, 
 testCovariate = "group",
 adjustCovariate = c("age","sex"),
 maxPerms = 10)

Strangely, the error didn't show up when the cutoff value was set to 0.05 or 0.1.

The design for my 27 samples is:

Screen Shot 2019-03-15 at 10 42 13 PM

Thank you in advance!

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Client release 6.7 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] dmrseq_1.3.8                bsseq_1.18.0
 [3] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
 [5] BiocParallel_1.17.17        matrixStats_0.54.0
 [7] Biobase_2.42.0              GenomicRanges_1.34.0
 [9] GenomeInfoDb_1.18.2         IRanges_2.16.0
[11] S4Vectors_0.20.1            BiocGenerics_0.28.0
[13] RevoUtils_11.0.1            RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] nlme_3.1-137                  bitops_1.0-6
 [3] bit64_0.9-7                   RColorBrewer_1.1-2
 [5] progress_1.2.0                httr_1.4.0
 [7] doRNG_1.7.1                   tools_3.5.1
 [9] R6_2.4.0                      HDF5Array_1.10.1
[11] DBI_1.0.0                     lazyeval_0.2.1.9000
[13] colorspace_1.4-0              permute_0.9-5
[15] withr_2.1.2                   tidyselect_0.2.5
[17] prettyunits_1.0.2             bit_1.1-14
[19] compiler_3.5.1                pkgmaker_0.27
[21] rtracklayer_1.42.2            scales_1.0.0
[23] readr_1.3.1                   stringr_1.4.0
[25] digest_0.6.18                 Rsamtools_1.34.1
[27] R.utils_2.8.0                 XVector_0.22.0
[29] pkgconfig_2.0.2               htmltools_0.3.6
[31] bibtex_0.4.2                  limma_3.38.3
[33] BSgenome_1.50.0               regioneR_1.14.0
[35] rlang_0.3.1                   RSQLite_2.1.1
[37] shiny_1.2.0                   DelayedMatrixStats_1.4.0
[39] gtools_3.8.1                  dplyr_0.8.0.9008
[41] R.oo_1.22.0                   RCurl_1.95-4.12
[43] magrittr_1.5                  GenomeInfoDbData_1.2.0
[45] Matrix_1.2-16                 Rcpp_1.0.0
[47] munsell_0.5.0                 Rhdf5lib_1.4.2
[49] R.methodsS3_1.7.1             stringi_1.4.3
[51] yaml_2.2.0                    zlibbioc_1.28.0
[53] bumphunter_1.24.5             rhdf5_2.26.2
[55] plyr_1.8.4                    AnnotationHub_2.14.4
[57] grid_3.5.1                    blob_1.1.1
[59] promises_1.0.1                crayon_1.3.4
[61] lattice_0.20-38               splines_3.5.1
[63] Biostrings_2.50.2             GenomicFeatures_1.34.4
[65] hms_0.4.2                     locfit_1.5-9.1
[67] pillar_1.3.1                  rngtools_1.3.1
[69] codetools_0.2-16              reshape2_1.4.3
[71] biomaRt_2.38.0                XML_3.98-1.19
[73] glue_1.3.1                    outliers_0.14
[75] annotatr_1.8.0                data.table_1.12.0
[77] BiocManager_1.30.4            foreach_1.5.0
[79] httpuv_1.4.5.1                gtable_0.2.0
[81] purrr_0.3.1                   assertthat_0.2.0
[83] ggplot2_3.1.0                 mime_0.6
[85] xtable_1.8-3                  later_0.8.0
[87] tibble_2.0.1                  iterators_1.0.10
[89] registry_0.5-1                GenomicAlignments_1.18.1
[91] AnnotationDbi_1.44.0          memoise_1.1.0
[93] interactiveDisplayBase_1.20.0
kdkorthauer commented 5 years ago

Hi @RoseString,

Thanks for reporting this bug, and thank you for the reproducible example!

I was able to replicate the result you were seeing and track down the bug. I've just pushed a fix - it will be available in Bioconductor devel in the next day or two (or through GitHub immediately).

Please try it out and let me know if you run into any more issues.

Best, Keegan

RoseString commented 5 years ago

Thanks for fixing the bug, @kdkorthauer!

I tested dmrseq 1.3.9 on my entire dataset, and it finished without errors. I'll close the issue now.

RoseString commented 5 years ago

Dear @kdkorthauer ,

Sorry to bother you again! The run for my previous dataset is completed successfully using version 1.3.9.

However, I ran into a separate issue using a different dataset.

Error in match.names(clabs, names(xi)) : names do not match previous names

I found it's very few CpGs that are causing this error. You can access the test data in: https://www.dropbox.com/s/yf86o3dp4zud9lf/test.rds?dl=0

Below is design table. There are only two levels for the test covariate this time.

Screen Shot 2019-03-22 at 2 40 37 PM

To get the same error, you can run the following code:

test <- readRDS("test.rds")

dmrseq(test,
               cutoff = 0.05,
               testCovariate = "gp",
               adjustCovariate = c("age","sex"),
               minNumRegion = 3)

Thank you for your time. I really appreciate it!

> traceback()
19: stop("names do not match previous names")
18: match.names(clabs, names(xi))
17: rbind(deparse.level, ...)
16: rbind(...)
15: eval(mc, env)
14: eval(mc, env)
13: eval(mc, env)
12: standardGeneric("rbind")
11: rbind(list(stat = NA, constant = TRUE, beta_gp2 = NA), list(stat = -1.81490725506887,
        constant = FALSE, beta = -0.168376465145481))
10: do.call("rbind", lapply(Indexes, function(Index) asin.gls.cov(ix = ind[Index],
        design = design, coeff = coeff)))
9: do.call("rbind", lapply(Indexes, function(Index) asin.gls.cov(ix = ind[Index],
       design = design, coeff = coeff)))
8: regionScanner(meth.mat = meth.mat, cov.mat = cov.mat, pos = pos,
       chr = chr, x = beta, y = rawBeta, maxGap = maxGap, cutoff = cutoff,
       minNumRegion = minNumRegion, design = design, coeff = coeff,
       coeff.adj = coeff.adj, verbose = verbose, parallel = parallel,
       pDat = pData(bs), block = block, blockSize = blockSize, fact = fact,
       adjustCovariate = adjustCovariate)
7: eval(quote(list(...)), env)
6: eval(quote(list(...)), env)
5: eval(quote(list(...)), env)
4: standardGeneric("rbind")
3: rbind(tab, regionScanner(meth.mat = meth.mat, cov.mat = cov.mat,
       pos = pos, chr = chr, x = beta, y = rawBeta, maxGap = maxGap,
       cutoff = cutoff, minNumRegion = minNumRegion, design = design,
       coeff = coeff, coeff.adj = coeff.adj, verbose = verbose,
       parallel = parallel, pDat = pData(bs), block = block, blockSize = blockSize,
       fact = fact, adjustCovariate = adjustCovariate))
2: bumphunt(bs = bs, design = design, coeff = coeff, coeff.adj = coeff.adj,
       minInSpan = minInSpan, minNumRegion = minNumRegion, cutoff = cutoff,
       maxGap = maxGap, maxGapSmooth = maxGapSmooth, smooth = smooth,
       bpSpan = bpSpan, verbose = verbose, parallel = parallel,
       block = block, blockSize = blockSize, chrsPerChunk = chrsPerChunk,
       fact = fact, adjustCovariate = adjustCovariate)
1: dmrseq(test, cutoff = cutoffVal, testCovariate = "gp", adjustCovariate = c("age",
       "sex"), maxPerms = permVal, minNumRegion = 3, chrsPerChunk = 1)
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Client release 6.7 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] dmrseq_1.3.9                bsseq_1.18.0
 [3] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
 [5] BiocParallel_1.17.17        matrixStats_0.54.0
 [7] Biobase_2.42.0              GenomicRanges_1.34.0
 [9] GenomeInfoDb_1.18.2         IRanges_2.16.0
[11] S4Vectors_0.20.1            BiocGenerics_0.28.0
[13] RevoUtils_11.0.1            RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] nlme_3.1-137                  bitops_1.0-6
 [3] bit64_0.9-7                   RColorBrewer_1.1-2
 [5] progress_1.2.0                httr_1.4.0
 [7] doRNG_1.7.1                   tools_3.5.1
 [9] R6_2.4.0                      HDF5Array_1.10.1
[11] DBI_1.0.0                     lazyeval_0.2.1.9000
[13] colorspace_1.4-0              permute_0.9-5
[15] withr_2.1.2                   tidyselect_0.2.5
[17] prettyunits_1.0.2             bit_1.1-14
[19] compiler_3.5.1                pkgmaker_0.27
[21] rtracklayer_1.42.2            scales_1.0.0
[23] readr_1.3.1                   stringr_1.4.0
[25] digest_0.6.18                 Rsamtools_1.34.1
[27] R.utils_2.8.0                 XVector_0.22.0
[29] pkgconfig_2.0.2               htmltools_0.3.6
[31] bibtex_0.4.2                  limma_3.38.3
[33] BSgenome_1.50.0               regioneR_1.14.0
[35] rlang_0.3.1                   RSQLite_2.1.1
[37] shiny_1.2.0                   DelayedMatrixStats_1.4.0
[39] gtools_3.8.1                  dplyr_0.8.0.9008
[41] R.oo_1.22.0                   RCurl_1.95-4.12
[43] magrittr_1.5                  GenomeInfoDbData_1.2.0
[45] Matrix_1.2-16                 Rcpp_1.0.0
[47] munsell_0.5.0                 Rhdf5lib_1.4.2
[49] R.methodsS3_1.7.1             stringi_1.4.3
[51] yaml_2.2.0                    zlibbioc_1.28.0
[53] bumphunter_1.24.5             rhdf5_2.26.2
[55] plyr_1.8.4                    AnnotationHub_2.14.4
[57] grid_3.5.1                    blob_1.1.1
[59] promises_1.0.1                crayon_1.3.4
[61] lattice_0.20-38               splines_3.5.1
[63] Biostrings_2.50.2             GenomicFeatures_1.34.4
[65] hms_0.4.2                     locfit_1.5-9.1
[67] pillar_1.3.1                  rngtools_1.3.1
[69] codetools_0.2-16              reshape2_1.4.3
[71] biomaRt_2.38.0                XML_3.98-1.19
[73] glue_1.3.1                    outliers_0.14
[75] annotatr_1.8.0                data.table_1.12.0
[77] BiocManager_1.30.4            foreach_1.5.0
[79] httpuv_1.4.5.1                gtable_0.2.0
[81] purrr_0.3.1                   assertthat_0.2.0
[83] ggplot2_3.1.0                 mime_0.6
[85] xtable_1.8-3                  later_0.8.0
[87] tibble_2.0.1                  iterators_1.0.10
[89] registry_0.5-1                GenomicAlignments_1.18.1
[91] AnnotationDbi_1.44.0          memoise_1.1.0
[93] interactiveDisplayBase_1.20.0
kdkorthauer commented 5 years ago

Hi @RoseString,

Thanks for reporting this new bug, and thank you for the reproducible example!

I was able to replicate the result, and determined that it was something I introduced in the previous bug fix - I didn't properly match column names for the results output data frame. I've just pushed a fix - it will be available in Bioconductor devel in the next day or two (or through GitHub immediately).

Please try it out and let me know if you run into any more issues. These reports are very helpful for improving the usability of the software.

Best, Keegan

RoseString commented 5 years ago

I tested it on my large dataset. The problem is solved in the new version.

Thank you @kdkorthauer !