hansenlab / bsseq

Devel repository for bsseq
35 stars 25 forks source link

Mistake in `BSseq` constructor? #80

Open rcavalcante opened 5 years ago

rcavalcante commented 5 years ago

Hello,

I'm trying to get the code for methylSig in working order with the Bioc 3.8 and devel versions of bsseq and I'm running into a problem I didn't have with previous versions.

Namely,

library(bsseq)

gr =GRanges(
    seqnames = c('chr21','chr21','chr21'),
    ranges = IRanges(start = c(1,3,5), end = c(1,3,5)),
    strand = '*'
)

m = matrix(
    data = c(34,10,0,300,87,434), 
    nrow = 3, ncol = 2,
    dimnames = list(NULL, c('test_1','test_2')))

cov = matrix(
    data = c(34,10,0,300,96,458), 
    nrow = 3, ncol = 2,
    dimnames = list(NULL, c('test_1','test_2')))

pData = data.frame(
    Sample_Name = c('test_1','test_2'),
    Group = factor(1,0),
    row.names = c('test_1','test_2')
)

bs = bsseq::BSseq(gr = gr, M = m, Cov = cov, pData = pData)

Gives the following error despite the rownames of pData being correct relative to the M and Cov parameters:

Error in `rownames<-`(`*tmp*`, value = .get_colnames_from_assays(assays)) : 
  invalid rownames length

I've tracked down the problem to this block of the constructor (lines 91-111 of BSseq-class.R):

    # Process 'sampleNames' and 'pData'.
    if (is.null(sampleNames)) {
        if (is.null(pData)) {
            # BSseq object will have no colnames.
            pData <- S4Vectors:::new_DataFrame(nrows = ncol(M))
        } else {
            # BSseq object will have 'sampleNames' as colnames.
            pData <- DataFrame(row.names = sampleNames)
        }
    } else {
        if (is.null(pData)) {
            # BSseq object will have 'sampleNames' as colnames.
            pData <- DataFrame(row.names = sampleNames)
        } else {
            if (is.null(rownames(pData))) {
                rownames(pData) <- sampleNames
            } else {
                stopifnot(identical(rownames(pData), sampleNames))
            }
        }
    }

The small example I've given above has NULL sampleNames but non-NULL pData, so it tries to run pData <- DataFrame(row.names = sampleNames), which obliterates the pData I've passed and causes the invalid rownames error.

I think, at least. Please let me know if I've misunderstood or you can't reproduce with the example above. Thanks in advance!

Here is my sessionInfo(), I'm running in the Bioc 3.8 release Docker image with updated packages:

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
 [3] BiocParallel_1.16.5         matrixStats_0.54.0         
 [5] Biobase_2.42.0              GenomicRanges_1.34.0       
 [7] GenomeInfoDb_1.18.1         IRanges_2.16.0             
 [9] S4Vectors_0.20.1            BiocGenerics_0.28.0        
[11] methylSig_0.6.0             testthat_2.0.1             
[13] BiocManager_1.30.4         

loaded via a namespace (and not attached):
 [1] DSS_2.30.1                    bitops_1.0-6                 
 [3] fs_1.2.6                      usethis_1.4.0                
 [5] devtools_2.0.1                bit64_0.9-7                  
 [7] bsseq_1.18.0                  progress_1.2.0               
 [9] httr_1.4.0                    rprojroot_1.3-2              
[11] tools_3.5.1                   backports_1.1.3              
[13] R6_2.3.0                      HDF5Array_1.10.1             
[15] DBI_1.0.0                     lazyeval_0.2.1               
[17] colorspace_1.4-0              permute_0.9-4                
[19] withr_2.1.2                   tidyselect_0.2.5             
[21] prettyunits_1.0.2             processx_3.2.1               
[23] bit_1.1-14                    compiler_3.5.1               
[25] cli_1.0.1                     desc_1.2.0                   
[27] rtracklayer_1.42.1            scales_1.0.0                 
[29] readr_1.3.1                   callr_3.1.1                  
[31] stringr_1.3.1                 digest_0.6.18                
[33] Rsamtools_1.34.1              R.utils_2.7.0                
[35] XVector_0.22.0                pkgconfig_2.0.2              
[37] htmltools_0.3.6               sessioninfo_1.1.1            
[39] limma_3.38.3                  BSgenome_1.50.0              
[41] regioneR_1.14.0               rlang_0.3.1                  
[43] rstudioapi_0.9.0              RSQLite_2.1.1                
[45] DelayedMatrixStats_1.4.0      shiny_1.2.0                  
[47] bindr_0.1.1                   gtools_3.8.1                 
[49] R.oo_1.22.0                   dplyr_0.7.8                  
[51] RCurl_1.95-4.11               magrittr_1.5                 
[53] GenomeInfoDbData_1.2.0        Matrix_1.2-15                
[55] Rhdf5lib_1.4.2                Rcpp_1.0.0                   
[57] munsell_0.5.0                 R.methodsS3_1.7.1            
[59] stringi_1.2.4                 yaml_2.2.0                   
[61] zlibbioc_1.28.0               rhdf5_2.26.2                 
[63] pkgbuild_1.0.2                plyr_1.8.4                   
[65] AnnotationHub_2.14.3          grid_3.5.1                   
[67] blob_1.1.1                    promises_1.0.1               
[69] crayon_1.3.4                  lattice_0.20-38              
[71] Biostrings_2.50.2             GenomicFeatures_1.34.3       
[73] hms_0.4.2                     locfit_1.5-9.1               
[75] ps_1.3.0                      pillar_1.3.1                 
[77] boot_1.3-20                   reshape2_1.4.3               
[79] biomaRt_2.38.0                pkgload_1.0.2                
[81] XML_3.98-1.16                 glue_1.3.0                   
[83] annotatr_1.8.0                data.table_1.12.0            
[85] remotes_2.0.2                 httpuv_1.4.5.1               
[87] gtable_0.2.0                  purrr_0.3.0                  
[89] assertthat_0.2.0              ggplot2_3.1.0                
[91] mime_0.6                      xtable_1.8-3                 
[93] later_0.7.5                   tibble_2.0.1                 
[95] GenomicAlignments_1.18.1      AnnotationDbi_1.44.0         
[97] memoise_1.1.0                 bindrcpp_0.2.2               
[99] interactiveDisplayBase_1.20.0
PeteHaitch commented 5 years ago

Thanks for the example. I'll take a look this week and get back to you.

lcolladotor commented 5 years ago

Hi @PeteHaitch,

I ran into a very similar issue. I believe that the culprit is this: https://github.com/hansenlab/bsseq/blame/master/R/BSseq-class.R#L98 since to get to it, the sampleNames have to be NULL and pData is not NULL. So it looks to me like https://github.com/hansenlab/bsseq/blob/3281f31f0f6cfefb5c75c3a5f7c59f13d685255e/R/BSseq-class.R#L96-L99 needs to be just a single } (remove the else clause).

Best, Leo

lcolladotor commented 5 years ago

Ok, I just tested and it works for me using data from https://github.com/LieberInstitute/brain-epigenomics/blob/master/DMR_acf/compute_DMR_acf.R (under the following test case https://github.com/LieberInstitute/brain-epigenomics/blob/master/DMR_acf/compute_DMR_acf.R#L29).

> packageVersion('bsseq')
[1] ‘1.18.0’
lcolladotor commented 5 years ago

This PR https://github.com/hansenlab/bsseq/issues/81 resolves this

lcolladotor commented 5 years ago

If possible, can you port it also to the RELEASE branch? Thx!

lcolladotor commented 5 years ago

Hi Pete,

I git cherry-picked my commit to my fork of the RELEASE_3_8 branch and was able to install it with devtools::install_github('lcolladotor/bsseq@RELEASE_3_8'), so there's no hurry from my side about this. My old code now works again ^^

Best, Leo

PeteHaitch commented 5 years ago

Not forgotten this, just been swamped preparing for conference next week. Thanks for your patience.

rcavalcante commented 5 years ago

I completely understand. No problem.

rcavalcante commented 5 years ago

It appears that one easy fix is to always include both pData and sampleNames.