zwdzwd / sesame

🍪 SEnsible Step-wise Analysis of DNA MEthylation BeadChips
Other
58 stars 31 forks source link

Error with summaryExtractTest() when using paired samples #68

Open greyajosh opened 2 years ago

greyajosh commented 2 years ago

Hi Wanding,

Thank you for the extensive and useful package! I have been trying to extract a group of probes specific to a contract variable using the DML() function followed by summaryExtractTest(). I am using paired patient samples and using blocking in my experimental design to account for the patient to patient variation and using race as the contrast variable for which I want to extract probes. I have 41 samples from 32 patients of 2 races. It works with the example data and with another set I have that does not require blocking. This is the error I encounter:

`data <- SummarizedExperiment(assays = as.matrix(na.omit(betasFM[grep('cg', rownames(betasFM))[1:1000], ])), metadata = fcc)

data@colData@listData <- as.list(fcc)

data@colData@listData[["Race"]] <- as.factor(data@colData@listData[["Race"]])

data@colData@listData[["patient"]] <- as.factor(data@colData@listData[["patient"]])

colData(data)$Race <- relevel(factor(colData(data)$Race), "White")

colData(data)$patient <- relevel(factor(colData(data)$patient), "B.MP321")

smryF = DML(data, ~Race + patient)

test_result = summaryExtractTest(smryF)

Error in [.data.frame(est, , paste0("Est_", cont, lvs), drop = FALSE) : undefined columns selected`

I also tried to pass it through the code I found in sesame/dm.R which works up until this point where I obtain the following error:

`est <- as.data.frame(t(do.call(cbind, lapply(smry, function(x) { x$coefficients[,'Estimate']; })))) rownames(est) <- names(smry) colnames(est) <- paste0("Est_", colnames(est)) est$ProbeID <- names(smry) pvals <- as.data.frame(t(do.call(cbind, lapply(smry, function(x) { x$coefficients[,"Pr(>|t|)"] })))) rownames(pvals) <- names(smry) colnames(pvals) <- paste0("Pval", colnames(pvals)) f_pvals <- do.call(rbind, lapply(smry, function(x) { x$Ftest["pval",,drop=FALSE] })) rownames(f_pvals) <- names(smry) colnames(fpvals) <- paste0("FPval", colnames(fpvals)) contr2lvs <- attr(smry, "contr2lvs") effsize <- do.call(cbind, lapply(names(contr2lvs), function(cont) { lvs <- contr2lvs[[cont]] lvs <- lvs[2:length(lvs)] apply(est[, paste0("Est", cont, lvs),drop=FALSE], 1, function(x) { max(x,0) - min(x,0) }) }))

Error in h(simpleError(msg, call)) : error in evaluating the argument 'args' in selecting a method for function 'do.call': undefined columns selected`

Is this something to do with the patient blocking or am I naively missing something? It may also be worth noting that I am able to use that same DMLSummary object for obtaining DMRs through DMR() successfully.

Thanks again, Josh

Below is my session info.

sessionInfo() R version 4.2.0 (2022-04-22) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Monterey 12.3.1

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] grid parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] lmtest_0.9-40 zoo_1.8-10 ggfortify_0.4.14 randomForest_4.7-1
[5] wheatmap_0.2.0 Gviz_1.40.1 biomaRt_2.52.0 RColorBrewer_1.1-3
[9] limma_3.52.0 knitr_1.39 forcats_0.5.1 stringr_1.4.0
[13] dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[17] tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1 rlist_0.4.6.2
[21] minfi_1.42.0 bumphunter_1.38.0 locfit_1.5-9.5 iterators_1.0.14
[25] foreach_1.5.2 Biostrings_2.64.0 XVector_0.36.0 SummarizedExperiment_1.26.1 [29] Biobase_2.56.0 MatrixGenerics_1.8.0 matrixStats_0.62.0 GenomicRanges_1.48.0
[33] GenomeInfoDb_1.32.1 IRanges_2.30.0 S4Vectors_0.34.0 sesame_1.14.0
[37] sesameData_1.14.0 ExperimentHub_2.4.0 AnnotationHub_3.4.0 BiocFileCache_2.4.0
[41] dbplyr_2.1.1 BiocGenerics_0.42.0 illuminaio_0.38.0 xfun_0.30

loaded via a namespace (and not attached): [1] utf8_1.2.2 tidyselect_1.1.2 htmlwidgets_1.5.4 RSQLite_2.2.14
[5] AnnotationDbi_1.58.0 BiocParallel_1.30.0 munsell_0.5.0 codetools_0.2-18
[9] preprocessCore_1.58.0 withr_2.5.0 colorspace_2.0-3 filelock_1.0.2
[13] highr_0.9 rstudioapi_0.13 labeling_0.4.2 GenomeInfoDbData_1.2.8
[17] farver_2.1.0 bit64_4.0.5 rhdf5_2.40.0 vctrs_0.4.1
[21] generics_0.1.2 biovizBase_1.44.0 R6_2.5.1 AnnotationFilter_1.20.0
[25] bitops_1.0-7 rhdf5filters_1.8.0 cachem_1.0.6 reshape_0.8.9
[29] DelayedArray_0.22.0 assertthat_0.2.1 promises_1.2.0.1 BiocIO_1.6.0
[33] scales_1.2.0 nnet_7.3-17 gtable_0.3.0 ensembldb_2.20.1
[37] rlang_1.0.2 genefilter_1.78.0 splines_4.2.0 lazyeval_0.2.2
[41] rtracklayer_1.56.0 GEOquery_2.64.0 dichromat_2.0-0.1 checkmate_2.1.0
[45] broom_0.8.0 BiocManager_1.30.17 yaml_2.3.5 reshape2_1.4.4
[49] modelr_0.1.8 GenomicFeatures_1.48.0 backports_1.4.1 httpuv_1.6.5
[53] Hmisc_4.7-0 tools_4.2.0 nor1mix_1.3-0 ellipsis_0.3.2
[57] siggenes_1.70.0 Rcpp_1.0.8.3 plyr_1.8.7 base64enc_0.1-3
[61] sparseMatrixStats_1.8.0 progress_1.2.2 zlibbioc_1.42.0 RCurl_1.98-1.6
[65] prettyunits_1.1.1 rpart_4.1.16 openssl_2.0.0 cluster_2.1.3
[69] haven_2.5.0 fs_1.5.2 magrittr_2.0.3 data.table_1.14.2
[73] reprex_2.0.1 ProtGenerics_1.28.0 hms_1.1.1 mime_0.12
[77] evaluate_0.15 xtable_1.8-4 XML_3.99-0.9 jpeg_0.1-9
[81] mclust_5.4.9 readxl_1.4.0 gridExtra_2.3 compiler_4.2.0
[85] crayon_1.5.1 htmltools_0.5.2 later_1.3.0 tzdb_0.3.0
[89] Formula_1.2-4 lubridate_1.8.0 DBI_1.1.2 MASS_7.3-57
[93] rappdirs_0.3.3 Matrix_1.4-1 cli_3.3.0 quadprog_1.5-8
[97] pkgconfig_2.0.3 GenomicAlignments_1.32.0 foreign_0.8-82 xml2_1.3.3
[101] annotate_1.74.0 rngtools_1.5.2 multtest_2.52.0 beanplot_1.3.1
[105] rvest_1.0.2 doRNG_1.8.2 scrime_1.3.5 VariantAnnotation_1.42.0
[109] digest_0.6.29 rmarkdown_2.14 base64_2.0 cellranger_1.1.0
[113] htmlTable_2.4.0 DelayedMatrixStats_1.18.0 restfulr_0.0.13 curl_4.3.2
[117] shiny_1.7.1 Rsamtools_2.12.0 rjson_0.2.21 lifecycle_1.0.1
[121] nlme_3.1-157 jsonlite_1.8.0 Rhdf5lib_1.18.0 askpass_1.1
[125] BSgenome_1.64.0 fansi_1.0.3 pillar_1.7.0 lattice_0.20-45
[129] KEGGREST_1.36.0 fastmap_1.1.0 httr_1.4.3 survival_3.3-1
[133] interactiveDisplayBase_1.34.0 glue_1.6.2 png_0.1-7 BiocVersion_3.15.2
[137] bit_4.0.4 stringi_1.7.6 HDF5Array_1.24.0 blob_1.2.3
[141] latticeExtra_0.6-29 memoise_2.0.1

zwdzwd commented 2 years ago

My guess is our code is not handling the missing levels for some CGs here. Do you mind sharing me a copy of your smry object? You can send to zhouw3@chop.edu if possible. That will help me debug. Thanks!

zwdzwd commented 2 years ago

I took a look I think this is due to the reference level in "patient" being nonexistent.

colData(data)$patient <- relevel(factor(colData(data)$patient), "B.MP321")

I included a line in the updated code to circumvent this error to pop up. but I think setting the right reference should solve the problem.

gbc529 commented 2 years ago

Hello, Thank you both for sharing the original error and solution! Unfortunately, I am running into the same error but when I look at the smry object generated by my data it appears that the reference level is set for the factors I am considering, specifically "Sample.Region" and "CUB.ID". I have 224 samples that I am running and I am looking to use Sample.Region as the contrast variable. The error I am getting, my input, as well as sessionInfo is below:

#Run preprocessing using opensesame. Output of Beta-values
taka_betas <- openSesame(".")

#Convert metadata file with desired columns + beta-values to sigDF
pdata <- read.csv("/path/to/project/meta/data/meta.csv")
pdata <- column_to_rownames(pdata_noBMI, "IDAT")
pdata_noBMI<- pdata[,1:5]
betas_t <- t(taka_betas)
se <- SummarizedExperiment(t(betas_t), colData = pdata_noBMI)
se

## Modeling Differential Methylation____:
library(tidyverse)
packageVersion('sesame')

meta=dplyr::select(as_tibble(colData(se)), CUB.ID, Age, Sample.Region, Case.Control, Race)

meta

#Set all variables to factors for analysis
meta$CUB.ID= relevel(factor(meta$CUB.ID), ref='CUB-006')
meta$Age= relevel(factor(meta$Age), ref='53')
meta$Sample.Region= relevel(factor(meta$Sample.Region), ref='T') 
meta$Race= relevel(factor(meta$Race), ref='Black or African American') 
meta$Case.Control = relevel(factor(meta$Case.Control), ref = 'case')
str(meta)
betas <- assay(se)

#Selected CUB.ID in order to compare based on where the sample was taken and if it comes from the same patient
ok1 = checkLevels(betas,meta$CUB.ID)
ok2 = checkLevels(betas,meta$Sample.Region)
ok4 = checkLevels(betas,meta$Age)
ok5 = checkLevels(betas,meta$Race)
ok6 = checkLevels(betas,meta$Case.Control)

#Check reads w/t NA for CUB.ID
sum(ok1)
sum(ok2)
sum(ok4)
sum(ok5)
sum(ok6)

#Filter out for only viable probes (i.e one's with no NA present)
betas_filtered <- betas[ok1&ok2&ok4&ok5&ok6, ]

#Count number of remaining probes
dim(betas_filtered)

#Summarize data (START HERE)
smry = DML(betas_filtered, ~Sample.Region+CUB.ID, meta=meta) #Time consuming
smry
smry[[1]] #inspects summary results
res <- summaryExtractTest(smry) #creates df of DML data above

Error in `[.data.frame`(est, , paste0("Est_", cont, lvs), drop = FALSE) : 
  undefined columns selected

Below is my sessionInfo:

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default
BLAS:   /hpc/software/R/4.1.1/lib64/R/lib/libRblas.so
LAPACK: /hpc/software/R/4.1.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.14.2           forcats_0.5.1              
 [3] stringr_1.4.0               dplyr_1.0.9                
 [5] purrr_0.3.4                 readr_2.1.2                
 [7] tidyr_1.2.0                 tibble_3.1.6               
 [9] ggplot2_3.3.6               tidyverse_1.3.1            
[11] sesame_1.12.9               sesameData_1.12.0          
[13] rmarkdown_2.14              ExperimentHub_2.2.1        
[15] AnnotationHub_3.2.2         BiocFileCache_2.2.1        
[17] dbplyr_2.2.1                SummarizedExperiment_1.24.0
[19] Biobase_2.54.0              GenomicRanges_1.46.1       
[21] GenomeInfoDb_1.30.1         IRanges_2.28.0             
[23] S4Vectors_0.32.4            BiocGenerics_0.40.0        
[25] MatrixGenerics_1.6.0        matrixStats_0.62.0         

loaded via a namespace (and not attached):
  [1] wheatmap_0.2.0                fgsea_1.20.0                 
  [3] colorspace_2.0-2              ellipsis_0.3.2               
  [5] class_7.3-19                  DNAcopy_1.68.0               
  [7] XVector_0.34.0                fs_1.5.2                     
  [9] base64_2.0                    rstudioapi_0.13              
 [11] proxy_0.4-27                  ggrepel_0.9.1                
 [13] bit64_4.0.5                   interactiveDisplayBase_1.32.0
 [15] AnnotationDbi_1.56.2          fansi_1.0.2                  
 [17] lubridate_1.8.0               xml2_1.3.3                   
 [19] cachem_1.0.6                  knitr_1.39                   
 [21] jsonlite_1.8.0                broom_1.0.0                  
 [23] png_0.1-7                     shiny_1.7.1                  
 [25] BiocManager_1.30.18           compiler_4.1.1               
 [27] httr_1.4.3                    backports_1.4.1              
 [29] assertthat_0.2.1              Matrix_1.3-4                 
 [31] fastmap_1.1.0                 cli_3.3.0                    
 [33] later_1.3.0                   htmltools_0.5.2              
 [35] tools_4.1.1                   gtable_0.3.0                 
 [37] glue_1.6.2                    GenomeInfoDbData_1.2.7       
 [39] reshape2_1.4.4                rappdirs_0.3.3               
 [41] fastmatch_1.1-3               Rcpp_1.0.8.3                 
 [43] cellranger_1.1.0              vctrs_0.4.1                  
 [45] Biostrings_2.62.0             preprocessCore_1.56.0        
 [47] xfun_0.31                     rvest_1.0.2                  
 [49] mime_0.12                     lifecycle_1.0.1              
 [51] zlibbioc_1.40.0               scales_1.1.1                 
 [53] hms_1.1.1                     promises_1.2.0.1             
 [55] parallel_4.1.1                RColorBrewer_1.1-2           
 [57] yaml_2.3.5                    curl_4.3.2                   
 [59] memoise_2.0.1                 gridExtra_2.3                
 [61] stringi_1.7.6                 RSQLite_2.2.14               
 [63] BiocVersion_3.14.0            randomForest_4.7-1.1         
 [65] e1071_1.7-11                  filelock_1.0.2               
 [67] BiocParallel_1.28.3           rlang_1.0.3                  
 [69] pkgconfig_2.0.3               bitops_1.0-7                 
 [71] evaluate_0.15                 lattice_0.20-44              
 [73] bit_4.0.4                     tidyselect_1.1.2             
 [75] plyr_1.8.7                    magrittr_2.0.3               
 [77] R6_2.5.1                      generics_0.1.3               
 [79] DelayedArray_0.20.0           DBI_1.1.3                    
 [81] withr_2.5.0                   pillar_1.7.0                 
 [83] haven_2.5.0                   KEGGREST_1.34.0              
 [85] RCurl_1.98-1.7                modelr_0.1.8                 
 [87] crayon_1.4.2                  KernSmooth_2.23-20           
 [89] utf8_1.2.2                    tzdb_0.3.0                   
 [91] readxl_1.4.0                  grid_4.1.1                   
 [93] blob_1.2.3                    reprex_2.0.1                 
 [95] digest_0.6.29                 xtable_1.8-4                 
 [97] httpuv_1.6.5                  illuminaio_0.36.0            
 [99] openssl_2.0.2                 munsell_0.5.0                
[101] askpass_1.1   

Thank you for your help!

zwdzwd commented 2 years ago

can you try the latest sesame v1.15.4 and see if this problem still exist? We have some update lately. @gbc529 thanks.

gbc529 commented 2 years ago

Hello Dr. Zhou,

Updating to the latest version solved my issue. However, now that I have moved onto the modeling, am not sure how I would like to run comparisons, and can not find an answer in the documentation. I will put my question in the discussions section above. Thank you @zwdzwd