neurogenomics / EpiCompare

Comparison, benchmarking & QC of epigenetic datasets
https://doi.org/doi:10.18129/B9.bioc.EpiCompare
13 stars 3 forks source link

`error in evaluating the argument 'x' in selecting a method for function 'elementMetadata': GRanges objects don't support [[` #59

Closed bschilder closed 2 years ago

bschilder commented 2 years ago

Can't seem to find anything wrong with my input files, but I keep getting this error.

Reprex

I've just uploaded the peak and picard list objects as .rds files to our lab folder.

#### Import sample files ####
peakfiles <- readRDS("~/Downloads/peakfiles.rds")
picardfiles <- readRDS("~/Downloads/picardfiles.rds")

#### import reference files ####
#### H3K27ac ####  
encode_ac <- rtracklayer::import("https://www.encodeproject.org/files/ENCFF044JNJ/@@download/ENCFF044JNJ.bed.gz", format = "narrowPeak")
#### H3K27me3 ####
encode_me3 <- rtracklayer::import( "https://www.encodeproject.org/files/ENCFF322IFF/@@download/ENCFF322IFF.bed.gz", 
                                  format = "narrowPeak")

ref_list <- list("encode_ac"=encode_ac,
                 "encode_me3"=encode_me3)

#### Run EpiCompare once per reference dataset ###

data("hg19_blacklist") # example blacklist 

out_list <- lapply(names(ref_list), function(nm){
  message("\n",nm)
  save_dir <- here::here("reports",nm)
  dir.create(save_dir, showWarnings = FALSE, recursive = TRUE)
  EpiCompare::EpiCompare(peakfiles = peakfiles, 
                         picard_files = picardfiles,
                         blacklist = hg19_blacklist,
                         genome_build = "hg38",
                         reference = ref_list[[nm]],
                         stat_plot = TRUE,
                         chromHMM_plot = TRUE,
                         chipseeker_plot = TRUE,
                         enrichment_plot = TRUE,
                         save_output = TRUE,
                         chromHMM_annotation = "K562",
                         output_dir = save_dir)
})

Error

Saving 10 x 13.7 in image
Quitting from lines 298-313 (EpiCompare.Rmd) 
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'elementMetadata': GRanges objects don't support [[, as.list(), lapply(), or unlist() at the
  moment

Session info

``` R version 4.2.0 (2022-04-22) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Monterey 12.3.1 Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] utf8_1.2.2 tidyselect_1.1.2 [3] htmlwidgets_1.5.4 RSQLite_2.2.14 [5] AnnotationDbi_1.58.0 grid_4.2.0 [7] BiocParallel_1.30.0 scatterpie_0.1.7 [9] munsell_0.5.0 colorspace_2.0-3 [11] GOSemSim_2.22.0 Biobase_2.56.0 [13] filelock_1.0.2 knitr_1.39 [15] rstudioapi_0.13 stats4_4.2.0 [17] DOSE_3.22.0 MatrixGenerics_1.8.0 [19] GenomeInfoDbData_1.2.8 polyclip_1.10-0 [21] seqPattern_1.28.0 bit64_4.0.5 [23] farver_2.1.0 rprojroot_2.0.3 [25] downloader_0.4 vctrs_0.4.1 [27] treeio_1.20.0 generics_0.1.2 [29] xfun_0.31 BiocFileCache_2.4.0 [31] R6_2.5.1 GenomeInfoDb_1.32.1 [33] graphlayouts_0.8.0 locfit_1.5-9.5 [35] bitops_1.0-7 BRGenomics_1.8.0 [37] cachem_1.0.6 fgsea_1.22.0 [39] gridGraphics_0.5-1 DelayedArray_0.22.0 [41] assertthat_0.2.1 promises_1.2.0.1 [43] BiocIO_1.6.0 scales_1.2.0 [45] ggraph_2.0.5 enrichplot_1.16.0 [47] gtable_0.3.0 tidygraph_1.2.1 [49] rlang_1.0.2 genefilter_1.78.0 [51] splines_4.2.0 rtracklayer_1.56.0 [53] lazyeval_0.2.2 impute_1.70.0 [55] plyranges_1.16.0 BiocManager_1.30.17 [57] yaml_2.3.5 reshape2_1.4.4 [59] GenomicFeatures_1.48.0 httpuv_1.6.5 [61] qvalue_2.28.0 clusterProfiler_4.4.1 [63] tools_4.2.0 ggplotify_0.1.0 [65] gridBase_0.4-7 ggplot2_3.3.6 [67] ellipsis_0.3.2 gplots_3.1.3 [69] RColorBrewer_1.1-3 BiocGenerics_0.42.0 [71] Rcpp_1.0.8.3 plyr_1.8.7 [73] progress_1.2.2 zlibbioc_1.42.0 [75] purrr_0.3.4 RCurl_1.98-1.6 [77] prettyunits_1.1.1 viridis_0.6.2 [79] S4Vectors_0.34.0 SummarizedExperiment_1.26.1 [81] ggrepel_0.9.1 here_1.0.1 [83] magrittr_2.0.3 data.table_1.14.2 [85] DO.db_2.9 matrixStats_0.62.0 [87] evaluate_0.15 hms_1.1.1 [89] patchwork_1.1.1 mime_0.12 [91] xtable_1.8-4 XML_3.99-0.9 [93] readxl_1.4.0 IRanges_2.30.0 [95] gridExtra_2.3 compiler_4.2.0 [97] biomaRt_2.52.0 tibble_3.1.7 [99] KernSmooth_2.23-20 crayon_1.5.1 [101] shadowtext_0.1.2 htmltools_0.5.2 [103] tzdb_0.3.0 ggfun_0.0.6 [105] later_1.3.0 tidyr_1.2.0 [107] geneplotter_1.74.0 aplot_0.1.4 [109] DBI_1.1.2 tweenr_1.0.2 [111] ChIPseeker_1.32.0 genomation_1.28.0 [113] dbplyr_2.1.1 MASS_7.3-57 [115] rappdirs_0.3.3 boot_1.3-28 [117] Matrix_1.4-1 readr_2.1.2 [119] cli_3.3.0 parallel_4.2.0 [121] igraph_1.3.1 GenomicRanges_1.48.0 [123] pkgconfig_2.0.3 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 [125] GenomicAlignments_1.32.0 plotly_4.10.0 [127] xml2_1.3.3 ggtree_3.4.0 [129] annotate_1.74.0 XVector_0.36.0 [131] yulab.utils_0.0.4 stringr_1.4.0 [133] digest_0.6.29 Biostrings_2.64.0 [135] cellranger_1.1.0 rmarkdown_2.14 [137] fastmatch_1.1-3 tidytree_0.3.9 [139] EpiCompare_0.99.16 restfulr_0.0.13 [141] curl_4.3.2 shiny_1.7.1 [143] Rsamtools_2.12.0 gtools_3.9.2 [145] rjson_0.2.21 lifecycle_1.0.1 [147] nlme_3.1-157 jsonlite_1.8.0 [149] viridisLite_0.4.0 BSgenome_1.64.0 [151] fansi_1.0.3 pillar_1.7.0 [153] lattice_0.20-45 KEGGREST_1.36.0 [155] fastmap_1.1.0 httr_1.4.3 [157] plotrix_3.8-2 survival_3.3-1 [159] GO.db_3.15.0 interactiveDisplayBase_1.34.0 [161] glue_1.6.2 remotes_2.4.2 [163] UpSetR_1.4.0 png_0.1-7 [165] BiocVersion_3.15.2 bit_4.0.4 [167] ggforce_0.3.3 stringi_1.7.6 [169] blob_1.2.3 DESeq2_1.36.0 [171] org.Hs.eg.db_3.15.0 AnnotationHub_3.4.0 [173] caTools_1.18.2 memoise_2.0.1 [175] dplyr_1.0.9 ape_5.6-2 ```
bschilder commented 2 years ago

Ah, actually I see what's happening. reference= used to take a single GRanges object, but it now takes a named list only.

Might be good to add a check at the beginning to see if the supplied reference is actually a GRanges object, and if so turn it into a list (with a single element) and give it a name based on the object name.

bschilder commented 2 years ago

Ok, got much further than before by supplying reference as a list, but still hit another error. Is this because reference can only handle a single element?

Modified reprex


 EpiCompare::EpiCompare(peakfiles = peakfiles, 
                         picard_files = picardfiles,
                         blacklist = hg19_blacklist,
                         genome_build = "hg38",
                         reference = ref_list,
                         stat_plot = TRUE,
                         chromHMM_plot = TRUE,
                         chipseeker_plot = TRUE,
                         enrichment_plot = TRUE,
                         save_output = TRUE,
                         chromHMM_annotation = "K562",
                         output_dir = save_dir <- here::here("reports","encode_ac.encode_me3"))

Error

Saving 10 x 14 in image
Saving 10 x 14 in image
Rows: 2 Columns: 9
── Column specification ─────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): X1, X4, X6
dbl (5): X2, X3, X5, X7, X8

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
/Users/schilder/Library/Caches/org.R-project.R/R/BiocFileCache
  does not exist, create directory? (yes/no): yes
adding K562's chrHMM to local cache,future invocations will use local image
adding rname 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/wgEncodeBroadHmmK562HMM.bed.gz'
  |===============================================================================| 100%
Rows: 2 Columns: 9
── Column specification ─────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): X1, X4, X6
dbl (5): X2, X3, X5, X7, X8

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  |===============================================================================| 100%
snapshotDate(): 2022-04-21
downloading 1 resources
retrieving 1 resource
  |===============================================================================| 100%
loading from cache
require("rtracklayer")
Working on: phase_1_05_jan_2022.S_1_R1
Working on: phase_1_05_jan_2022.S_1_R2
Working on: phase_1_05_jan_2022.S_1_R1
Working on: phase_1_05_jan_2022.S_1_R2
Working on: phase_2_03_feb_2022.S_4_R1
Working on: phase_2_03_feb_2022.S_4_R1
Working on: phase_2_28_jan_2022.S_2_R1
Working on: phase_2_28_jan_2022.S_3_R1
Working on: phase_2_28_jan_2022.S_4_R1
Working on: phase_2_28_jan_2022.S_5_R1
Working on: phase_2_28_jan_2022.S_6_R1
Working on: phase_2_28_jan_2022.S_2_R1
Working on: phase_2_28_jan_2022.S_3_R1
Working on: phase_2_28_jan_2022.S_4_R1
Working on: phase_2_28_jan_2022.S_5_R1
Working on: phase_2_28_jan_2022.S_6_R1
Working on: phase_3_18_feb_2022.S_10_R1
Working on: phase_3_18_feb_2022.S_11_R1
Working on: phase_3_18_feb_2022.S_8_R1
Working on: phase_3_18_feb_2022.S_9_R1
Working on: phase_3_18_feb_2022.S_10_R1
Working on: phase_3_18_feb_2022.S_11_R1
Working on: phase_3_18_feb_2022.S_8_R1
Working on: phase_3_18_feb_2022.S_9_R1
Working on: phase_4_24_mar_2022.s1_R1
Working on: phase_4_24_mar_2022.s11_R1
Working on: phase_4_24_mar_2022.s4_R1
Working on: phase_4_24_mar_2022.s6_R1
Working on: phase_4_24_mar_2022.s8_R1
Working on: phase_4_24_mar_2022.s9_R1
Working on: phase_4_24_mar_2022.s1_R1
Working on: phase_4_24_mar_2022.s11_R1
Working on: phase_4_24_mar_2022.s4_R1
Working on: phase_4_24_mar_2022.s6_R1
Working on: phase_4_24_mar_2022.s8_R1
Working on: phase_4_24_mar_2022.s9_R1
Working on: encode_ac
Working on: encode_me3
Quitting from lines 333-358 (EpiCompare.Rmd) 
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent
bschilder commented 2 years ago

Hm, same issue as before when reference is named list with only a single element. So not sure why I'm getting this dimnames error.

Same thing occurs when i remove picardfiles as an arg, so that doesn't seem to be it either.

Based on the latest error, the source of the bug is somewhere in here: https://github.com/neurogenomics/EpiCompare/blob/2b84d5eba23aec7ca6c5db1bfbd27e4d1606dc29/inst/markdown/EpiCompare.Rmd#L332

serachoi1230 commented 2 years ago

I've seen that error before, it's very strange, it comes and goes. Sometimes it runs all checks fine without errors but when I try run it, it gives me that error. Will have a look!

bschilder commented 2 years ago

Here is the full trackback, which might be helpful.

Seems the source of the issue is here: https://github.com/neurogenomics/EpiCompare/blob/3f59ce8e736d75ab3c0768fa617237dfffb8e696/R/overlap_stat_plot.R#L53

encode_ac
Saving 10 x 13.7 in image
Quitting from lines 296-311 (EpiCompare.Rmd) 
 Error in h(simpleError(msg, call)) : 
error in evaluating the argument 'x' in selecting a method for function 'elementMetadata': GRanges objects don't support [[, as.list(), lapply(), or unlist() at the
moment
35.
h(simpleError(msg, call))
34.
.handleSimpleError(function (cond) 
.Internal(C_tryCatchHelper(addr, 1L, cond)), "GRanges objects don't support [[, as.list(), lapply(), or unlist() at the\n moment", 
base::quote(getListElement(x, i, ...)))
33.
stop(wmsg(class(x), " objects don't support [[, as.list(), ", 
"lapply(), or unlist() at the moment"))
32.
getListElement(x, i, ...)
31.
getListElement(x, i, ...)
30.
reference[[1]]
29.
reference[[1]] at overlap_stat_plot.R#53
28.
S4Vectors::elementMetadata(reference[[1]]) at overlap_stat_plot.R#53
27.
ncol(S4Vectors::elementMetadata(reference[[1]])) at overlap_stat_plot.R#53
26.
overlap_stat_plot(reference = reference_tidy, peaklist = peaklist_tidy, 
annotation = txdb) at <text>#4
25.
eval(expr, envir, enclos)
24.
eval(expr, envir, enclos)
23.
eval_with_user_handlers(expr, envir, enclos, user_handlers)
22.
withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers))
21.
withCallingHandlers(withVisible(eval_with_user_handlers(expr, 
envir, enclos, user_handlers)), warning = wHandler, error = eHandler, 
message = mHandler)
20.
handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, 
envir, enclos, user_handlers)), warning = wHandler, error = eHandler, 
message = mHandler))
19.
timing_fn(handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, 
envir, enclos, user_handlers)), warning = wHandler, error = eHandler, 
message = mHandler)))
18.
evaluate_call(expr, parsed$src[[i]], envir = envir, enclos = enclos, 
debug = debug, last = i == length(out), use_try = stop_on_error != 
2L, keep_warning = keep_warning, keep_message = keep_message, 
output_handler = output_handler, include_timing = include_timing)
17.
evaluate::evaluate(...)
16.
evaluate(code, envir = env, new_device = FALSE, keep_warning = !isFALSE(options$warning), 
keep_message = !isFALSE(options$message), stop_on_error = if (is.numeric(options$error)) options$error else {
if (options$error && options$include) 
0L ...
15.
in_dir(input_dir(), expr)
14.
in_input_dir(evaluate(code, envir = env, new_device = FALSE, 
keep_warning = !isFALSE(options$warning), keep_message = !isFALSE(options$message), 
stop_on_error = if (is.numeric(options$error)) options$error else {
if (options$error && options$include) ...
13.
eng_r(options)
12.
block_exec(params)
11.
call_block(x)
10.
process_group.block(group)
9.
process_group(group)
8.
withCallingHandlers(if (tangle) process_tangle(group) else process_group(group), 
error = function(e) {
setwd(wd)
cat(res, sep = "\n", file = output %n% "") ...
7.
process_file(text, output)
6.
knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
5.
rmarkdown::render(input = markdown_path, output_dir = output_dir, 
output_file = output_filename, quiet = TRUE, params = list(peakfile = peakfiles, 
genome_build = genome_build, blacklist = blacklist, picard_files = picard_files, 
reference = reference, upset_plot = upset_plot, stat_plot = stat_plot, ... at EpiCompare.R#130
4.
EpiCompare::EpiCompare(peakfiles = peakfiles, picard_files = picardfiles, 
blacklist = hg19_blacklist, genome_build = "hg38", reference = ref_list[[nm]], 
stat_plot = TRUE, chromHMM_plot = TRUE, chipseeker_plot = TRUE, 
enrichment_plot = TRUE, save_output = TRUE, chromHMM_annotation = "K562", ...
3.
FUN(X[[i]], ...)
2.
lapply(names(ref_list), function(nm) {
message("\n", nm)
save_dir <- here::here("reports", nm)
dir.create(save_dir, showWarnings = FALSE, recursive = TRUE) ...
1.
lapply(names(ref_list), function(nm) {
message("\n", nm)
save_dir <- here::here("reports", nm)
dir.create(save_dir, showWarnings = FALSE, recursive = TRUE) ...
bschilder commented 2 years ago

Ok, so I figured this out. It had to do with the fact that reference[[1]] was being called regardless of whether reference was a list or an unlisted GRanges object. It looks like this was being done because overlap_stat_plot can only use a single reference at a time.

I've added a function that validates the reference format and returns the format required by that particular function: prepare_reference.R

However now I'm getting the original error again. I'm able to replicate the specific source of the error using just my peakfiles.

peaklist <- readRDS("~/Downloads/peakfiles.rds")
my_plot <- plot_chromHMM(peaklist = peaklist, 
                          cell_line = "K562",
                          genome_build = "hg19") 

Here's the traceback:

**Working on: encode_ac
Quitting from lines 336-361 (EpiCompare.Rmd) 
 Error in dimnames(x) <- dn : 
length of 'dimnames' [1] not equal to array extent
27.
`rownames<-`(`*tmp*`, value = names(peaklist)) at plot_chromHMM.R#81
26.
plot_chromHMM(peaklist = peaklist_tidy, chromHMM_annotation = chromHMM_list, 
genome_build = params$genome_build, interact = params$interact) at <text>#6
25.
eval(expr, envir, enclos)
24.
eval(expr, envir, enclos)
23.
eval_with_user_handlers(expr, envir, enclos, user_handlers)
22.
withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers))
21.
withCallingHandlers(withVisible(eval_with_user_handlers(expr, 
envir, enclos, user_handlers)), warning = wHandler, error = eHandler, 
message = mHandler)
20.
handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, 
envir, enclos, user_handlers)), warning = wHandler, error = eHandler, 
message = mHandler))
19.
timing_fn(handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, 
envir, enclos, user_handlers)), warning = wHandler, error = eHandler, 
message = mHandler)))
18.
evaluate_call(expr, parsed$src[[i]], envir = envir, enclos = enclos, 
debug = debug, last = i == length(out), use_try = stop_on_error != 
2L, keep_warning = keep_warning, keep_message = keep_message, 
output_handler = output_handler, include_timing = include_timing)
17.
evaluate::evaluate(...)
16.
evaluate(code, envir = env, new_device = FALSE, keep_warning = !isFALSE(options$warning), 
keep_message = !isFALSE(options$message), stop_on_error = if (is.numeric(options$error)) options$error else {
if (options$error && options$include) 
0L ...
15.
in_dir(input_dir(), expr)
14.
in_input_dir(evaluate(code, envir = env, new_device = FALSE, 
keep_warning = !isFALSE(options$warning), keep_message = !isFALSE(options$message), 
stop_on_error = if (is.numeric(options$error)) options$error else {
if (options$error && options$include) ...
13.
eng_r(options)
12.
block_exec(params)
11.
call_block(x)
10.
process_group.block(group)
9.
process_group(group)
8.
withCallingHandlers(if (tangle) process_tangle(group) else process_group(group), 
error = function(e) {
setwd(wd)
cat(res, sep = "\n", file = output %n% "") ...
7.
process_file(text, output)
6.
knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
5.
rmarkdown::render(input = markdown_path, output_dir = output_dir, 
output_file = output_filename, quiet = TRUE, params = list(peakfile = peakfiles, 
genome_build = genome_build, blacklist = blacklist, picard_files = picard_files, 
reference = reference, upset_plot = upset_plot, stat_plot = stat_plot, ... at EpiCompare.R#130
4.
EpiCompare::EpiCompare(peakfiles = peakfiles, picard_files = picardfiles, 
blacklist = hg19_blacklist, genome_build = "hg38", reference = ref_list[nm], 
stat_plot = TRUE, chromHMM_plot = TRUE, chipseeker_plot = TRUE, 
enrichment_plot = TRUE, save_output = TRUE, chromHMM_annotation = "K562", ...
3.
FUN(X[[i]], ...)
2.
lapply(names(ref_list), function(nm) {
message("\n", nm)
save_dir <- here::here("reports", nm)
dir.create(save_dir, showWarnings = FALSE, recursive = TRUE) ...
1.
lapply(names(ref_list), function(nm) {
message("\n", nm)
save_dir <- here::here("reports", nm)
dir.create(save_dir, showWarnings = FALSE, recursive = TRUE) ...**
bschilder commented 2 years ago

Ok, figured it out.

It was stemming from the fact that some of my peak GRanges in peaklist were not uniquely named. In plot_chromHMM there's a step where you name the rows of the genomation::heatTargetAnnotation output matrix: https://github.com/neurogenomics/EpiCompare/blob/dc31e81730b9c79adfc42ae743deb93399f8e09b/R/plot_chromHMM.R#L81

This causes an error because the matrix somehow aggregates items with the same name, producing a matrix with less rows than items in peaklist. But your manual naming step isn't really necessary since the matrix already uses the the peaklist names to name its rows, so I removed it.

In addition, I've added an extra check to make all peaklist names are unique, and if not, it gives the user a message saying they're being forced to be unique.

I've also gone through and simplified some of your code along the way, speeding it up wherever possible with more efficient equivalents.

bschilder commented 2 years ago

On a related note, the duplicate peak names were ultimately stemming from EpiCompare::gather_files.

nf-core/cutandrun creates duplicates of "peaks.stringent" files in different subfolders: ../04_reporting/ and ../03_peak_calling/

I didn't account for this when naming each file. I'll modify gather_files accordingly.