neurogenomics / EpiCompare

Comparison, benchmarking & QC of epigenetic datasets
https://doi.org/doi:10.18129/B9.bioc.EpiCompare

`Error in data.frame(seqnames = as.factor(seqnames(x)), start = start(x), : duplicate row.names: all` #88

Closed. bschilder closed this issue 1 year ago

bschilder commented 2 years ago

1. Bug description

Duplicate rownames error

Console output

-- Running width_boxplot() ---
Done.
Saving 10 x 7 in image
--- Running overlap_heatmap() ---
Done.
Saving HTML file
--- Running overlap_stat_plot() ---
Preparing reference.
Extracting GRanges object from list.
Done.
Saving 10 x 7 in image
Returning 1 matching cell line(s).
Rows: 2 Columns: 9
── Column specification ────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): X1, X4, X6
dbl (5): X2, X3, X5, X7, X8

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
--- Running plot_chromHMM() ---
grlist is already in the output_build format. Skipping liftover.
Preparing peaklist.
Annotating with features.
Working on: GSE66023
Working on: ENCFF038DDS
Obtaining target annotation matrix.
Returning chrHMM_plot.
Saving HTML file
Computing precision-recall results.
--- Running plot_chromHMM() ---
grlist is already in the output_build format. Skipping liftover.
Preparing peaklist.
Annotating with features.
Working on: GSE66023
Working on: ENCFF038DDS
Obtaining target annotation matrix.
Returning chrHMM_plot.
Saving HTML file
Computing precision-recall results.
--- Running plot_chromHMM() ---
grlist is already in the output_build format. Skipping liftover.
Preparing peaklist.
Annotating with features.
Working on: GSE66023
Working on: ENCFF038DDS
Obtaining target annotation matrix.
Returning chrHMM_plot.
Saving HTML file
Computing precision-recall results.
--- Running plot_chromHMM() ---
grlist is already in the output_build format. Skipping liftover.
Preparing peaklist.
Removing 1 empty elements in peaklist
Annotating with features.
Working on: GSE66023
Obtaining target annotation matrix.
Returning chrHMM_plot.
Saving HTML file
Computing precision-recall results.
--- Running plot_chromHMM() ---
grlist is already in the output_build format. Skipping liftover.
Preparing peaklist.
Removing 1 empty elements in peaklist
Annotating with features.
Working on: GSE66023
Obtaining target annotation matrix.
Returning chrHMM_plot.
Saving HTML file
--- Running plot_ChIPseeker_annotation() ---
Quitting from lines 599-613 (EpiCompare.Rmd) 
Error in data.frame(seqnames = as.factor(seqnames(x)), start = start(x),  : 
  duplicate row.names: all
In addition: There were 17 warnings (use warnings() to see them)

Expected behaviour

Make rownames unique automatically.
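
As a rough sketch of what "unique automatically" could mean, base R's `make.unique()` appends numeric suffixes to repeated names. The vector below is illustrative only, modelled on the duplicated name `all` reported in the error message; it is not taken from the actual peak file.

```r
# Illustrative names only: every range carries the same name.
peak_names <- c("all", "all", "all")

# make.unique() keeps the first occurrence and suffixes the repeats.
unique_names <- make.unique(peak_names)
print(unique_names)
# [1] "all"   "all.1" "all.2"
```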

2. Reproducible example

Code


peaks_native <- PeakyFinders::import_peaks(ids = "GSE66023",
                                    builds = "hg19", 
                                    searches = list(genericPeak=".bed.gz"))
reference <- PeakyFinders::import_peaks(ids = "ENCFF038DDS", 
                                        builds = "GRCh38")

library(EpiCompare)
data("hg19_blacklist")
res <- EpiCompare::EpiCompare(peakfiles = peaks_native$GEO["GSE66023"],
                              genome_build = list(peakfiles="hg19",
                                                  reference="hg38",
                                                  blacklist="hg19"), 
                              blacklist = hg19_blacklist,
                              reference = reference$ENCODE, 
                              chromHMM_plot = TRUE,
                              chromHMM_annotation = "K562",
                              chipseeker_plot = TRUE,
                              enrichment_plot = TRUE,
                              tss_plot = TRUE, 
                              stat_plot = TRUE,
                              interact = TRUE,
                              save_output = TRUE,
                              output_dir = here::here("EpiCompare"))

3. Session info

```
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4  stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
[1] rtracklayer_1.56.0  GenomicRanges_1.48.0  GenomeInfoDb_1.32.2  IRanges_2.30.0  S4Vectors_0.34.0
[6] BiocGenerics_0.42.0  EpiCompare_0.99.20  PeakyFinders_0.99.1

loaded via a namespace (and not attached):
  [1] utf8_1.2.2  reticulate_1.25
  [3] R.utils_2.11.0  tidyselect_1.1.2
  [5] htmlwidgets_1.5.4  RSQLite_2.2.14
  [7] AnnotationDbi_1.58.0  grid_4.2.0
  [9] BiocParallel_1.30.2  devtools_2.4.3
 [11] scatterpie_0.1.7  munsell_0.5.0
 [13] withr_2.5.0  colorspace_2.0-3
 [15] GOSemSim_2.22.0  Biobase_2.56.0
 [17] filelock_1.0.2  highr_0.9
 [19] knitr_1.39  MACSr_1.4.0
 [21] rstudioapi_0.13  DOSE_3.22.0
 [23] labeling_0.4.2  MatrixGenerics_1.8.0
 [25] GenomeInfoDbData_1.2.8  polyclip_1.10-0
 [27] seqPattern_1.28.0  bit64_4.0.5
 [29] farver_2.1.0  rprojroot_2.0.3
 [31] basilisk_1.8.0  vctrs_0.4.1
 [33] treeio_1.20.0  generics_0.1.2
 [35] xfun_0.31  BiocFileCache_2.4.0
 [37] regioneR_1.28.0  R6_2.5.1
 [39] graphlayouts_0.8.0  locfit_1.5-9.5
 [41] bitops_1.0-7  BRGenomics_1.8.0
 [43] cachem_1.0.6  fgsea_1.22.0
 [45] gridGraphics_0.5-1  DelayedArray_0.22.0
 [47] assertthat_0.2.1  vroom_1.5.7
 [49] promises_1.2.0.1  BiocIO_1.6.0
 [51] scales_1.2.0  ggraph_2.0.5
 [53] enrichplot_1.16.1  gtable_0.3.0
 [55] processx_3.5.3  tidygraph_1.2.1
 [57] rlang_1.0.2  genefilter_1.78.0
 [59] splines_4.2.0  lazyeval_0.2.2
 [61] impute_1.70.0  GEOquery_2.64.2
 [63] BiocManager_1.30.18  yaml_2.3.5
 [65] reshape2_1.4.4  crosstalk_1.2.0
 [67] GenomicFeatures_1.48.3  httpuv_1.6.5
 [69] qvalue_2.28.0  usethis_2.1.6
 [71] tools_4.2.0  gridBase_0.4-7
 [73] ggplotify_0.1.0  ggplot2_3.3.6
 [75] ellipsis_0.3.2  gplots_3.1.3
 [77] jquerylib_0.1.4  RColorBrewer_1.1-3
 [79] sessioninfo_1.2.2  Rcpp_1.0.8.3
 [81] plyr_1.8.7  progress_1.2.2
 [83] zlibbioc_1.42.0  purrr_0.3.4
 [85] RCurl_1.98-1.6  ps_1.7.0
 [87] basilisk.utils_1.8.0  prettyunits_1.1.1
 [89] viridis_0.6.2  SummarizedExperiment_1.26.1
 [91] ggrepel_0.9.1  fs_1.5.2
 [93] here_1.0.1  magrittr_2.0.3
 [95] data.table_1.14.2  DO.db_2.9
 [97] matrixStats_0.62.0  pkgload_1.2.4
 [99] hms_1.1.1  patchwork_1.1.1
[101] mime_0.12  evaluate_0.15
[103] xtable_1.8-4  XML_3.99-0.9
[105] gridExtra_2.3  testthat_3.1.4
[107] compiler_4.2.0  biomaRt_2.52.0
[109] tibble_3.1.7  KernSmooth_2.23-20
[111] crayon_1.5.1  shadowtext_0.1.2
[113] R.oo_1.24.0  htmltools_0.5.2
[115] ggfun_0.0.6  later_1.3.0
[117] tzdb_0.3.0  tidyr_1.2.0
[119] geneplotter_1.74.0  aplot_0.1.5
[121] DBI_1.1.2  genomation_1.28.0
[123] tweenr_1.0.2  ExperimentHub_2.4.0
[125] ChIPseeker_1.32.0  dbplyr_2.1.1
[127] MASS_7.3-57  rappdirs_0.3.3
[129] boot_1.3-28  Matrix_1.4-1
[131] readr_2.1.2  brio_1.1.3
[133] piggyback_0.1.3  cli_3.3.0
[135] R.methodsS3_1.8.1  parallel_4.2.0
[137] igraph_1.3.1  pkgconfig_2.0.3
[139] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  GenomicAlignments_1.32.0
[141] dir.expiry_1.4.0  plotly_4.10.0
[143] xml2_1.3.3  ggtree_3.4.0
[145] annotate_1.74.0  bslib_0.3.1
[147] XVector_0.36.0  yulab.utils_0.0.4
[149] callr_3.7.0  stringr_1.4.0
[151] digest_0.6.29  Biostrings_2.64.0
[153] rmarkdown_2.14  fastmatch_1.1-3
[155] tidytree_0.3.9  restfulr_0.0.13
[157] curl_4.3.2  shiny_1.7.1
[159] Rsamtools_2.12.0  gtools_3.9.2.1
[161] rjson_0.2.21  lifecycle_1.0.1
[163] nlme_3.1-157  jsonlite_1.8.0
[165] desc_1.4.1  viridisLite_0.4.0
[167] limma_3.52.1  BSgenome_1.64.0
[169] fansi_1.0.3  pillar_1.7.0
[171] lattice_0.20-45  pkgbuild_1.3.1
[173] plotrix_3.8-2  KEGGREST_1.36.0
[175] fastmap_1.1.0  httr_1.4.3
[177] survival_3.3-1  GO.db_3.15.0
[179] remotes_2.4.2  interactiveDisplayBase_1.34.0
[181] glue_1.6.2  png_0.1-7
[183] BiocVersion_3.15.2  bit_4.0.4
[185] sass_0.4.1  ggforce_0.3.3
[187] stringi_1.7.6  blob_1.2.3
[189] DESeq2_1.36.0  AnnotationHub_3.4.0
[191] caTools_1.18.2  memoise_2.0.1
[193] dplyr_1.0.9  ape_5.6-2
```

bschilder commented 2 years ago

A fix can be added at the very beginning of the EpiCompare::EpiCompare R Markdown script, where each grlist is checked: if all names of a GRanges object are identical, simply set that object's names to NULL.

For a single GRanges object:

names(gr) <- NULL

I've also fixed PeakyFinders so that it drops these names when importing peak files in the first place, but it would still be good to have the check in EpiCompare in case someone runs into a similar situation.
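
For a whole peak list, the same idea might look like the sketch below. `drop_duplicated_names()` is a hypothetical helper, not an existing EpiCompare or PeakyFinders function. It also checks for any duplicated name rather than strictly "all names identical", which is a slightly broader condition than the one described above. The GRanges demonstration only runs if GenomicRanges is installed.

```r
# Hypothetical helper (not part of EpiCompare): for every element of a
# peak list, drop its names if any of them are duplicated, so they can
# no longer end up as duplicated row names downstream.
drop_duplicated_names <- function(grlist) {
  lapply(grlist, function(gr) {
    nms <- names(gr)
    if (!is.null(nms) && anyDuplicated(nms) > 0) {
      names(gr) <- NULL
    }
    gr
  })
}

# Demonstration on GRanges objects (skipped if GenomicRanges is missing).
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  gr_dup <- GenomicRanges::GRanges(c("chr1:1-10", "chr1:20-30"))
  names(gr_dup) <- c("all", "all")   # duplicated names, as in the error
  gr_ok <- GenomicRanges::GRanges(c("chr2:1-10", "chr2:20-30"))
  names(gr_ok) <- c("a", "b")        # already unique, left untouched

  cleaned <- drop_duplicated_names(list(dup = gr_dup, ok = gr_ok))
  print(names(cleaned$dup))          # names removed
  print(names(cleaned$ok))           # names kept
}
```

The list names (here `dup` and `ok`) are preserved by `lapply()`, so sample labels used elsewhere in the report would be unaffected.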