neurogenomics / EpiCompare

Comparison, benchmarking & QC of epigenetic datasets
https://doi.org/doi:10.18129/B9.bioc.EpiCompare
12 stars 3 forks source link

Issues with download button #148

Open serachoi1230 opened 1 year ago

serachoi1230 commented 1 year ago

I think there are issues with the download button for upset plots? I get this error:

--- Running overlap_upset_plot() ---
overlap_upset_plot(): Done in 40.8s.
Quitting from lines 338-351 (EpiCompare.Rmd) 
Error in paste(unlist(l), collapse = if (is.null(newline)) "" else newline) : 
  result would exceed 2^31-1 bytes
bschilder commented 1 year ago

Copied from email correspondence with @serachoi1230

Where did you see the error? (locally, GHA, Bioc servers)

Thank you so much. I saw this error locally. So first I got this error, to do with "download button", for upset plot: https://github.com/neurogenomics/EpiCompare/issues/148. Just as a hot fix, I commented out the "download button" function and everything ran until the tss plot step.

At tss plot step, I got the same error as upset plot, again to do with "download button", so I commented out the "download button" function once again. Then I got the error while generating the html report.

bschilder commented 1 year ago

Could you please fill out the full Bug Report template @serachoi1230 ? I'm afraid I don't have enough information (what code was run, which versions of deps you have installed, machine type, etc) to replicate the issue.

bschilder commented 1 year ago

I can't seem to replicate the issue locally, and the GHA checks aren't encountering it. So I this this might have to do with your local setup (e.g. outdated R or system dependencies).

Screenshot 2023-02-24 at 01 20 59

serachoi1230 commented 1 year ago

Thanks @bschilder ! I think it's to do with using large list of peak files? I just tried to update my R, and I still ran into the same issue. If it's a problem with my local machine, I won't be able to generate the report. Would would please try generate the report and push to EpiCompare/report? Here is the code:

### ATAC-seq vs. DNase-seq vs. CHiP-seq ###
## ATAC-seq peakfile
atac1_hg38 <- ChIPseeker::readPeakFile("/Users/serachoi/Documents/EpiCompare_extra/peakfiles/ATAC/ENCFF558BLC.bed", as="GRanges")
atac1_hg38_unique <- unique(atac1_hg38)
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(atac1_hg38_unique)$qValue <-GenomicRanges::mcols(atac1_hg38_unique)$V9

## Dnase-seq peakfile
dna1_hg38 <- ChIPseeker::readPeakFile("/Users/serachoi/Documents/EpiCompare_extra/peakfiles/Dnase/ENCFF274YGF.bed", as="GRanges")
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(dna1_hg38)$qValue <-GenomicRanges::mcols(dna1_hg38)$V9

dna2_hg38 <- ChIPseeker::readPeakFile("/Users/serachoi/Documents/EpiCompare_extra/peakfiles/Dnase/ENCFF185XRG.bed", as="GRanges")
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(dna2_hg38)$qValue <- GenomicRanges::mcols(dna2_hg38)$V9

## ChIP-seq peakfile
chip_hg38 <- ChIPseeker::readPeakFile("/Users/serachoi/Documents/EpiCompare_extra/peakfiles/ENCODE/ENCODE_H3K27ac_hg38_ENCFF038DDS.bed", as="GRanges")
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(chip_hg38)$qValue <- GenomicRanges::mcols(chip_hg38)$V9

## Peaklist
peaklist <- list("ATAC_ENCFF558BLC" = atac1_hg38_unique,
                 "Dnase_ENCFF274YGF" = dna1_hg38,
                 "ChIP_ENCFF038DDS" = chip_hg38)

## Reference
reference <- list("Dnase_ENCFF185XRG_reference"=dna2_hg38)

## Blacklist
data("hg38_blacklist")

## Run Epicompare ##
EpiCompare(peakfiles = peaklist,
           genome_build = "hg38",
           genome_build_output = "hg38",
           blacklist = hg38_blacklist,
           reference = reference,
           picard_files = NULL,
           upset_plot = T,
           stat_plot = T,
           save_output = F,
           chromHMM_plot = T,
           chromHMM_annotation = "K562",
           chipseeker_plot = T,
           enrichment_plot = T,
           tss_plot = T,
           precision_recall_plot =T,
           corr_plot = T,
           interact = T,
           output_dir = "/Users/serachoi/Desktop")

Then we can at least reply to reviewers!

bschilder commented 1 year ago

Hey @serachoi1230 , thanks for sharing the code. But it looks like the file paths are hard-coded to a folder on your laptop that I don't have access to. The reprex should include all steps necessary to download the files from a public resource.

In the meantime, I might be able to use PeakyFinders to find a import the relevant files.

serachoi1230 commented 1 year ago

@bschilder , oh shoot my bad, sorry! For each file I manually downloaded them into my laptop and ran this:

encode <- ChIPseeker::readPeakFile("path", as = "GRanges")

Here are links to ENCODE for each peak file:

Thank you so much, let me know if you need anything else!

bschilder commented 1 year ago

Thanks for the info @serachoi1230 .

Here's how you would do this programmatically:

#### Download files ####
ids <- list(atac1="ENCFF558BLC",
            dna1="ENCFF274YGF",
            dna2="ENCFF185XRG",
            chip1="ENCFF038DDS")

files <- lapply(ids, function(id){
  URL <- paste0(
    "https://www.encodeproject.org/files/",id,
    "/@@download/",id,".bed.gz"
  )
  f <- file.path(tempdir(),basename(URL))
  if(!file.exists(f)){
    utils::download.file(URL, f) 
  }
  return(f)
})

Working on running it now

bschilder commented 1 year ago

Ok, so I was able to replicate the error you described. I think you're right, it has something to do with the size of the data, as I'm able to run it all the way through if I limit all peakfiles to just chromosome 1.

We can use the chr1-only report as an example for now (I'll upload this version to the Releases), but I will continue looking into the original error. I suspect it has something to do with the storage overhead introduced by the download buttons (I may have to reconfigure them to make them more efficient).

In the meantime, I think we can go ahead and resubmit.

I'm including the full reprex script as a new file in the package: inst/examples/atac_dnase_chip_example.R

### ATAC-seq vs. DNase-seq vs. CHiP-seq ###

#### Download files ####
ids <- list(atac1="ENCFF558BLC",
            dna1="ENCFF274YGF",
            dna2="ENCFF185XRG",
            chip1="ENCFF038DDS")

files <- lapply(ids, function(id){
  URL <- paste0(
    "https://www.encodeproject.org/files/",id,
    "/@@download/",id,".bed.gz"
  )
  f <- file.path(tempdir(),basename(URL))
  if(!file.exists(f)){
    utils::download.file(URL, f) 
  }
  return(f)
})

#### ATAC-seq peakfile ####
atac1_hg38 <- ChIPseeker::readPeakFile(files$atac1)
atac1_hg38 <- unique(atac1_hg38)
#qValue from ENCODE is V9, renaming so it can be found
GenomicRanges::mcols(atac1_hg38)$qValue <-GenomicRanges::mcols(atac1_hg38)$V9

#### DNase-seq peakfile ####
dna1_hg38 <-  ChIPseeker::readPeakFile(files$dna1)
GenomicRanges::mcols(dna1_hg38)$qValue <-GenomicRanges::mcols(dna1_hg38)$V9

dna2_hg38 <-ChIPseeker::readPeakFile(files$dna2)
GenomicRanges::mcols(dna2_hg38)$qValue <- GenomicRanges::mcols(dna2_hg38)$V9

#### ChIP-seq peakfile ####
chip_hg38 <- ChIPseeker::readPeakFile(files$chip1)
GenomicRanges::mcols(chip_hg38)$qValue <- GenomicRanges::mcols(chip_hg38)$V9

## Peaklist
peaklist <- list("ATAC_ENCFF558BLC" = atac1_hg38,
                 "Dnase_ENCFF274YGF" = dna1_hg38,
                 "ChIP_ENCFF038DDS" = chip_hg38)
peaklist_chr1 <- EpiCompare:::remove_nonstandard_chrom(grlist = peaklist, 
                                                       keep_chr = "chr1")
## Reference
reference <- list("Dnase_ENCFF185XRG_reference"=dna2_hg38)
reference_chr1 <- EpiCompare:::remove_nonstandard_chrom(grlist = reference,
                                                        keep_chr = "chr1")

#### Run Epicompare ####
html_out <- EpiCompare::EpiCompare(peakfiles = peaklist_chr1,
                                   reference = reference_chr1, 
                                   genome_build = "hg38",
                                   genome_build_output = "hg38",  
                                   output_dir = tempdir(), 
                                   run_all = TRUE)
bschilder commented 1 year ago

Also added a new EpiCompare::EpiCompare arg named add_download_button and set the default to FALSE to avoid this issue until I can resolve it.

These changes have now all been pushed to GitHub (EpiCompare v1.3.4)

serachoi1230 commented 1 year ago

Thank you so much @bschilder, this is really great!

Where did you upload the example report?

bschilder commented 1 year ago

Thank you so much @bschilder, this is really great!

Where did you upload the example report?

The file is in Releases (link above). But it's not rendered on the website yet bc GHA is failing for unrelated reasons. But go ahead and resubmit the paper now and I'll fix the site manually over the weekend so it renders in time for the reviewers

bschilder commented 1 year ago

I've now pushed the updated website with the new example report.