leekgroup / recount

R package for the recount2 project. Documentation website: http://leekgroup.github.io/recount/
https://jhubiostatistics.shinyapps.io/recount/

BiocFileCache integration #12

Closed by PeteHaitch 6 years ago

PeteHaitch commented 6 years ago

Have you considered integrating with BiocFileCache to locally cache downloads, e.g., from recount::download_study()? I just found myself re-downloading the same study (into the same directory) for the (n+1)th time (my fault for running code before thinking).

This would require a bit of work but might save on downloads (and perhaps make for a good student coding project?).

lcolladotor commented 6 years ago

Hi Pete,

The short answer is no, I haven't considered using BiocFileCache. It looks like it could be useful, though (see https://bioconductor.org/packages/release/bioc/vignettes/BiocFileCache/inst/doc/BiocFileCache.html); I just need to find that student, hehe.
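For anyone picking this up, here is a rough sketch of what such caching might look like. It is not part of recount: download_study_cached() is a hypothetical helper name, and the sketch assumes that download_study(download = FALSE) returns the file URL and that BiocFileCache's bfcrpath() downloads a URL on first use and returns the cached local path afterwards.

```r
## Hypothetical helper (not part of recount): cache a recount2 file
## with BiocFileCache instead of re-downloading it on every run.
download_study_cached <- function(project, type = 'rse-gene',
    cache = BiocFileCache::BiocFileCache(ask = FALSE)) {
    ## Assumption: download = FALSE returns the URL without downloading
    url <- recount::download_study(project, type = type, download = FALSE)

    ## Assumption: bfcrpath() adds the URL to the cache the first time
    ## and returns the local path of the cached copy on later calls
    BiocFileCache::bfcrpath(cache, url)
}
```

Usage would then be something like load(download_study_cached('SRP032789')), which should only need the network the first time it runs.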

In the examples we posted, we check file.exists() before calling download_study(). For example, in https://github.com/leekgroup/recount-analyses/blob/gh-pages/example_de/recount_SRP032789.Rmd we use:

## Download the gene level RangedSummarizedExperiment data
if(!file.exists(file.path('SRP032789', 'rse_gene.Rdata'))) {
    download_study(project_info$project)
}

coverage_matrix() and expressed_regions() also have an outdir argument that, when specified, makes them use local files. That helps if you already downloaded the BigWig files with download_study() and don't want to access them over the web every time you run those two functions.
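As a rough illustration of the outdir usage, the sketch below wraps expressed_regions() so that several chromosomes reuse the same local directory. regions_from_local() is a hypothetical helper name, the default cutoff of 5 is only an illustrative value, and the sketch assumes the BigWig files for the project are already in outdir from an earlier download_study() call.

```r
## Hypothetical helper (not part of recount): find expressed regions for
## several chromosomes while pointing every call at the same local directory.
regions_from_local <- function(project, chrs, cutoff = 5L, outdir = project) {
    ## Name the input so the result is a list named by chromosome
    lapply(stats::setNames(chrs, chrs), function(chr) {
        ## Assumption: with outdir set, the local BigWig files are used
        ## instead of accessing them over the web
        recount::expressed_regions(project, chr, cutoff = cutoff,
            outdir = outdir)
    })
}
```

A call such as regions_from_local('SRP032789', c('chr21', 'chr22')) would then return one set of regions per chromosome.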

Best, Leo

PeteHaitch commented 6 years ago

The example is a good one and I've added it to my script. Thanks, Leo!