leekgroup / recount

R package for the recount2 project. Documentation website: http://leekgroup.github.io/recount/
https://jhubiostatistics.shinyapps.io/recount/

expressed_regions() is not working with the new IDIES and AWS bigWig file locations #23

Open lcolladotor opened 1 year ago

lcolladotor commented 1 year ago

Hi,

Currently, in both BioC release (3.16) and devel (3.17), recount is failing because neither the new IDIES location nor AWS allows us to read the BigWig files from the web. I manually edited a local clone of recount to try the IDIES location.

You can test this on AWS (through duffel) with:

regions <- expressed_regions("SRP002001", "chrY", cutoff = 5)

from https://github.com/leekgroup/recount/blob/7301c5fc09c968110d50617eef43a635b03fd4a2/tests/testthat/test-data.R#L112.

These are the warnings we get, first from the IDIES location (followed by the traceback) and then from AWS:

2023-02-20 12:51:10 loadCoverage: loading BigWig file http://sciserver.org/public-data/recount2/data/SRP002001/bw/mean_SRP002001.bw
In addition: Warning messages:
1: In seqinfo(con) :
  No openssl available in netConnectHttps for sciserver.org : 443
2: In seqinfo(con) :
  No openssl available in netConnectHttps for sciserver.org : 443
3: In seqinfo(con) :
  No openssl available in netConnectHttps for sciserver.org : 443
> traceback()
8: stop(conditionMessage(output))
7: FUN(X[[i]], ...)
6: lapply(as.list(X), match.fun(FUN), ...)
5: lapply(as.list(X), match.fun(FUN), ...)
4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr, 
       verbose = verbose)
3: lapply(bList, .loadCoverageBigWig, range = which, chr = chr, 
       verbose = verbose)
2: derfinder::loadCoverage(files = meanFile, chr = chr, chrlen = chrlen) at expressed_regions.R#121
1: expressed_regions("SRP002001", "chrY", cutoff = 5)
2023-02-20 12:36:04 loadCoverage: loading BigWig file http://duffel.rail.bio/recount/SRP002001/bw/mean_SRP002001.bw
In addition: Warning messages:
1: In seqinfo(con) :
  No openssl available in netConnectHttps for recount-opendata.s3.amazonaws.com : 443
2: In seqinfo(con) :
  No openssl available in netConnectHttps for recount-opendata.s3.amazonaws.com : 443
3: In seqinfo(con) :
  No openssl available in netConnectHttps for recount-opendata.s3.amazonaws.com : 443

I'm not sure what to do @nellore @ChristopherWilks.

I can try to provide a smaller test by digging into .loadCoverageBigWig() https://github.com/lcolladotor/derfinder/blob/5c1cbd412c5787bf2d2d778977e38dd6ae64976d/R/loadCoverage.R#L384 and, ultimately, rtracklayer.
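A smaller test along these lines could capture the warnings and the error from a single remote read instead of running the whole expressed_regions() pipeline. The sketch below is only an illustration: `capture_conditions()` and `fake_read()` are hypothetical helpers, not part of recount, derfinder, or rtracklayer, and the simulated failure stands in for the real remote read so that the block runs without network access.

```r
# Evaluate an expression, collecting warnings and any error instead of
# stopping, so the full condition trail can be pasted into a bug report.
# (Hypothetical helper; not part of any package mentioned in this thread.)
capture_conditions <- function(expr) {
    warnings <- character(0)
    result <- withCallingHandlers(
        # `expr` is a promise, so it is first evaluated here, inside the handlers
        tryCatch(expr, error = function(e) e),
        warning = function(w) {
            warnings <<- c(warnings, conditionMessage(w))
            invokeRestart("muffleWarning")
        }
    )
    list(
        ok = !inherits(result, "error"),
        result = result,
        warnings = warnings
    )
}

# Stand-in for the failing remote read (hypothetical, no network needed):
# it warns first and then stops, like the log above.
fake_read <- function() {
    warning("simulated warning from the remote read")
    stop("simulated failure of the remote read")
}

out <- capture_conditions(fake_read())
str(out[c("ok", "warnings")])
```

To probe the real files, the call that appears in the warnings above could be wrapped the same way, for example `capture_conditions(GenomeInfoDb::seqinfo(rtracklayer::BigWigFile(bw_url)))` with `bw_url` set to one of the BigWig URLs from the log. That assumes rtracklayer and GenomeInfoDb are installed.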

Best, Leo

ChristopherWilks commented 1 year ago

A quick check on an older version of BioC (3.11) and rtracklayer (1.50.0) appears to work:

> project_info <- abstract_search("GSE32465")
> regions <- expressed_regions("SRP009615", "chrY",
+     cutoff = 5L,
+     maxClusterGap = 3000L
+ )
2023-02-20 19:23:16 loadCoverage: loading BigWig file http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
2023-02-20 19:23:18 loadCoverage: applying the cutoff to the merged data
2023-02-20 19:23:18 filterData: originally there were 57227415 rows, now there are 57227415 rows. Meaning that 0 percent was filtered.
2023-02-20 19:23:18 findRegions: identifying potential segments
2023-02-20 19:23:18 findRegions: segmenting information
2023-02-20 19:23:18 .getSegmentsRle: segmenting with cutoff(s) 5
2023-02-20 19:23:18 findRegions: identifying candidate regions
2023-02-20 19:23:19 findRegions: identifying region clusters
> head(regions)
GRanges object with 6 ranges and 6 metadata columns:
    seqnames          ranges strand |     value      area indexStart  indexEnd
       <Rle>       <IRanges>  <Rle> | <numeric> <numeric>  <integer> <integer>
  1     chrY 2929794-2929829      * |  14.72650   530.154    2929794   2929829
  2     chrY 2956678-2956701      * |  12.81063   307.455    2956678   2956701
  3     chrY 2977203-2977227      * |   5.34908   133.727    2977203   2977227
  4     chrY 2977957-2977994      * |   6.46977   245.851    2977957   2977994
  5     chrY 2978850-2978871      * |   5.79766   127.548    2978850   2978871
  6     chrY 2979004-2979033      * |   6.79941   203.982    2979004   2979033
    cluster clusterL
      <Rle>    <Rle>
  1       1       36
  2       2       24
  3       3     2750
  4       3     2750
  5       3     2750
  6       3     2750
  -------
  seqinfo: 1 sequence from an unspecified genome
> tools:::.BioC_version_associated_with_R_version()
[1] ‘3.11’
> sessionInfo(package = NULL)
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] recount_1.16.1              SummarizedExperiment_1.20.0
 [3] Biobase_2.50.0              GenomicRanges_1.42.0
 [5] GenomeInfoDb_1.26.2         IRanges_2.24.1
 [7] S4Vectors_0.28.1            BiocGenerics_0.36.0
 [9] MatrixGenerics_1.2.1        matrixStats_0.58.0

loaded via a namespace (and not attached):
  [1] colorspace_2.0-0         ellipsis_0.3.2           qvalue_2.15.0
  [4] htmlTable_2.1.0          XVector_0.30.0           base64enc_0.1-3
  [7] rstudioapi_0.13          bit64_4.0.5              AnnotationDbi_1.52.0
 [10] fansi_0.4.2              xml2_1.3.2               codetools_0.2-18
 [13] splines_4.0.2            cachem_1.0.1             knitr_1.31
 [16] jsonlite_1.7.2           Formula_1.2-4            Rsamtools_2.6.0
 [19] cluster_2.1.0            dbplyr_2.1.1             png_0.1-7
 [22] rentrez_1.2.3            readr_2.1.2              compiler_4.0.2
 [25] httr_1.4.2               backports_1.2.1          assertthat_0.2.1
 [28] Matrix_1.3-2             fastmap_1.1.0            limma_3.46.0
 [31] cli_3.0.1                htmltools_0.5.1.1        prettyunits_1.1.1
 [34] tools_4.0.2              gtable_0.3.0             glue_1.6.2
 [37] GenomeInfoDbData_1.2.4   reshape2_1.4.4           dplyr_1.0.8
 [40] rappdirs_0.3.3           doRNG_1.8.2              Rcpp_1.0.6
 [43] bumphunter_1.32.0        vctrs_0.3.8              Biostrings_2.58.0
 [46] rtracklayer_1.50.0       iterators_1.0.13         xfun_0.20
 [49] stringr_1.4.0            lifecycle_1.0.1          rngtools_1.5
 [52] XML_3.99-0.5             zlibbioc_1.36.0          scales_1.1.1
 [55] BSgenome_1.58.0          VariantAnnotation_1.36.0 hms_1.0.0
 [58] GEOquery_2.58.0          derfinderHelper_1.24.1   RColorBrewer_1.1-2
 [61] curl_4.3                 memoise_2.0.0            gridExtra_2.3
 [64] downloader_0.4           ggplot2_3.3.5            biomaRt_2.46.3
 [67] rpart_4.1-15             latticeExtra_0.6-29      stringi_1.5.3
 [70] RSQLite_2.2.3            foreach_1.5.1            checkmate_2.0.0
 [73] GenomicFeatures_1.42.2   BiocParallel_1.24.1      rlang_1.0.1
 [76] pkgconfig_2.0.3          GenomicFiles_1.26.0      bitops_1.0-6
 [79] lattice_0.20-41          purrr_0.3.4              GenomicAlignments_1.26.0
 [82] htmlwidgets_1.5.4        bit_4.0.4                tidyselect_1.1.1
 [85] plyr_1.8.6               magrittr_2.0.1           R6_2.5.0
 [88] generics_0.1.3           Hmisc_4.5-0              DelayedArray_0.16.1
 [91] DBI_1.1.1                pillar_1.6.2             foreign_0.8-81
 [94] survival_3.2-7           RCurl_1.98-1.2           nnet_7.3-15
 [97] tibble_3.0.6             crayon_1.4.0             derfinder_1.24.2
[100] utf8_1.1.4               BiocFileCache_1.14.0     tzdb_0.2.0
[103] jpeg_0.1-8.1             progress_1.2.2           locfit_1.5-9.4
[106] grid_4.0.2               data.table_1.14.0        blob_1.2.1
[109] digest_0.6.27            tidyr_1.2.0              openssl_2.0.2
[112] munsell_0.5.0            askpass_1.1

ChristopherWilks commented 1 year ago

I'll need to set up a newer version of BioC with recount to test the later versions further, but I'm guessing this is an openssl<->rtracklayer interaction issue.
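To compare a working and a failing setup, it may help to print just the versions of the packages in the suspected chain rather than a full sessionInfo(). A minimal sketch, where `pkg_versions()` is a hypothetical helper and the package list is only a guess at what is relevant here:

```r
# Report the installed version of each package, or NA if it is not installed.
# Uses system.file() so that no namespace gets loaded as a side effect.
# (Hypothetical helper for comparing environments.)
pkg_versions <- function(pkgs) {
    vapply(pkgs, function(p) {
        if (nzchar(system.file(package = p))) {
            as.character(utils::packageVersion(p))
        } else {
            NA_character_
        }
    }, character(1))
}

# Assumed set of packages worth comparing between the two environments
versions <- pkg_versions(c("recount", "derfinder", "rtracklayer", "openssl"))
print(versions)
```

Note that the openssl entry here is the R package of that name, which is not necessarily the same thing the "No openssl available" warning refers to.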

lcolladotor commented 1 year ago

Awesome, thanks for this info Chris! I'll create an issue for https://github.com/lawremi/rtracklayer

lcolladotor commented 1 year ago

I just wrote https://github.com/lawremi/rtracklayer/issues/83. Let's see where that leads. Thanks again Chris!

lcolladotor commented 5 months ago

Note that I posted an update at https://github.com/lawremi/rtracklayer/issues/83#issuecomment-2121313270 today and updated the recount package to try to implement some workarounds. This is also related to #25.
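For readers hitting the same failure, one generic way to sidestep remote BigWig reads is to download the file first and read the local copy. This is an assumption about what might help, not a description of the workarounds mentioned above. `fetch_bigwig()` is a hypothetical helper, and the demo uses a local `file://` URL as a stand-in so that it runs without network access.

```r
# Download a file once into `destdir` and return the local path; later calls
# reuse the cached copy. (Hypothetical helper, not part of recount.)
fetch_bigwig <- function(url, destdir = tempdir()) {
    dir.create(destdir, showWarnings = FALSE, recursive = TRUE)
    destfile <- file.path(destdir, basename(url))
    if (!file.exists(destfile)) {
        # Download to a temporary name first so that a failed transfer
        # never leaves a partial file under the final name.
        tmp <- tempfile(tmpdir = destdir)
        on.exit(unlink(tmp), add = TRUE)
        utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
        file.rename(tmp, destfile)
    }
    destfile
}

# Demo with a small local file standing in for a remote BigWig
src <- tempfile(fileext = ".bw")
writeBin(as.raw(1:10), src)
src_url <- paste0("file://", normalizePath(src, winslash = "/"))
local_bw <- fetch_bigwig(src_url, destdir = file.path(tempdir(), "bw_cache"))
print(file.exists(local_bw))
```

With a real BigWig URL from the logs above, the returned path could then be passed as `files` to `derfinder::loadCoverage()`, mirroring the `loadCoverage(files = meanFile, chr = chr, chrlen = chrlen)` call in the traceback. Whether the download itself succeeds from a given location is a separate question.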