Open lcolladotor opened 1 year ago
Ahh, note that @ChristopherWilks posted at https://github.com/leekgroup/recount/issues/23#issuecomment-1437469404 that the same code works with rtracklayer version 1.50.0 from BioC 3.11.
Support for SSL depends on having the openssl library available at build time. It's conceivable that either the user is building the package from source without openssl, or Bioconductor at some point stopped providing Mac binaries with openssl support.
Hi Michael,
The user in this case is me, but the Bioc machines were also reporting the same error. I did notice that the openssl package wasn't being loaded from my R session info above. http://bioconductor.org/checkResults/release/bioc-LATEST/recount/ doesn't show the error anymore, but that's because I turned it into a warning with some edits to the tests at https://github.com/leekgroup/recount/commit/5f2696d5ff9e30c4e1198c699542c0555b96ad63 that rely on tryCatch().
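For readers unfamiliar with the pattern, here is a minimal sketch of how tryCatch() can downgrade an error to a warning. It is not the code from the commit linked above; warn_on_error() is a hypothetical helper name used only for illustration.

```r
## Hypothetical helper (not from recount or rtracklayer): evaluate `expr` and,
## if it throws an error, emit a warning with the same message and return NULL.
warn_on_error <- function(expr) {
    tryCatch(
        expr,
        error = function(e) {
            warning("Remote import failed: ", conditionMessage(e), call. = FALSE)
            NULL
        }
    )
}

## Usage: a failing call now produces a warning instead of stopping the test.
## res <- warn_on_error(rtracklayer::import(remote_url))  # remote_url is assumed
```

A test can then skip its remaining checks when the result is NULL, at the cost of hiding a real failure behind a warning.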
I'll ask on bioc-devel to see if someone else knows about a change in the rtracklayer binaries.
Best, Leo
Also, my collaborator @nellore pointed out we had run into a similar issue back in 2016 as noted at https://support.bioconductor.org/p/81267/
By "openssl" I mean the C library, not the R package. I'm guessing that the Bioconductor build machine needs to be configured to build openssl support into the Mac binary. This should be as simple as installing openssl with brew. Would you happen to know the right person to contact about that?
Hi Michael,
Jennifer and Hervé replied at https://stat.ethz.ch/pipermail/bioc-devel/2023-March/019503.html. It sounds like BioC is building the packages with openssl C library support.
Do you have other leads? cc @ChristopherWilks @nellore.
Best, Leo
Hi,
I no longer get the "No openssl available in netConnectHttps for" part of the error message. However, just like #73, I'm still encountering issues with derfinder, and thus also recount, with remote BigWigFile imports through rtracklayer.
The above link (first message on this thread) has changed from "http://sciserver.org/public-data/recount2/data/SRP002001/bw/mean_SRP002001.bw" to "http://data.idies.jhu.edu/recount2/data/SRP002001/bw/mean_SRP002001.bw". duffel currently points to AWS, but with all 3 links I get the same type of error.
Here's the small reproducible code:
## Remotely access from duffel link
library("GenomicRanges")
library("rtracklayer")
range <- GRanges(seqnames = "chrY", ranges = IRanges(1, 57227415))
rtracklayer::import("http://duffel.rail.bio/recount/SRP002001/bw/mean_SRP002001.bw", selection = reduce(range), as = "RleList")
traceback()
options(width = 120)
sessioninfo::session_info()
curl::curl_version()
Here's the R output
> ## Remotely access from duffel link
> library("GenomicRanges")
> library("rtracklayer")
> range <- GRanges(seqnames = "chrY", ranges = IRanges(1, 57227415))
> rtracklayer::import("http://duffel.rail.bio/recount/SRP002001/bw/mean_SRP002001.bw", selection = reduce(range), as = "RleList")
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
Couldn't open https://recount-opendata.s3.amazonaws.com/recount2/SRP002001/bw/mean_SRP002001.bw
> traceback()
7: seqinfo(con)
6: seqinfo(con)
5: .local(con, format, text, ...)
4: import(FileForFormat(con), ...)
3: import(FileForFormat(con), ...)
2: rtracklayer::import("http://duffel.rail.bio/recount/SRP002001/bw/mean_SRP002001.bw",
selection = reduce(range), as = "RleList")
1: rtracklayer::import("http://duffel.rail.bio/recount/SRP002001/bw/mean_SRP002001.bw",
selection = reduce(range), as = "RleList")
> options(width = 120)
> sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.4.0 (2024-04-24)
os macOS Sonoma 14.5
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2024-05-20
pandoc 3.1.12.1 @ /opt/homebrew/bin/pandoc
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.4.0)
Biobase 2.64.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
BiocGenerics * 0.50.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
BiocIO 1.14.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
BiocParallel 1.38.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
Biostrings 2.72.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.4.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.4.0)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.4.0)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.4.0)
curl 5.2.1 2024-03-01 [1] CRAN (R 4.4.0)
DelayedArray 0.30.1 2024-05-07 [1] Bioconductor 3.19 (R 4.4.0)
GenomeInfoDb * 1.40.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
GenomeInfoDbData 1.2.12 2024-05-03 [1] Bioconductor
GenomicAlignments 1.40.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
GenomicRanges * 1.56.0 2024-05-01 [1] Bioconductor 3.19 (R 4.4.0)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.4.0)
IRanges * 2.38.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
lattice 0.22-6 2024-03-20 [1] CRAN (R 4.4.0)
Matrix 1.7-0 2024-03-22 [1] CRAN (R 4.4.0)
MatrixGenerics 1.16.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
matrixStats 1.3.0 2024-04-11 [1] CRAN (R 4.4.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
RCurl 1.98-1.14 2024-01-09 [1] CRAN (R 4.4.0)
restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.4.0)
rjson 0.2.21 2022-01-09 [1] CRAN (R 4.4.0)
Rsamtools 2.20.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
rtracklayer * 1.64.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
S4Arrays 1.4.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
S4Vectors * 0.42.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
SparseArray 1.4.3 2024-05-07 [1] Bioconductor 3.19 (R 4.4.0)
SummarizedExperiment 1.34.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
UCSC.utils 1.0.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
XML 3.99-0.16.1 2024-01-22 [1] CRAN (R 4.4.0)
XVector 0.44.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
yaml 2.3.8 2023-12-11 [1] CRAN (R 4.4.0)
zlibbioc 1.50.0 2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
> curl::curl_version()
$version
[1] "8.6.0"
$ssl_version
[1] "(SecureTransport) LibreSSL/3.3.6"
$libz_version
[1] "1.2.12"
$libssh_version
[1] NA
$libidn_version
[1] NA
$host
[1] "x86_64-apple-darwin23.0"
$protocols
[1] "dict" "file" "ftp" "ftps" "gopher" "gophers" "http" "https" "imap" "imaps" "ldap"
[12] "ldaps" "mqtt" "pop3" "pop3s" "rtsp" "smb" "smbs" "smtp" "smtps" "telnet" "tftp"
$ipv6
[1] TRUE
$http2
[1] TRUE
$idn
[1] FALSE
Here's more code for testing with the IDIES or AWS links directly, thus bypassing the redirect service provided by duffel. The results are the same.
Note that downloading the file with download.file(mode = "wb") and then using rtracklayer::import() on the local copy works.
R code:
## Remotely access from IDIES link
library("GenomicRanges")
library("rtracklayer")
range <- GRanges(seqnames = "chrY", ranges = IRanges(1, 57227415))
rtracklayer::import("http://data.idies.jhu.edu/recount2/data/SRP002001/bw/mean_SRP002001.bw", selection = reduce(range), as = "RleList")
traceback()
## Remotely access from AWS link
library("GenomicRanges")
library("rtracklayer")
range <- GRanges(seqnames = "chrY", ranges = IRanges(1, 57227415))
rtracklayer::import("https://recount-opendata.s3.amazonaws.com/recount2/SRP002001/bw/mean_SRP002001.bw", selection = reduce(range), as = "RleList")
traceback()
## Locally download data from duffel link
temp_duffel <- tempfile("mean_SRP002001_duffel.bw")
download.file("http://duffel.rail.bio/recount/SRP002001/bw/mean_SRP002001.bw", temp_duffel, mode = "wb")
chr <- "chrY"
rtracklayer::import(BigWigFile(temp_duffel), selection = reduce(range), as = "RleList")[[chr]]
## Locally download data from IDIES link
temp_idies <- tempfile("mean_SRP002001_idies.bw")
download.file("http://data.idies.jhu.edu/recount2/data/SRP002001/bw/mean_SRP002001.bw", temp_idies, mode = "wb")
rtracklayer::import(BigWigFile(temp_idies), selection = reduce(range), as = "RleList")[[chr]]
## Locally download data from AWS link
temp_aws <- tempfile("mean_SRP002001_aws.bw")
download.file("https://recount-opendata.s3.amazonaws.com/recount2/SRP002001/bw/mean_SRP002001.bw", temp_aws, mode = "wb")
rtracklayer::import(BigWigFile(temp_aws), selection = reduce(range), as = "RleList")[[chr]]
R output:
Building recount is failing on BioC 3.19 and 3.20, ultimately due to this issue but with a much larger BigWig file (1327.5 MB vs 48.6 MB from the earlier reproducible example). This was reported to me at https://github.com/leekgroup/recount/issues/25. https://bioconductor.org/checkResults/release/bioc-LATEST/recount/nebbiolo1-buildsrc.html points to https://github.com/leekgroup/recount/blob/c3fa29a46c64598a51c54df73cfbcf1252389c80/vignettes/recount-quickstart.Rmd#L524-L528, which can be reduced to just this code:
library("GenomicRanges")
library("rtracklayer")
files <- "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw"
chr <- "chrY"
chrlen <- 57227415
bList <- BigWigFileList(files)
which <- GRanges(seqnames = chr, ranges = IRanges(1, chrlen))
x <- import(bList[[1]], selection = reduce(which), as = "RleList")
traceback()
Here's the output:
> files <- "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw"
> chr <- "chrY"
> chrlen <- 57227415
> bList <- BigWigFileList(files)
> which <- GRanges(seqnames = chr, ranges = IRanges(1, chrlen))
> x <- import(bList[[1]], selection = reduce(which), as = "RleList")
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
Couldn't open https://recount-opendata.s3.amazonaws.com/recount2/SRP009615/bw/mean_SRP009615.bw
> traceback()
5: seqinfo(con)
4: seqinfo(con)
3: .local(con, format, text, ...)
2: import(bList[[1]], selection = reduce(which), as = "RleList")
1: import(bList[[1]], selection = reduce(which), as = "RleList")
Note that downloading the file locally does work. That said, connection issues can pop up much more frequently when such a large file is being downloaded.
> temp_SRP009615 <- tempfile("mean_SRP009615_duffel.bw")
> download.file(files, temp_SRP009615, mode = "wb")
trying URL 'http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw'
Content type 'binary/octet-stream' length 1392034077 bytes (1327.5 MB)
==================================================
downloaded 1327.5 MB
> rtracklayer::import(BigWigFile(temp_SRP009615), selection = reduce(which), as = "RleList")[[chr]]
numeric-Rle of length 57227415 with 150774 runs
Lengths: 2781486 36 1 36 8 18 ... 57 36 49 36 339535
Values : 0.000000 0.286485 0.000000 0.372865 0.000000 0.323784 ... 0.000000 0.323784 0.000000 0.310604 0.000000
I guess that I could expand derfinder to attempt to download the file locally with 3 retries, similar to how it currently tries to import the data remotely with rtracklayer 3 times (https://github.com/lcolladotor/derfinder/blob/f9cd986e0c1b9ea6551d0d8d2077d4501216a661/R/loadCoverage.R#L396-L410). But it doesn't seem like the best solution to me, as one of the appeals of the BigWigFile format was the option to remotely access parts of it.
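A rough sketch of what that download-with-retries fallback could look like is below. This is not existing derfinder code: retry() and download_bigwig() are hypothetical helper names, and the 3 attempts and 1 second pause are assumptions chosen to mirror the current remote import behavior.

```r
## Hypothetical helper (not part of derfinder or rtracklayer): call `fun()` up
## to `max_tries` times, pausing `wait` seconds between failed attempts.
retry <- function(fun, max_tries = 3, wait = 1) {
    stopifnot(max_tries >= 1)
    for (i in seq_len(max_tries)) {
        result <- tryCatch(fun(), error = function(e) e)
        if (!inherits(result, "error")) {
            return(result)
        }
        message("Attempt ", i, " of ", max_tries, " failed: ", conditionMessage(result))
        if (i < max_tries) Sys.sleep(wait)
    }
    stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

## Hypothetical helper: download a remote BigWig file to a temporary file,
## retrying on failure, and return the local path.
download_bigwig <- function(url, max_tries = 3) {
    dest <- tempfile(fileext = ".bw")
    retry(function() {
        status <- utils::download.file(url, dest, mode = "wb", quiet = TRUE)
        if (status != 0) stop("download.file() returned status ", status)
        dest
    }, max_tries = max_tries)
}

## Usage (needs network access; `which` and `chr` as defined earlier):
## local_bw <- download_bigwig("http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw")
## rtracklayer::import(rtracklayer::BigWigFile(local_bw), selection = reduce(which), as = "RleList")[[chr]]
```

The trade-off is the one noted above: this pulls the whole 1327.5 MB file even when only chrY is needed.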
Let me know if I can provide any more useful information.
Best, Leo
Hi,
I'm having trouble importing remote BigWig files with derfinder, which internally uses rtracklayer::import(). I noticed this when looking at recount, which is failing on BioC 3.16 and 3.17 (details at https://github.com/leekgroup/recount/issues/23).
You can reproduce this issue with:
I noticed issue #63 and saw that PR #68 fixed that issue with rtracklayer version 1.55.4 https://github.com/sanchit-saini/rtracklayer/blob/fa2a29d01f4f2975d8e2fe0de5ce4073b4e6b187/DESCRIPTION#L3 (which I'm assuming is already part of version 1.58.0 I'm using).
I also saw #73, and can tell that these are different errors since the canonical message here is "No openssl available in netConnectHttps for sciserver.org : 443".
You get a similar error with duffel (duffel currently points to AWS https://github.com/nellore/digitalocean-duffel/commit/c6e53d5001389acdd65484bbcf1b910936b36b8c so these two are the same. IDIES is a different mirror for recount2 data).
Let me know if there's any piece of info that might be helpful to you.
Best, Leo