mfranke-2 closed this issue 1 month ago.
@deeenes, could you have a look at this?
Hi! I just wanted to follow up on this. It seems that the Ensembl server checks certain headers to determine whether a request comes from a browser or from code, and it blocks requests from code. Maybe adding a header argument when reading the .html file could fix this?
@mfranke-2 Thanks for the tip, I'll look into it! Though I just managed to download it with the default headers of the curl command line tool. Something weird is going on with ensembl.org: since they updated their SSL certificate last month, our CI fails under OSX. It's quite possible the two issues are the same.
Did you experience the issue at different times, on different networks and computers, or did it happen only once? Have you tried again since then?
Interesting, yes, I think you're right that the two issues are related or the same! In R, something as simple as the following yields the 403 error for me:
download.file("https://useast.ensembl.org/info/about/species.html", destfile = "test.txt", headers = c("User-Agent" = "My Custom User Agent"))
but changing the User-Agent header to a browser string fixes the issue:
download.file("https://useast.ensembl.org/info/about/species.html", destfile = "test.txt", headers = c("User-Agent" = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15"))
I've tried it on two different computers and got the same error. I also tried running it again this morning and still got the error.
Hi @mfranke-2, many thanks for finding this out! I played around a bit, and I didn't manage to reproduce the issue from within the package. Still, it's clear that ensembl.org returns HTTP 403 depending on the User-Agent. In OmnipathR, httr::GET performs the download, which relies on curl under the hood. Its default user agent is libcurl/8.5.0 r-curl/5.2.1 httr/1.4.7. I've tried several things, but long story short, Ensembl apparently accepts requests if it sees certain keywords in the user agent, such as "curl":
r <- download.file(
"https://useast.ensembl.org/info/about/species.html",
destfile = "test.txt",
headers = c("User-Agent" = "curl"),
method = 'libcurl'
)
trying URL 'https://useast.ensembl.org/info/about/species.html'
downloaded 238 KB
r <- download.file(
"https://useast.ensembl.org/info/about/species.html",
destfile = "test.txt",
headers = c("User-Agent" = "cur"),
method = 'libcurl'
)
trying URL 'https://useast.ensembl.org/info/about/species.html'
Error in download.file("https://useast.ensembl.org/info/about/species.html", :
cannot open URL 'https://useast.ensembl.org/info/about/species.html'
In addition: Warning messages:
1: In download.file("https://useast.ensembl.org/info/about/species.html", :
downloaded length 0 != reported length 0
2: In download.file("https://useast.ensembl.org/info/about/species.html", :
cannot open URL 'https://useast.ensembl.org/info/about/species.html': HTTP status was '403 Forbidden'
It also depends on the mirror: the useast mirror reproduces the error, while the www one doesn't:
r <- download.file(
"https://www.ensembl.org/info/about/species.html",
destfile = "test.txt",
headers = c("User-Agent" = "cur"),
method = 'libcurl'
)
trying URL 'https://www.ensembl.org/info/about/species.html'
downloaded 238 KB
The latter (www) is the one used in OmnipathR. I don't know whether a redirect to another mirror happens at all, though the appearance of this error suggests it does.
As a solution, I set OmnipathR to use a browser-like user agent in all queries to Ensembl, configurable by options("omnipath.user_agent").
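A minimal sketch of the same workaround in plain httr, for anyone hitting this 403 outside OmnipathR. This is not OmnipathR's actual code: the helper names ensembl_user_agent() and get_ensembl_page() are hypothetical, and the fallback string is simply the Safari user agent from the download.file example above. Only the option name omnipath.user_agent comes from this thread.

```r
# Hypothetical helper (not OmnipathR's API): choose the User-Agent for
# Ensembl requests. Prefer the value of options("omnipath.user_agent");
# otherwise fall back to a browser-like string (the Safari UA quoted above).
ensembl_user_agent <- function() {
  ua <- getOption("omnipath.user_agent")
  if (is.null(ua) || !nzchar(ua)) {
    ua <- paste(
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
      "AppleWebKit/605.1.15 (KHTML, like Gecko)",
      "Version/17.5 Safari/605.1.15"
    )
  }
  ua
}

# Hypothetical helper: fetch a page with httr::GET, sending the chosen
# User-Agent header. Stops with an error on HTTP failures such as 403.
get_ensembl_page <- function(url) {
  resp <- httr::GET(url, httr::user_agent(ensembl_user_agent()))
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}
```

Usage (needs network access and the httr package): set options(omnipath.user_agent = "...") to override the agent, then call get_ensembl_page("https://useast.ensembl.org/info/about/species.html"). Reading the agent from an option at call time, rather than hard-coding it, lets users switch strings without reinstalling anything if Ensembl changes which agents it accepts.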
@deeenes It works again! Thank you so much!
Hi, I had been using CollecTRI successfully until recently, when I started receiving an error from the following call:
decoupleR::get_collectri(organism = "human", split_complexes = FALSE)
Any help would be greatly appreciated!
Best, Megan