grimbough / biomaRt

R package providing query functionality to BioMart instances like Ensembl
https://bioconductor.org/packages/biomaRt/

SSL certificate problem: unable to get local issuer certificate #39

Closed c-mertes closed 3 years ago

c-mertes commented 3 years ago

Dear biomaRt team,

Since I moved to GitHub Actions, I get the following error in my CI:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  SSL certificate problem: unable to get local issuer certificate

when running getBM()

Interestingly, I only get it in r-release but not in r-devel under ubuntu-latest, and I cannot reproduce it locally. So I'm not sure whether there is an issue with the package or whether I have to install a certificate. I do not think it's the certificate, as it works on the same OS with the R devel version.

This is the link to the error: https://github.com/c-mertes/FRASER/runs/1731000855?check_suite_focus=true#step:14:249

And the session info: https://github.com/c-mertes/FRASER/runs/1731000855?check_suite_focus=true#step:11:136

And this is the code line that triggers it:

BiocManager::install("FRASER")
library(FRASER)
fds <- createTestFraserDataSet()
fds <- annotateRanges(fds, GRCh=38)

Best, Christian

grimbough commented 3 years ago

Thanks for reporting this. I think it stems from a change in certificate on the Ensembl side.

A workaround is to run this code; the setting persists for the duration of the R session:

new_config <- httr::config(ssl_verifypeer = FALSE)
httr::set_config(new_config, override = FALSE)

You can try putting that in your workflow in the Check R check step, before the call to devtools::check(). It would be good to know if that makes the problem go away.
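For readers who would rather not relax certificate checks for the whole session, here is a minimal sketch of a scoped variant using httr::with_config(), which applies a config only while one expression is evaluated. The helper name fetch_without_verification is hypothetical, and the sketch assumes that skipping peer verification is acceptable for the request in question.

```r
library(httr)

# Same setting as the workaround above, kept in a variable instead of
# being installed globally with set_config().
insecure <- config(ssl_verifypeer = FALSE)

# Hypothetical helper: performs one GET with peer verification disabled.
# with_config() applies `insecure` only while GET() runs.
fetch_without_verification <- function(url) {
  with_config(insecure, GET(url))
}
```

Calling fetch_without_verification("https://www.ensembl.org/index.html") would issue a single request with verification switched off. Since the workaround above relies on biomaRt making its requests through httr, wrapping a getBM() call in with_config() should behave the same way, although that has not been checked here.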

The reason you don't see it on devel is that I've tried to address this when biomaRt is loaded, by testing for the issue and setting that fix automatically if needed (https://github.com/grimbough/biomaRt/blob/master/R/zzz.R).

The fix is only in the devel branch right now as I didn't want to introduce something experimental that broke release. It's probably been long enough without anyone complaining that I can merge the patch into the release branch. Then you won't need to change the setting manually.

Cheers, Mike

c-mertes commented 3 years ago

Thanks for the hint. I added it to the script. But unfortunately, this did not help to fix the problem.

I basically run now in an Rscript:

new_config <- httr::config(ssl_verifypeer = FALSE)
httr::set_config(new_config, override = FALSE)
devtools::check(vignettes = FALSE, args = '--timings')

Code: https://github.com/c-mertes/FRASER/blob/664a569854c3a890dac07ce1ffe89389e7bc89e6/.github/workflows/r.yml#L94

Error: https://github.com/c-mertes/FRASER/runs/1740913261?check_suite_focus=true#step:14:245

SessionInfo: https://github.com/c-mertes/FRASER/runs/1740913261?check_suite_focus=true#step:11:7554

grimbough commented 3 years ago

I always forget whether devtools::check() actually launches another R process, so it may be that our settings don't propagate. You could probably put them in the example block in the man page to check this.

However, I just went to do the code merge and realised I've already done it. So the release version of biomaRt should be able to cope with this already. I'll do some more digging.

grimbough commented 3 years ago

So the problem was something I alluded to earlier, where the Ensembl uswest mirror triggers the problem, but none of the other mirrors do. Things run on GHA end up querying that mirror, but the Bioconductor build system doesn't. I was explicitly querying the main Ensembl server to test for the issue, but then allowing redirects the rest of the time. I'd fixed that behaviour in devel, but not the release branch.

You can wait a couple of days for biomaRt version 2.46.1 to propagate through Bioconductor and the issue will hopefully go away. If you want to test now, you can explicitly install that version from GitHub in your workflow via:

BiocManager::install("grimbough/biomaRt", ref = "RELEASE_3_12")

Let me know if the problems persist.

c-mertes commented 3 years ago

I tested it now with the new release and now it fails on the namespace loading.

I guess I am hitting a different error from the ones you try to catch in your zzz.R script. The problem is then that new_config is never set. https://github.com/grimbough/biomaRt/blob/0e057247e8000124f8063db893397befbf93ff10/R/zzz.R#L10

And since you catch the error, it is not visible, so I can't tell what kind of error it is. The error now occurs on both Windows and Ubuntu: https://github.com/c-mertes/FRASER/runs/1744692437?check_suite_focus=true#step:14:214

I also occasionally got this error on GHA: Ensembl site unresponsive, trying XXX mirror

grimbough commented 3 years ago

Not my best patch if it broke more platforms!

I've created a new branch to test this, which you can install via:

BiocManager::install('grimbough/biomaRt', ref = '3_12_testing')

This now has a default case in the error handling if/else and prints out the contents of the test variable each time through. Hopefully we'll be able to see what message you get if it continues to fail.

This passed my own GHA workflow on all platforms (https://github.com/grimbough/biomaRt/actions/runs/503937772) so I'm hoping the package at least loads for you, but then so did the last version you installed.

Here's an example of the output when I load the library on my home PC.

> library(biomaRt)
Failed test 1: Error in curl::curl_fetch_memory(url, handle = handle) : 
  error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure

Failed test 2: Error in curl::curl_fetch_memory(url, handle = handle) : 
  SSL certificate problem: unable to get local issuer certificate
>
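The error handling with a default case can be sketched roughly as follows. This is not the code in biomaRt's zzz.R; choose_ssl_settings is a hypothetical helper that maps the two messages shown in this thread to httr settings and falls through to a default branch for anything else. The cipher-list value used for the handshake failure is an assumption and does not come from this thread.

```r
# Hypothetical helper, not biomaRt's implementation.
# Takes an error message and returns a named list of httr settings.
choose_ssl_settings <- function(msg) {
  if (grepl("sslv3 alert handshake failure", msg, fixed = TRUE)) {
    # Assumption: lowering the OpenSSL security level helps with this failure.
    list(ssl_cipher_list = "DEFAULT@SECLEVEL=1")
  } else if (grepl("unable to get local issuer certificate", msg, fixed = TRUE)) {
    # The workaround suggested earlier in this thread.
    list(ssl_verifypeer = FALSE)
  } else {
    # Default case: report the unrecognised message and change nothing,
    # so a later step never relies on settings that were not chosen.
    message("Unrecognised connection error: ", msg)
    list()
  }
}
```

Returning an empty list from the default branch means the caller always has a value to work with, which avoids the situation described above where no new_config was set.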

The "Ensembl site unresponsive, trying XXX mirror" warning is because I find the Ensembl sites to be quite unreliable. If you submit a biomaRt query and nothing comes back within 10 seconds, it will automatically try to use a mirror site. Sometimes it can try all 4 sites before giving up.

One thing biomaRt does to try and help with this is to cache any successful query on disk, and then if the same query is run again it just loads from the cache instead. There's a bit more detail in the vignette and ?biomartCacheInfo. It might not be documented, but you can also set the location of the cache with the environment variable BIOMART_CACHE.

Maybe you could get GitHub to save and restore the cache location between runs, then it won't try to query the Ensembl site as frequently.
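As a rough sketch of the cache suggestion, the cache location can be pointed at a known directory before biomaRt is used. The directory below is hypothetical; in a CI job it would need to be a path that the workflow saves and restores between runs, not a temporary directory.

```r
# Hypothetical location, chosen here only so the example runs anywhere.
cache_dir <- file.path(tempdir(), "biomart-cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# Set the variable before running any query so biomaRt uses this directory.
Sys.setenv(BIOMART_CACHE = cache_dir)

library(biomaRt)

# Prints information about the cache biomaRt is currently using.
biomartCacheInfo()
```

If the printed location matches cache_dir, repeated queries in later runs should be served from disk as long as the directory is restored first.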

c-mertes commented 3 years ago

Dear @grimbough, thanks for the help and for trying to get to the bottom of this. Somehow, after switching to your new 3_12_testing version, it now works as it is supposed to. I assumed I would see a different error, but this is not the case. So I'm happy and can close the issue. If you need more info, please ping me again.

grimbough commented 3 years ago

Thanks for the feedback. The patch has made its way into biomaRt version 2.46.2, which you can now get directly from Bioconductor. Hopefully that resolves the Travis issues, but please re-open the issue if you see a re-occurrence.

averissimo commented 3 years ago

Has this been corrected in Bioconductor devel (3.13)?

I'm getting this error on the Linux machine (malbec2) when calling getBM

https://bioconductor.org/checkResults/devel/bioc-LATEST/glmSparseNet/malbec2-checksrc.html

biomaRt::getBM(attributes = c("external_gene_name",
                              "ensembl_gene_id"),
               filters = "ensembl_gene_id",
               values = ensembl.genes,
               useCache = use.cache,
               # verbose = TRUE,
               mart = mart)
Error in curl::curl_fetch_memory(url, handle = handle): SSL certificate problem: unable to get local issuer certificate

grimbough commented 3 years ago

@averissimo In BioC 3.13 I've moved this code from being set on package load to a more targeted test. I liked neither the slowdown to the package loading, nor the fact that it sets SSL settings for the whole R session when really it's only an Ensembl problem.

If you swap your useMart() call for useEnsembl() it will run the same test, try to identify if there's a connection problem, and store that in the Mart object. It doesn't do this with useMart() since it's Ensembl specific.

--

You can probably also skip the listMarts() step, because useEnsembl(biomart = "genes") will always connect you to the Ensembl genes mart.
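A minimal sketch of that change, assuming the human gene dataset and a single example identifier (neither is specified in this thread), might look like this:

```r
library(biomaRt)

# Assumption: the human dataset; substitute whichever dataset the package uses.
mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

# Assumption: one example Ensembl gene ID (BRCA2) in place of ensembl.genes.
res <- getBM(attributes = c("external_gene_name", "ensembl_gene_id"),
             filters = "ensembl_gene_id",
             values = "ENSG00000139618",
             mart = mart)
res
```

The getBM() call is unchanged from the one reported above apart from the example value; only the way the Mart object is created differs.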