Closed cbizzo closed 4 years ago
Thanks for bringing this up. Do you think it is still the case after this fix, jeroen/curl@6fc0305, in curl::has_internet()?
If the fix is has_internet <- function(){ !is.null(nslookup("google.com", error = FALSE)) } then it works fine. When and where will this fix be available?
Hi, I was away from the corporate proxy so I couldn't test these over the weekend. Unfortunately both the update to curl and the original give the same outcome: FALSE. I have time to test some other approaches today. If I think of something I will let you know.
OK, I've done some digging. If you like I can submit a pull request with the changes. I describe the process below.
First, after looking deeper into the issue, curl_fetch_memory also does not work inside my workplace, nor do most curl functions. That means eurostat::get_eurostat_json() and similar functions which make such a call would not have worked for me even before this change.
My usage of eurostat is typically of the form eurostat::get_eurostat(id, filters). Looking under the hood, I've realised that this in turn will call get_eurostat_raw(), which relies on download.file().
On Windows, unless otherwise configured, download.file() will use wininet.dll to download a file over http and https. This is similar to using Internet Explorer to download a file.
From what I can tell, my workplace allows Internet Explorer to access the internet, but does not allow RStudio to do so without explicit proxy information.
My solution is to remove the curl::has_internet() call from get_eurostat(), but to leave it in any function that directly uses curl. This way, when get_eurostat_raw() is called, which does not currently have a has_internet() call inside it, the function will proceed to download.
If a function is needed to check whether the internet is available via wininet (or, more specifically, download.file()), you might consider pingr::is_online(), or just use download.file() on something you always expect to be up, e.g. "http://captive.apple.com/hotspot-detect.html". Like so:
# heavily influenced by https://github.com/r-lib/pingr/blob/master/R/http.R
has_internet_download.file <- function() {
  temp <- tempfile()
  # Attempt the download, suppressing any warnings and errors
  suppressWarnings(try(download.file("http://captive.apple.com/hotspot-detect.html", temp, quiet = TRUE), silent = TRUE))
  if (is.na(file.info(temp)$size)) {
    # Nothing was written, so the download did not succeed
    FALSE
  } else {
    # The page is expected to contain the word "Success"
    data <- readChar(temp, file.info(temp)$size)
    grepl("Success", data)
  }
}
Doing some testing now. If @muuankarski has any comments, that might help.
Ok, had a look at this using curl version 4.3 in Windows behind a proxy. As pointed out by @cbizzo, eurostat uses download.file() for data downloads, which do work in Windows behind a proxy with no need for the http_proxy/https_proxy environment variables. Therefore eurostat has worked for people using it in such settings.
However, we need to comply with CRAN policy...
Packages which use Internet resources should fail gracefully with an informative message if the resource is not available (and not give a check warning nor error)
...and need a way to prevent an error when the resource is not available.
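One way to read "fail gracefully" in R terms is to turn an error into a message plus a NULL return value. The wrapper below is only a minimal sketch of that idea; the name fail_gracefully and its resource argument are hypothetical and are not part of eurostat.

```r
# Hypothetical wrapper (not part of eurostat): evaluate `expr`; if it raises
# an error, print an informative message and return NULL instead of stopping.
fail_gracefully <- function(expr, resource) {
  tryCatch(
    expr,  # lazily evaluated here, so errors are caught by the handler below
    error = function(e) {
      message("Could not access ", resource, ": ", conditionMessage(e))
      invisible(NULL)
    }
  )
}
```

A download call could then be wrapped as fail_gracefully(download.file(url, tmpfile, mode = "wb"), url), with the caller checking for NULL before going further.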
I was not able to produce FALSE with curl::has_internet() when the proxy variables were set empty. curl::curl_download() only works with proper proxy settings. (In Linux, behind a proxy, you need to define the proxy settings even with download.file().) This is how it was checked.
# EU country level geojson
url <- "https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson"
tmpfile <- tempfile()
# Empty proxy settings
Sys.setenv(http_proxy = "")
Sys.setenv(https_proxy = "")
curl::has_internet()
# [1] TRUE
download.file(url, tmpfile, mode = "wb")
# Content type 'text/plain' length 141040 bytes (137 KB)
# downloaded 137 KB
curl::curl_download(url, tmpfile)
# Error in curl::curl_download(url, tmpfile) :
# Failed to connect to ec.europa.eu port 443: Timed out
# Corporate proxy settings
Sys.setenv(http_proxy = "put your proxy settings here")
Sys.setenv(https_proxy = "put your proxy settings here")
curl::has_internet()
# [1] TRUE
download.file(url, tmpfile, mode = "wb")
# Content type 'text/plain' length 141040 bytes (137 KB)
# downloaded 137 KB
curl::curl_download(url, tmpfile)
# success
Although I can't reproduce the behaviour, I agree that we should find a replacement for has_internet(). However, there are users with offline environments that can access only a few URLs on the internet, http://captive.apple.com not being one of them. For them the check would always fail, although they do have access to ec.europa.eu. Therefore, would something like this do the trick?
has_internet_download.file <- function() {
  temp <- tempfile()
  suppressWarnings(try(download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson", temp, quiet = TRUE), silent = TRUE))
  if (is.na(file.info(temp)$size)) {
    FALSE
  } else {
    # data<- readChar(con = temp, nchars = file.info(temp)$size)
    # grepl("Success", readChar(temp, file.info(temp)$size), data)
    TRUE
  }
}
Any ideas?
My impression is that this would do the trick, and I would be ready to proceed with it. Eagerly looking forward to comments from @cbizzo and @JensXII.
I am ignorant about network setup, so I cannot say whether I think it will work. I have two computers: one where curl::has_internet() is TRUE and one where it is FALSE. I have looked at the system environment with Sys.getenv() on both of them. Neither has http_proxy or anything else with proxy in it; they are actually alike. On the other hand, having these two computers, I might be able to test. But you have to tell me exactly what to do.
download.file() hangs the system in unix if there is no access to the resource. I implemented a slightly improved solution in branch https://github.com/rOpenGov/eurostat/tree/feature/check_internet
@JensXII perhaps you can run the lines below and report if it works for you?
remotes::install_github("ropengov/eurostat", ref = "feature/check_internet")
library(eurostat)
packageVersion("eurostat") # Should be ‘3.4.20006’
# Set proxies empty
Sys.setenv(http_proxy = "")
Sys.setenv(https_proxy = "")
# check they really are empty
Sys.getenv("http_proxy")
Sys.getenv("https_proxy")
eurostat::clean_eurostat_cache()
eurostat::search_eurostat(pattern = "Fertility")
eurostat::get_eurostat(id = "demo_r_frate2")
In unix, behind a proxy with no proxy defined, this should fail gracefully and produce the message:
You have no access to ec.europe.eu.
Please check your connection and/or review your proxy settings
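For readers who want to see the general shape of such a check, here is a rough sketch of a reachability test that is bounded by a timeout and ends in a message like the one above. It is not the code from the feature/check_internet branch; the function name can_reach and its timeout argument are assumptions made for illustration, and it relies on download.file() respecting R's "timeout" option.

```r
# Hypothetical helper, NOT the implementation in the feature/check_internet branch.
can_reach <- function(url, timeout = 10) {
  temp <- tempfile()
  # Cap how long the transfer may block; restore the old option and clean up on exit
  old <- options(timeout = timeout)
  on.exit({
    options(old)
    unlink(temp)
  }, add = TRUE)

  ok <- tryCatch(
    # download.file() returns 0 (invisibly) on success and errors on failure
    suppressWarnings(download.file(url, temp, quiet = TRUE)) == 0,
    error = function(e) FALSE
  )
  reachable <- isTRUE(ok) && file.exists(temp)

  if (!reachable) {
    message("You have no access to ", url, ".\n",
            "Please check your connection and/or review your proxy settings")
  }
  reachable
}
```

Calling can_reach() on the ec.europa.eu URL used earlier in the thread would test the resource the package actually needs rather than a third-party site.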
Hi, I finally got around to looking. This fix works behind the firewall for our uses. Thanks a lot for taking care of it.
If it's pulled into master we can close the issue. Thanks.
I have done the testing suggested by muuankarski, but changed the get_eurostat call to the one that caused me problems in the first place. As you can see below, and as cbizzo has already concluded, it works.
Hence, I agree: get the CRAN package updated, and close the issue.
Thank you for solving the problem so quickly :-)
curl::has_internet()
[1] FALSE
remotes::install_github("ropengov/eurostat", ref = "feature/check_internet")
Skipping install of 'eurostat' from a github remote, the SHA1 (06c1fc09) has not changed since last install.
  Use `force = TRUE` to force installation
library(eurostat)
packageVersion("eurostat") # Should be ‘3.4.20006’
[1] ‘3.4.20006’
# Set proxies empty
Sys.setenv(http_proxy = "")
Sys.setenv(https_proxy = "")
# check they really are empty
Sys.getenv("http_proxy")
[1] ""
Sys.getenv("https_proxy")
[1] ""
eurostat::clean_eurostat_cache()
Deleted .rds files from C:\Users\NLS\AppData\Local\Temp\4\RtmpA3HIko/eurostat
eurostat::search_eurostat(pattern = "Fertility")
# A tibble: 10 x 8
   title                                               code          type    `last update of data` `last table structure change` `data start` `data end` values
 1 Fertility rates by age and NUTS 2 region            demo_r_frate2 dataset 13.03.2019            10.05.2019                    1990         2017       NA
 2 Fertility indicators by NUTS 2 region               demo_r_find2  dataset 15.03.2019            10.05.2019                    1990         2017       NA
 3 Fertility indicators by NUTS 3 region               demo_r_find3  dataset 17.10.2019            28.02.2019                    2013         2017       NA
 4 Fertility and mortality - cities and greater cities urb_cfermor   dataset 21.01.2020            21.01.2020                    1990         2019       NA
 5 Fertility and mortality - functional urban areas    urb_lfermor   dataset 21.01.2020            21.01.2020                    1990         2019       NA
 6 Fertility indicators                                demo_find     dataset 06.11.2019            15.08.2019                    1960         2017       NA
 7 Fertility rates by age                              demo_frate    dataset 06.11.2019            06.11.2019                    1960         2017       NA
 8 Fertility rates by age and NUTS 2 region            demo_r_frate2 dataset 13.03.2019            10.05.2019                    1990         2017       NA
 9 Fertility indicators by NUTS 2 region               demo_r_find2  dataset 15.03.2019            10.05.2019                    1990         2017       NA
10 Fertility indicators by NUTS 3 region               demo_r_find3  dataset 17.10.2019            28.02.2019                    2013         2017       NA
# eurostat::get_eurostat(id = "demo_r_frate2")
pop_data <- subset(eurostat::get_eurostat("demo_r_pjangrp3", time_format = "num"),
  (age == "TOTAL") & (sex == "T") & (nchar(trimws(geo)) == 5))[, c("time","geo","values")]
trying URL 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fdemo_r_pjangrp3.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 2294016 bytes (2.2 MB)
downloaded 2.2 MB
Table demo_r_pjangrp3 cached at C:\Users\NLS\AppData\Local\Temp\4\RtmpA3HIko/eurostat/demo_r_pjangrp3_num_code_TF.rds
curl::has_internet()
[1] TRUE
remotes::install_github("ropengov/eurostat", ref = "feature/check_internet")
Downloading GitHub repo ropengov/eurostat@feature/check_internet
These packages have more recent versions available.
Which would you like to update?
1: All
2: CRAN packages only
3: None
4: tidyr (0.8.3 -> 1.0.0) [CRAN]
5: hms (0.5.0 -> 0.5.3) [CRAN]
Enter one or more numbers, or an empty line to skip updates: library(eurostat)
Enter one or more numbers, or an empty line to skip updates: packageVersion("eurostat") # Should be ‘3.4.20006’
Enter one or more numbers, or an empty line to skip updates: 2
tidyr (0.8.3 -> 1.0.0) [CRAN]
hms (0.5.0 -> 0.5.3) [CRAN]
Installing 2 packages: tidyr, hms
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/tidyr_1.0.0.zip'
Content type 'application/zip' length 1293390 bytes (1.2 MB)
downloaded 1.2 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/hms_0.5.3.zip'
Content type 'application/zip' length 107433 bytes (104 KB)
downloaded 104 KB
package ‘tidyr’ successfully unpacked and MD5 sums checked
package ‘hms’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in C:\Users\nls\AppData\Local\Temp\RtmpcbV1oF\downloaded_packages
√ checking for file 'C:\Users\nls\AppData\Local\Temp\RtmpcbV1oF\remotes1bd47e63168d\rOpenGov-eurostat-06c1fc0/DESCRIPTION' (1.9s)
preparing 'eurostat': (1.9s)
√ checking DESCRIPTION meta-information ...
checking for LF line-endings in source and make files and shell scripts (855ms)
checking for empty or unneeded directories
looking to see if a 'data/datalist' file should be added
building 'eurostat_3.4.20006.tar.gz' (422ms)
installing source package 'eurostat' ...
R
data
* moving datasets to lazyload DB
inst
byte-compile and prepare package for lazy loading
* help
installing help indices
converting help for package 'eurostat'
finding HTML links ... done
check_access_to_data html
clean_eurostat_cache html
convert_time_col html
cut_to_classes html
dic_order html
eu_countries html
eurostat-package html
eurostat_geodata_60_2016 html
eurotime2date html
eurotime2num html
get_bibentry html
get_eurostat html
get_eurostat_dic html
get_eurostat_geospatial html
get_eurostat_json html
get_eurostat_raw html
get_eurostat_toc html
harmonize_country_code html
label_eurostat html
search_eurostat html
set_eurostat_toc html
tgs00026 html
tidy_eurostat html
building package indices
installing vignettes
testing if installed package can be loaded
** arch - i386
arch - x64
DONE (eurostat)
In R CMD INSTALL
Warning messages:
1: In untar2(tarfile, files, list, exdir) : skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir) : skipping pax global extended headers
# Set proxies empty
Sys.setenv(http_proxy = "")
Sys.setenv(https_proxy = "")
# check they really are empty
Sys.getenv("http_proxy")
[1] ""
Sys.getenv("https_proxy")
[1] ""
eurostat::clean_eurostat_cache()
The cache does not exist
eurostat::search_eurostat(pattern = "Fertility")
# A tibble: 10 x 8
   title                                            code          type    `last update of dat~` `last table structure chang~` `data start` `data end` values
 1 Fertility rates by age and NUTS 2 region         demo_r_frate2 dataset 13.03.2019            10.05.2019                    1990         2017       NA
 2 Fertility indicators by NUTS 2 region            demo_r_find2  dataset 15.03.2019            10.05.2019                    1990         2017       NA
 3 Fertility indicators by NUTS 3 region            demo_r_find3  dataset 17.10.2019            28.02.2019                    2013         2017       NA
 4 Fertility and mortality - cities and greater ci~ urb_cfermor   dataset 21.01.2020            21.01.2020                    1990         2019       NA
 5 Fertility and mortality - functional urban areas urb_lfermor   dataset 21.01.2020            21.01.2020                    1990         2019       NA
 6 Fertility indicators                             demo_find     dataset 06.11.2019            15.08.2019                    1960         2017       NA
 7 Fertility rates by age                           demo_frate    dataset 06.11.2019            06.11.2019                    1960         2017       NA
 8 Fertility rates by age and NUTS 2 region         demo_r_frate2 dataset 13.03.2019            10.05.2019                    1990         2017       NA
 9 Fertility indicators by NUTS 2 region            demo_r_find2  dataset 15.03.2019            10.05.2019                    1990         2017       NA
10 Fertility indicators by NUTS 3 region            demo_r_find3  dataset 17.10.2019            28.02.2019                    2013         2017       NA
eurostat::get_eurostat(id = "demo_r_frate2")
trying URL 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fdemo_r_frate2.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 907647 bytes (886 KB)
downloaded 886 KB
Table demo_r_frate2 cached at C:\Users\nls\AppData\Local\Temp\RtmpcbV1oF/eurostat/demo_r_frate2_date_code_TF.rds
unit age geo time values
@muuankarski when you are ready, can you merge to devel branch; I can take care of the rest of testing, CRAN, and master then.
merged in to devel
Fixed in master now & on the way to CRAN.
I left a comment on the commit also: https://github.com/rOpenGov/eurostat/commit/19a22f3883baccdf78f4507be3ea5f5e8dbb2ec4#commitcomment-36824920
Perhaps first check if has_internet and if not, if http_proxy is defined?
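The order of checks suggested here could be sketched roughly as below. This is an illustration only, not what the commit does: the function names proxy_defined and diagnose_connection and the returned labels are hypothetical, and checking the upper-case variable names as well is an added assumption.

```r
# Hypothetical helpers illustrating the suggested order of checks.

# TRUE if any common proxy environment variable is set to a non-empty value.
# (Checking the upper-case variants too is an assumption made for this sketch.)
proxy_defined <- function() {
  vars <- Sys.getenv(c("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"))
  any(nzchar(vars))
}

# `online` defaults to curl::has_internet(); it is an argument so that the
# logic can be exercised without a network connection or the curl package.
diagnose_connection <- function(online = curl::has_internet()) {
  if (isTRUE(online)) {
    return("online")
  }
  if (proxy_defined()) {
    "offline, proxy defined"
  } else {
    "offline, no proxy defined"
  }
}
```

With this split, a FALSE from has_internet() combined with no proxy defined could trigger a hint about proxy settings, while a defined proxy would point to a different cause.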