rOpenGov / eurostat

R tools for Eurostat data
http://ropengov.github.io/eurostat

"You have no access" error if trying to obtain eurostat data #181

Open snaiperis opened 4 years ago

snaiperis commented 4 years ago

Hello,

I get an error when using the eurostat package.

> dd <- get_eurostat("namq_10_gdp")
You have no access to ec.europe.eu.
      Please check your connection and/or review your proxy settings

I looked into the internals, specifically the implementation of check_access_to_data(). The underlying error is:

> temp <- tempfile()
> http_url <- "http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson"
> download.file(http_url, temp)
trying URL 'http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson'
Error in download.file(http_url, temp) : 
  cannot open URL 'http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson'
In addition: Warning message:
In download.file(http_url, temp) :
  InternetOpenUrl failed: 'A connection with the server could not be established'

wget is able to download this URL after 2 redirects:

>wget http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson
--2020-06-01 16:44:09--  http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson
Resolving ec.europa.eu (ec.europa.eu)... 147.67.210.30, 147.67.34.30
Connecting to ec.europa.eu (ec.europa.eu)|147.67.210.30|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson [following]
--2020-06-01 16:44:09--  https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson
Connecting to ec.europa.eu (ec.europa.eu)|147.67.210.30|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson [following]
--2020-06-01 16:44:09--  https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson
Resolving gisco-services.ec.europa.eu (gisco-services.ec.europa.eu)... 40.113.93.170
Connecting to gisco-services.ec.europa.eu (gisco-services.ec.europa.eu)|40.113.93.170|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 141040 (138K)
Saving to: 'NUTS_RG_60M_2006_4326_LEVL_0.geojson'

NUTS_RG_60M_2006_4326_LEVL_0.geojs 100%[================================================================>] 137,73K  --.-KB/s    in 0,1s

2020-06-01 16:44:10 (1024 KB/s) - 'NUTS_RG_60M_2006_4326_LEVL_0.geojson' saved [141040/141040]

Win7 OS, no proxies or other network limitations.

Best regards

antagomir commented 4 years ago

Thanks. I cannot reproduce this. If it is not an institutional limitation in the network settings, then I am not sure how to solve it. Is this a persistent (not temporary) issue?

glilienthal commented 4 years ago

I am seeing the same thing, on my laptop at home (win10) as well as on the remote webserver. It started 2020-06-18 and is persisting.

check_access_to_data() returns FALSE.

glilienthal commented 4 years ago

While this is being fixed, here is my workaround:

1. Get the files utils.R and tidy_eurostat.R and source them.
2. Download the tsv.gz file and unzip it.
3. Run the following (if you want to have GDP):

dat <- readr::read_tsv("data/eurostat/nama_10_gdp.tsv", na = ":",  
                     col_types = readr::cols(.default = readr::col_character()))

GDP <- tidy_eurostat(dat)
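
For readers who would rather not source the package internals, the same workaround can be sketched in base R. This is a minimal sketch, not the package's tidy_eurostat(): the helper name read_eurostat_tsv() is hypothetical, and it assumes the usual bulk-file layout in which the first header cell lists the dimensions before a backslash (for example "unit,geo\time"), missing values are written as ":", and a value may carry a trailing letter flag.

```r
# Hypothetical stand-in for the package's tidy_eurostat(); base R only.
read_eurostat_tsv <- function(path) {
  # gzfile() reads gzip-compressed as well as plain text files
  raw <- utils::read.delim(gzfile(path), header = TRUE, check.names = FALSE,
                           colClasses = "character", quote = "",
                           na.strings = ":")
  names(raw) <- trimws(names(raw))

  # First header cell, e.g. "unit,geo\time": dimension names come before
  # the backslash and are separated by commas.
  dims <- strsplit(sub("\\\\.*$", "", names(raw)[1]), ",", fixed = TRUE)[[1]]
  ids <- as.data.frame(
    do.call(rbind, strsplit(trimws(raw[[1]]), ",", fixed = TRUE)),
    stringsAsFactors = FALSE
  )
  names(ids) <- dims

  # Wide to long: one row per (dimension combination, time period).
  times <- names(raw)[-1]
  long <- data.frame(
    ids[rep(seq_len(nrow(ids)), times = length(times)), , drop = FALSE],
    time = rep(times, each = nrow(ids)),
    values = unlist(raw[-1], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  rownames(long) <- NULL

  # Drop trailing status flags (assumed to be letters), then map ":" to NA.
  v <- sub("\\s*[A-Za-z]+$", "", trimws(long$values))
  v[v %in% c(":", "")] <- NA
  long$values <- as.numeric(v)
  long
}
```

Usage would be along the lines of GDP <- read_eurostat_tsv("data/eurostat/nama_10_gdp.tsv.gz"), where the path is whatever location the file was saved to. Unlike the package function, this sketch does not convert the time column to dates.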

glilienthal commented 4 years ago

And now it is working again, after three days of downtime...

> check_access_to_data()
[1] TRUE

(on both my machines...)

jhuovari commented 4 years ago

This is strange. And in the meantime you were able to access the data "manually"?

DanVal80 commented 4 years ago

Same issue as snaiperis. I am on Ubuntu 20.04 using R 4.0.2.

> library(eurostat)
> gdp <- eurostat::get_eurostat("namq_10_gdp")
You have no access to ec.europe.eu.
      Please check your connection and/or review your proxy settings

> check_access_to_data()
[1] FALSE

> temp <- tempfile()
> http_url <- "http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson"
> curl::curl_download(http_url, temp)
Error in curl::curl_download(http_url, temp) :
  Timeout was reached: [] Operation timed out after 10001 milliseconds with 0 out of 0 bytes received

> packageVersion("eurostat")
[1] ‘3.6.1’
> packageVersion("curl")
[1] ‘4.3’
> curl_version()
$version
[1] "7.68.0"

$ssl_version
[1] "GnuTLS/3.6.13"

$libz_version
[1] "1.2.11"

$libssh_version
[1] "libssh/0.9.3/openssl/zlib"

$libidn_version
[1] "2.2.0"

$host
[1] "x86_64-pc-linux-gnu"

$protocols
 [1] "dict"   "file"   "ftp"    "ftps"   "gopher" "http"   "https"  "imap"
 [9] "imaps"  "ldap"   "ldaps"  "pop3"   "pop3s"  "rtmp"   "rtsp"   "scp"
[17] "sftp"   "smb"    "smbs"   "smtp"   "smtps"  "telnet" "tftp"

$ipv6
[1] TRUE

$http2
[1] TRUE

$idn
[1] TRUE

In the command line, wget works:

$ wget http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson
--2020-07-31 11:47:48--  http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson
Resolving ec.europa.eu (ec.europa.eu)... 147.67.34.30, 147.67.210.30, 2a01:7080:14:100::666:30, ...
Connecting to ec.europa.eu (ec.europa.eu)|147.67.34.30|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson [following]
--2020-07-31 11:47:48--  https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson
Connecting to ec.europa.eu (ec.europa.eu)|147.67.34.30|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson [following]
--2020-07-31 11:47:49--  https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson
Resolving gisco-services.ec.europa.eu (gisco-services.ec.europa.eu)... 40.113.93.170
Connecting to gisco-services.ec.europa.eu (gisco-services.ec.europa.eu)|40.113.93.170|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 141040 (138K)
Saving to: ‘NUTS_RG_60M_2006_4326_LEVL_0.geojson.1’

NUTS_RG_60M_2006_4326_LEVL_0.geojson.1               100%[=====================================================================================================================>] 137.73K  --.-KB/s    in 0.05s

2020-07-31 11:47:49 (2.70 MB/s) - ‘NUTS_RG_60M_2006_4326_LEVL_0.geojson.1’ saved [141040/141040]

curl also works, provided I set the -L option so that it follows the redirects.

$ curl -L  http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson > test.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   807  100   807    0     0   9170      0 --:--:-- --:--:-- --:--:--  9170
100   309  100   309    0     0   1161      0 --:--:-- --:--:-- --:--:--  3433
100  137k  100  137k    0     0   321k      0 --:--:-- --:--:-- --:--:--  321k

jhuovari commented 4 years ago

Could it be that for some reason download.file uses method = "curl" in these cases? Then extra = "-L" is needed for redirections, according to the documentation. By default it shouldn't use that method, and redirections should work.

Could you test:

tfile <- tempfile()
url <- "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fnamq_10_gdp.tsv.gz"
test <- utils::download.file(url, tfile, method = "libcurl")
test2 <- utils::download.file(url, tfile, method = "curl", extra = "-L")
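
The same comparison can be run over several methods in one go. This is a minimal sketch: try_download_methods() is a hypothetical helper, not part of eurostat, and its downloader argument exists only so that the function can be exercised without network access. It assumes that download.file() signals failure either by raising an error or by returning a non-zero status.

```r
# Hypothetical helper: report which download.file() methods succeed for a URL.
try_download_methods <- function(url, methods = c("libcurl", "curl"),
                                 downloader = utils::download.file) {
  vapply(methods, function(m) {
    dest <- tempfile()
    on.exit(unlink(dest))
    status <- tryCatch(
      suppressWarnings(
        downloader(url, dest, method = m, quiet = TRUE,
                   # the external curl binary needs -L to follow redirects
                   extra = if (m == "curl") "-L" else NULL)
      ),
      error = function(e) 1L
    )
    # download.file() returns 0 on success
    identical(as.integer(status), 0L)
  }, logical(1))
}
```

Calling try_download_methods(url) with the URL above returns a named logical vector, one entry per method, which makes it easy to paste the result into a report like this one.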

DanVal80 commented 4 years ago

Ok, that's weird. Both test and test2 are successful:

> tfile <- tempfile()
> url <- "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fnamq_10_gdp.tsv.gz"

> test <- utils::download.file(url, tfile, method = "libcurl")
trying URL 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fnamq_10_gdp.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 14051440 bytes (13.4 MB)
==================================================
downloaded 13.4 MB

> test2 <- utils::download.file(url, tfile, method = "curl", extra = "-L")
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.4M  100 13.4M    0     0  1907k      0  0:00:07  0:00:07 --:--:-- 1961k

but:

> gdp <- eurostat::get_eurostat("namq_10_gdp")
You have no access to ec.europe.eu.
      Please check your connection and/or review your proxy settings

> eurostat::check_access_to_data()
[1] FALSE

jhuovari commented 4 years ago

Thanks for testing. Then it is a download.file problem. By default it should use wininet or libcurl, but it seems that it uses curl. Have you set the download.file.method option? Could you check what getOption("download.file.method") gives?
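
A small sketch for collecting the relevant settings in one place when reporting this kind of problem. download_setup() is a hypothetical helper name; it only reads base R options and capabilities and changes nothing.

```r
# Hypothetical helper: gather the settings that influence download.file().
download_setup <- function() {
  list(
    method_option = getOption("download.file.method"),  # NULL means "auto"
    has_libcurl   = unname(capabilities("libcurl")),    # libcurl support built in?
    external_curl = unname(Sys.which("curl")),          # path to curl binary, "" if none
    timeout_sec   = getOption("timeout")                # timeout used by base R downloads
  )
}

str(download_setup())
```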

jhuovari commented 4 years ago

Could you also test whether the following works with the same URL?

test <- readr::read_tsv(url, na = ":", col_types = readr::cols(.default = readr::col_character()))

DanVal80 commented 4 years ago

Hi, you are welcome. Here is the output of the commands you requested.

> getOption("download.file.method")
NULL
> test <- readr::read_tsv(url, na = ":", col_types = readr::cols(.default = readr::col_character()))
|=================================================================| 100%   47 MB

Note that I am on Linux, so (I guess) check_access_to_data() performs the download with curl::curl_download() and not with download.file(). Something interesting, though:

> url1 <- 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fnamq_10_gdp.tsv.gz'
> url2 <- 'http://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/geojson/NUTS_RG_60M_2006_4326_LEVL_0.geojson'
# url1 is the file I want. url2 is the file used by eurostat::check_access_to_data()

> curl::curl_download(url1, tfile, quiet = FALSE)
 Downloaded 14095138 bytes...

> curl::curl_download(url2, tfile, quiet = FALSE)
 [100%] Downloaded 807 bytes...
 [100%] Downloaded 309 bytes...
Error in curl::curl_download(url2, tfile, quiet = FALSE, ) :
  Timeout was reached: [] Operation timed out after 10000 milliseconds with 0 out of 0 bytes received

Can this be the problem?
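
One way to separate "the data URL is unreachable" from "the test URL is unreachable" is to probe the URL that is actually wanted, with a longer timeout. This is a minimal sketch, not what check_access_to_data() does: can_reach() is a hypothetical helper, the 30 second timeout is an arbitrary choice, and the default probe sends a HEAD-style request (nobody = TRUE) via the curl package, which some servers may reject. The fetch argument is there so that the logic can be tried without a network connection.

```r
# Hypothetical helper: TRUE if the URL answers with a status below 400.
can_reach <- function(url, timeout_sec = 30, fetch = NULL) {
  if (is.null(fetch)) {
    fetch <- function(u) {
      # follow redirects, allow more time than 10 s, skip the response body
      h <- curl::new_handle(followlocation = TRUE, timeout = timeout_sec,
                            nobody = TRUE)
      curl::curl_fetch_memory(u, handle = h)
    }
  }
  # a timeout or connection failure raises an error, which counts as unreachable
  res <- tryCatch(fetch(url), error = function(e) NULL)
  !is.null(res) && res$status_code < 400
}
```

With the two URLs above, can_reach(url1) and can_reach(url2) would show whether only the GISCO redirect chain is affected.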

jhuovari commented 4 years ago

So using readr directly works. It also seems to be faster. You can try installing from a new branch: remotes::install_github("ropengov/eurostat", ref = "speed")

There also seem to be other things we could do to speed up the package.

fpa2 commented 4 years ago

Hi, first, thanks for your hard work on this... I have been a regular user of the package and in the past 8 months had no problems at all. Today, I was downloading some data (again with no problems).

Then, suddenly I got the same error...

> xtemp <- get_eurostat("sts_inpr_m")
You have no access to ec.europe.eu. 
Please check your connection and/or review your proxy settings

So, I tried the above to see what I get:

> getOption("download.file.method")
[1] "wininet"

url <- "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fnamq_10_gdp.tsv.gz"
test <- readr::read_tsv(url, na = ":", col_types = readr::cols(.default = readr::col_character()))

I also updated as suggested:

>remotes::install_github("ropengov/eurostat", ref = "speed")

but still getting the same error...

Any further ideas to test on Windows?

Btw, I can navigate Eurostat's website and find the data in the browser. So this does not really seem to be a problem with my IP or proxy settings...

Update: Tried again after 4 hours, and it now works.

antagomir commented 4 years ago

Thanks for the update. I do not know what was temporarily out of order, perhaps something at the website? I propose we investigate solutions only if this becomes a more persistent issue.

Does this only work in "speed" branch or could we merge the necessary parts in master, and delete the speed branch @jhuovari ?

jhuovari commented 4 years ago

It is still only in "speed". I was supposed to finalize it and merge, but I haven't had time to do it. I will try to do it soon.

monteirojaf commented 1 year ago

Greetings from Basel!

I have started to get the same error message indicating that I have no access to Eurostat data. Interestingly, last year I was able to get data without problems. I am using the following code:

data.table(label_eurostat(get_eurostat("urb_lpop1")), orig=get_eurostat("urb_lpop1")[1], orig=get_eurostat("urb_lpop1")[2], set="urb_lpop1")

Including a proxy config does not work either:

data.table(label_eurostat(get_eurostat("urb_lpop1", config = use_proxy(url="http://xxxxxxxxxx",port=3128, username = "xxxxxx", password = "xxxxxx"))), orig=get_eurostat("urb_lpop1")[1], orig=get_eurostat("urb_lpop1")[2], set="urb_lpop1")

Is there any update on this issue?

antagomir commented 1 year ago

@jhuovari any comments / updates?

jhuovari commented 1 year ago

Sorry, that is still unfinished. However, I am now also behind a proxy and having issues, so I have an interest here. Unfortunately I am also busy...

Meanwhile, you could try to set options(download.file.method = "wininet") or "auto"
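
To try a method without changing the setting for the rest of the session, the option can be set temporarily and restored afterwards. This is a minimal sketch in base R: with_download_method() is a hypothetical helper, and whether a given eurostat version honours the option is an assumption based on the suggestion above. Note that "wininet" is only available on Windows.

```r
# Hypothetical helper: evaluate `code` with download.file.method set to
# `method`, then restore the previous value (including "unset").
with_download_method <- function(method, code) {
  old <- options(download.file.method = method)
  on.exit(options(old), add = TRUE)
  force(code)  # `code` is evaluated lazily, so the option is already in effect
}
```

Usage would look like with_download_method("auto", get_eurostat("urb_lpop1")).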

umbe1987 commented 1 year ago

I also have this issue from time to time.

For instance, I am having it right now: the Eurostat website is accessible, but trying to download data from R with get_eurostat gives "You have no access to ec.europe.eu."

tp <-
  get_eurostat(id = "migr_asytpfq",
               filters = list(geo = eu27,
                              citizen = "UA"))

error

You have no access to ec.europe.eu.
      Please check your connection and/or review your proxy settings

Info on my R configuration:

> R.version
               _                                          
platform       x86_64-pc-linux-gnu                        
arch           x86_64                                     
os             linux-gnu                                  
system         x86_64, linux-gnu                          
status         Patched                                    
major          4                                          
minor          2.2                                        
year           2022                                       
month          11                                         
day            10                                         
svn rev        83330                                      
language       R                                          
version.string R version 4.2.2 Patched (2022-11-10 r83330)
nickname       Innocent and Trusting

pitkant commented 1 year ago

If you have problems (especially behind a proxy connection), could you test the httr2 branch of eurostat and tell us whether it works or not?

remotes::install_github("ropengov/eurostat", ref = "httr2")

I notice that some queries here have been concerned with bulk download files. To use the proxy option you have to use the get_eurostat_json() function directly.

I tested the httr2 proxy functionality with public proxies found at https://www.proxynova.com/proxy-server-list/ and the success rate seemed to depend mostly on the quality of the chosen proxy. With a select few I was able to get things working, but most failed with various types of timeouts. Private proxies might of course be much less burdened and of higher quality than these public proxies...