mikerspencer opened this issue 3 years ago.
The same error is occurring. Of the 33.7 MB file, I only get 15, 16.7 or 17.8 MB. I think it's too consistent to be a problem with my internet connection.
Same connection error when downloading other files, e.g. running geodb_download("wielkopolskie", outdir = "./data") from https://cran.r-project.org/web/packages/rgugik/vignettes/orthophotomap.html
The error also occurs with topodb_download("bieszczadzki", outdir = "./data"), again after ~15 MB has downloaded.
The first 2.4 MB file from test-topodb_download.R downloads, but larger files fail at ~15 MB.
I suggest adding a larger file to the download tests to catch the connection error.
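One way such a test could detect a truncated download is to compare the size on disk with the byte count the server reported, which is the same comparison behind the "downloaded length ... != reported length ..." warning that utils::download.file() can emit. A minimal sketch; download_is_complete() is a hypothetical helper, not part of rgugik:

```r
# Hypothetical helper (not part of rgugik): TRUE only if the file exists
# and its size on disk equals the byte count reported by the server.
download_is_complete <- function(path, expected_bytes) {
  if (!file.exists(path)) return(FALSE)
  isTRUE(file.size(path) == expected_bytes)
}

# Usage (hypothetical path), e.g. for the DTM tile whose reported
# length in the logs is 35303843 bytes:
# download_is_complete("path/to/tile.asc", 35303843)
```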
Sorry for that. I will investigate it. Do you use LAN or WiFi connection?
Hello @adamhsparks, as you are a user outside Poland, did you also encounter this problem during the review?
Nothing I detected, but I’ll try. My Australian Internet connection should be a true test. 😂
geodb_download("wielkopolskie", outdir = "~/tmp") and topodb_download("bieszczadzki", outdir = "~/tmp") both work fine for me over WiFi with my crappy connection in Perth.
Thanks for the test! Could you additionally check one large file (375 MB) to be sure?
borders_download("administrative units")
Ah, here we go.
> # combine data tables
> req_df = rbind(req_df_DTM, req_df_DSM)
> req_df[, 1:5]
sheetID year format resolution avgElevErr
30 N-33-130-D-b-1-1 2019 ARC/INFO ASCII GRID 1.0 m 0.1
29 N-33-130-D-b-1-1 2019 ARC/INFO ASCII GRID 0.5 m 0.1
> tile_download(req_df, outdir = "./data")
1/2
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMT/73044/73044_917579_N-33-130-D-b-1-1.asc'
Content type 'application/octet-stream' length 35303843 bytes (33.7 MB)
==================================================
downloaded 33.7 MB
2/2
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
Content type 'application/octet-stream' length 141586478 bytes (135.0 MB)
===================
downloaded 53.5 MB
[1] "connection error"
> borders_download("administrative units")
trying URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip'
Content type 'unknown' length 393405215 bytes (375.2 MB)
==
[1] "connection error"
I frequently have to adjust the timeout in my R session when using download.file(). I'm checking right now using wget with the administrative units; it says it will take ~10 minutes or so. I'm guessing we're seeing a timeout in download.file().
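R's default download timeout is 60 seconds (see ?options), so large files on slow links can hit it. A minimal sketch of raising the timeout for a single call and restoring the previous value afterwards; with_download_timeout() is a hypothetical helper, and it only helps if the failure really is a timeout:

```r
# Hypothetical helper: temporarily raise the "timeout" option (seconds)
# while `expr` is evaluated, then restore the previous setting.
# It never lowers a timeout that is already larger than `seconds`.
with_download_timeout <- function(expr, seconds = 3600) {
  old <- options(timeout = max(seconds, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  force(expr)  # the promise is evaluated only now, with the new timeout
}

# Usage:
# with_download_timeout(rgugik::borders_download("administrative units"),
#                       seconds = 3600)
```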
Sorry for that. I will investigate it. Do you use LAN or WiFi connection?
I'm on a wired, 1 GB/s, connection to my router. It's not a particularly fast internet connection (~18 MB/s), but it is stable.
No. Not a timeout issue.
> options(timeout=10000)
> utils::download.file(URL, filename, mode = "wb")
trying URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip'
Content type 'unknown' length 393405215 bytes (375.2 MB)
==================================================
Error in utils::download.file(URL, filename, mode = "wb") :
cannot open URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip'
In addition: Warning message:
In utils::download.file(URL, filename, mode = "wb") :
URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip': status was 'Failure when receiving data from the peer'
BTW, wget worked fine using zsh
Thanks for the responses. Here are the logs showing how it looks for me:
> system.time(borders_download("administrative units"))
trying URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip'
downloaded 375.2 MB
user system elapsed
23.34 90.65 949.96
> system.time(geodb_download("wielkopolskie"))
trying URL 'http://opendata.geoportal.gov.pl/bdoo/PL.PZGiK.201.30.zip'
Content type 'application/octet-stream' length 18436314 bytes (17.6 MB)
downloaded 17.6 MB
user system elapsed
1.76 3.04 22.17
> system.time(topodb_download("bieszczadzki"))
trying URL 'https://opendata.geoportal.gov.pl/bdot10k/18/1801_GML.zip'
Content type 'application/octet-stream' length 23474972 bytes (22.4 MB)
downloaded 22.4 MB
user system elapsed
2.34 4.69 24.31
> system.time(tile_download(req_df))
1/2
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMT/73044/73044_917579_N-33-130-D-b-1-1.asc'
Content type 'application/octet-stream' length 35303843 bytes (33.7 MB)
downloaded 33.7 MB
2/2
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
Content type 'application/octet-stream' length 141586478 bytes (135.0 MB)
downloaded 135.0 MB
user system elapsed
4.03 25.47 134.23
There is clearly a problem. Perhaps users outside Poland get a lower connection priority and are therefore disconnected? All functions in the package are wrappers around R's utils::download.file().
I'll check a few more things.
Not sure, but it seems to be an issue in R. Like I said, using wget worked fine for me in the terminal (iTerm2, not RStudio); it downloaded a file that R wouldn't. Maybe curl::curl_download() would be useful here, since it provides handle options that utils::download.file() can't/doesn't?
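For example, libcurl has "low speed" options that abort a transfer only when it stalls, rather than after a fixed total time. A sketch, assuming the {curl} package is installed; make_stall_handle() and robust_curl_download() are hypothetical helpers, and whether they avoid the failures reported here is untested:

```r
# Sketch, assuming the {curl} package is installed.
# low_speed_limit / low_speed_time are libcurl options: the transfer is
# aborted if it stays below `min_bytes_per_sec` for `stall_seconds`.
make_stall_handle <- function(min_bytes_per_sec = 1000, stall_seconds = 120) {
  curl::new_handle(
    low_speed_limit = min_bytes_per_sec,
    low_speed_time = stall_seconds
  )
}

# Hypothetical wrapper: binary-mode download using that handle;
# extra arguments are passed on to make_stall_handle().
robust_curl_download <- function(url, destfile, ...) {
  curl::curl_download(url, destfile, quiet = FALSE, mode = "wb",
                      handle = make_stall_handle(...))
}

# Usage with the test1 URL from this thread:
# robust_curl_download(test1, "test1.asc", stall_seconds = 300)
```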
I have some results. The problem seems to be rare and difficult to reproduce. The following code works for me without any problems. I use Windows 8.1.
test1 = 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
test2 = 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip'
tmp = tempfile()
# download both test files 30 times with each method;
# method = "wininet" is only available on Windows
for (i in seq_len(30)) {
  utils::download.file(test1, tmp, method = "wininet", mode = "wb")
  utils::download.file(test1, tmp, method = "libcurl", mode = "wb")
  utils::download.file(test2, tmp, method = "wininet", mode = "wb")
  utils::download.file(test2, tmp, method = "libcurl", mode = "wb")
}
By moving farther away from my WiFi router, I can trigger a timeout error:
> utils::download.file(test1, tmp, method = "libcurl", mode = "wb")
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
length 141586478 bytes (135.0 MB)
downloaded 118.1 MB
Error in utils::download.file(test1, tmp, method = "libcurl", mode = "wb") :
download from 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc' failed
In addition: Warning messages:
1: In utils::download.file(test1, tmp, method = "libcurl", mode = "wb") :
downloaded length 123876510 != reported length 141586478
2: In utils::download.file(test1, tmp, method = "libcurl", mode = "wb") :
URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc': Timeout of 60 seconds was reached
To fix this, just set options(timeout = 10000). However, @adamhsparks tried this without success, so his actual error is different. From his logs I can see:
'Failure when receiving data from the peer'
Then I tried the solution suggested by @adamhsparks, and the strange thing is that curl::curl_download() fails(?) for me, but curl::curl_fetch_disk() works fine.
curl::curl_download(test1, "test.zip", quiet = FALSE)
#> [-1073741824%] Downloaded 0 bytes...
curl::curl_download(test2, "test.zip", quiet = FALSE)
#> [-805306368%] Downloaded 0 bytes...
curl::curl_fetch_disk(test1, "test.zip")
file.size("test.zip")/1024^2
#> 135.0274
curl::curl_fetch_disk(test2, "test.zip")
file.size("test.zip")/1024^2
#> 20.76025
Could you confirm whether curl::curl_download() or curl::curl_fetch_disk() works for downloading the test1 and test2 files, please?
CC @Nowosad
OK, attempts here:
test1 = 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
test2 = 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip'

tic = proc.time()
curl::curl_download(test1, "test.zip", quiet = FALSE)
[100%] Downloaded 141586478 bytes...
proc.time() - tic
   user  system elapsed
  4.061   2.896 421.931
file.size("test.zip")/1024^2
[1] 135.0274

tic = proc.time()
curl::curl_download(test2, "test.zip", quiet = FALSE)
[100%] Downloaded 21768701 bytes...
proc.time() - tic
   user  system elapsed
  1.516   0.963  28.770
file.size("test.zip")/1024^2
[1] 20.76025

tic = proc.time()
curl::curl_fetch_disk(test1, "test.zip")
$url
[1] "https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc"

$times
     redirect    namelookup       connect   pretransfer starttransfer         total
     0.000000      0.061644      0.107014      0.239521      0.621672    551.009517

$content
[1] "/home/mike/test.zip"

proc.time() - tic
   user  system elapsed
  1.182   1.470 551.036
file.size("test.zip")/1024^2
[1] 135.0274

tic = proc.time()
curl::curl_fetch_disk(test2, "test.zip")
$url
[1] "https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip"

$times
     redirect    namelookup       connect   pretransfer starttransfer         total
     0.000000      0.075205      0.126705      0.254386      0.330570     28.053150

$content
[1] "/home/mike/test.zip"

proc.time() - tic
   user  system elapsed
  0.595   0.597  28.094
file.size("test.zip")/1024^2
[1] 20.76025
My attempts match @mikerspencer's. Everything downloads fine in R using either curl function.
Both curl functions work well on my (Fedora) machine:
test1 = 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
test2 = 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip'
curl::curl_download(test1, "test1.zip", quiet = FALSE)
curl::curl_download(test2, "test2.zip", quiet = FALSE)
curl::curl_fetch_disk(test1, "test1.zip")
#> $url
#> [1] "https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc"
#>
#> $status_code
#> [1] 200
#>
#> $type
#> [1] "application/octet-stream"
#>
#> $headers
#> [1] 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d 0a 44 61 74 65 3a 20 54 68
#> [26] 75 2c 20 32 35 20 46 65 62 20 32 30 32 31 20 30 38 3a 31 36 3a 34 31 20 47
#> [51] 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70 61 63 68 65 2f 32 2e 34 2e 36 20
#> [76] 28 43 65 6e 74 4f 53 29 20 4f 70 65 6e 53 53 4c 2f 31 2e 30 2e 32 6b 2d 66
#> [101] 69 70 73 20 50 48 50 2f 37 2e 30 2e 33 33 0d 0a 58 2d 50 6f 77 65 72 65 64
#> [126] 2d 42 79 3a 20 50 48 50 2f 37 2e 30 2e 33 33 0d 0a 43 6f 6e 74 65 6e 74 2d
#> [151] 44 65 73 63 72 69 70 74 69 6f 6e 3a 20 46 69 6c 65 20 54 72 61 6e 73 66 65
#> [176] 72 0d 0a 43 6f 6e 74 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e 3a 20 61
#> [201] 74 74 61 63 68 6d 65 6e 74 3b 20 66 69 6c 65 6e 61 6d 65 3d 22 37 33 30 34
#> [226] 33 5f 39 31 37 34 39 35 5f 4e 2d 33 33 2d 31 33 30 2d 44 2d 62 2d 31 2d 31
#> [251] 2e 61 73 63 22 0d 0a 45 78 70 69 72 65 73 3a 20 30 0d 0a 43 61 63 68 65 2d
#> [276] 43 6f 6e 74 72 6f 6c 3a 20 6d 75 73 74 2d 72 65 76 61 6c 69 64 61 74 65 0d
#> [301] 0a 50 72 61 67 6d 61 3a 20 70 75 62 6c 69 63 0d 0a 43 6f 6e 74 65 6e 74 2d
#> [326] 4c 65 6e 67 74 68 3a 20 31 34 31 35 38 36 34 37 38 0d 0a 43 6f 6e 74 65 6e
#> [351] 74 2d 54 79 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f 6f 63 74 65 74
#> [376] 2d 73 74 72 65 61 6d 0d 0a 0d 0a
#>
#> $modified
#> [1] NA
#>
#> $times
#> redirect namelookup connect pretransfer starttransfer
#> 0.000000 0.000028 0.000029 0.000090 0.287118
#> total
#> 10.198358
#>
#> $content
#> [1] "/tmp/RtmpdsONDi/reprex499ff6388a34a/test1.zip"
file.size("test1.zip")/1024^2
#> [1] 135.0274
curl::curl_fetch_disk(test2, "test2.zip")
#> $url
#> [1] "https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip"
#>
#> $status_code
#> [1] 200
#>
#> $type
#> [1] "application/octet-stream"
#>
#> $headers
#> [1] 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d 0a 44 61 74 65 3a 20 54 68
#> [26] 75 2c 20 32 35 20 46 65 62 20 32 30 32 31 20 30 38 3a 31 36 3a 35 32 20 47
#> [51] 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70 61 63 68 65 2f 32 2e 34 2e 32 35
#> [76] 20 28 44 65 62 69 61 6e 29 0d 0a 50 72 61 67 6d 61 3a 20 70 75 62 6c 69 63
#> [101] 0d 0a 45 78 70 69 72 65 73 3a 20 30 0d 0a 43 61 63 68 65 2d 43 6f 6e 74 72
#> [126] 6f 6c 3a 20 70 75 62 6c 69 63 0d 0a 43 6f 6e 74 65 6e 74 2d 44 65 73 63 72
#> [151] 69 70 74 69 6f 6e 3a 20 46 69 6c 65 20 54 72 61 6e 73 66 65 72 0d 0a 43 6f
#> [176] 6e 74 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e 3a 20 61 74 74 61 63 68
#> [201] 6d 65 6e 74 3b 20 66 69 6c 65 6e 61 6d 65 3d 22 6c 6f 64 31 2f 33 30 36 34
#> [226] 5f 67 6d 6c 2e 7a 69 70 22 0d 0a 43 6f 6e 74 65 6e 74 2d 54 72 61 6e 73 66
#> [251] 65 72 2d 45 6e 63 6f 64 69 6e 67 3a 20 62 69 6e 61 72 79 0d 0a 43 6f 6e 74
#> [276] 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 32 31 37 36 38 37 30 31 0d 0a 43 6f 6e
#> [301] 74 65 6e 74 2d 54 79 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f 6f 63
#> [326] 74 65 74 2d 73 74 72 65 61 6d 0d 0a 0d 0a
#>
#> $modified
#> [1] NA
#>
#> $times
#> redirect namelookup connect pretransfer starttransfer
#> 0.000000 0.001322 0.009389 0.042308 0.072064
#> total
#> 2.308206
#>
#> $content
#> [1] "/tmp/RtmpdsONDi/reprex499ff6388a34a/test2.zip"
file.size("test2.zip")/1024^2
#> [1] 20.76025
Created on 2021-02-25 by the reprex package (v1.0.0)
Thank you all! I will do more tests and eventually prepare the PR.
Sorry for the many messages, but this problem is hard to reproduce, and I have one more idea.
All functions in rgugik that download files using utils::download.file() support passing parameters via the ... argument. @adamhsparks mentioned that downloading using wget from the shell worked fine. So why not try method = "wget"? I tested it below, but the most important thing is whether it works for you. Other download methods are also available. If nothing works, I will rewrite the functions to use curl::curl_download(), because you already confirmed that it works well.
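Trying several methods in turn could also be automated around utils::download.file() itself. A minimal sketch; download_with_fallback() is a hypothetical helper (not in rgugik) that works on a plain URL, and the default list of methods is an assumption (wget and curl must be installed as external programs):

```r
# Hypothetical helper (not part of rgugik): try several download.file()
# methods in order and return the first one that succeeds.
# `downloader` defaults to utils::download.file; it is an argument only so
# the logic can be tested without network access.
download_with_fallback <- function(url, destfile,
                                   methods = c("libcurl", "wget", "curl"),
                                   downloader = utils::download.file, ...) {
  failures <- character(0)
  for (m in methods) {
    result <- tryCatch(
      downloader(url, destfile, method = m, mode = "wb", ...),
      error = function(e) conditionMessage(e)
    )
    # download.file() returns 0 on success; an error message or a
    # non-zero status counts as a failure and the next method is tried
    if (is.numeric(result) && identical(as.integer(result), 0L)) {
      return(invisible(m))
    }
    failures[m] <- if (is.character(result)) result else paste("status", result)
  }
  stop("All download methods failed:\n",
       paste0("  ", names(failures), ": ", failures, collapse = "\n"))
}

# Usage with the 375 MB file from this thread (quiet is passed on via ...):
# download_with_fallback("ftp://91.223.135.109/prg/jednostki_administracyjne.zip",
#                        "jednostki_administracyjne.zip", quiet = TRUE)
```

Returning the method that worked makes it easy to report which backend a user ended up needing.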
library("rgugik")
# 17 MB
geodb_download("wielkopolskie", method = "wget", quiet = TRUE)
# 22 MB
topodb_download("bieszczadzki", method = "wget", quiet = TRUE)
# 48 MB
models3D_download("Warszawa", method = "wget", quiet = TRUE)
# 375 MB
borders_download("administrative units", method = "wget", quiet = TRUE)
@kadyb that works fine for me.
Seems to work here for me as well.
After a long discussion and many tests, we can propose a solution consisting of two steps:
1. Adding information about the possibility of using a different download method when a connection error occurs (already done). We use the utils::download.file() function as a backend, and it allows downloading with various methods/tools that the user can select. During testing, we concluded that the default method works fine for most users, but there are cases where it fails (probably especially for users outside Poland). As you mentioned, choosing a different method (e.g. wget) may solve this problem. Thus, we recommend this in https://github.com/kadyb/rgugik/pull/68.
2. In the long term (after the review), we plan to add curl::curl_download() as an additional, independent download tool. We did some tests and think it will be a good solution; however, we encountered a bug on Windows that is holding us back at the moment. Nevertheless, users can already test this solution in the development version of the package: remotes::install_github("kadyb/rgugik@usecurl").
Can you add the fix (e.g. method = "wget") to the vignettes? It's going to be something that trips up beginners.
I've thought about it, but I'm not convinced that making this the default download method is a universal solution. During testing, we noticed that different methods are optimal for different configurations. I think the best solution in this situation is to keep the default method of utils::download.file() (wget isn't installed on Windows by default).
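To see which alternatives exist on a given machine, a sketch like the following could list candidate methods; available_download_methods() is a hypothetical helper, and it only checks whether libcurl support is compiled in and whether the external wget/curl programs are on the PATH, not whether a download will actually succeed:

```r
# Hypothetical helper: list download.file() methods that look usable here,
# instead of hard-coding "wget" (not installed on Windows by default).
available_download_methods <- function() {
  methods <- "auto"
  if (capabilities("libcurl")) methods <- c(methods, "libcurl")
  # Windows-only method; check ?download.file for its status in your R version
  if (.Platform$OS.type == "windows") methods <- c(methods, "wininet")
  if (nzchar(Sys.which("wget"))) methods <- c(methods, "wget")
  if (nzchar(Sys.which("curl"))) methods <- c(methods, "curl")
  methods
}

# Usage: pick one and pass it through an rgugik *_download() function, e.g.
# geodb_download("wielkopolskie", method = available_download_methods()[2])
```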
However, new users need to be aware that there are different download methods that can be used, so I think adding this information to the vignettes is a great idea.
I added such an explanation and example in: https://github.com/kadyb/rgugik/pull/72
Hi there, I am having a similar problem on both wired and WiFi connections. I used the package's walkthrough example to show the error I am getting. Could you please help me solve it? Thanks
@jzphlp, sorry for the problem. You can try to increase the timeout by setting options(timeout = 100000), but I'm not sure if it works with {jsonlite}. In the *_download() functions, you can change the download method, e.g. method = "wininet" or method = "wget". Moreover, here are recommendations to change the DSN: https://github.com/jeroen/curl/issues/72
Thanks for the tip. What about the first case in the picture I sent? There's no timeout parameter in DEM_request? Can you explain why is that happening? Any alternative way to get DEM?
You can change the default timeout in the global R options; type ?options in the console to see the documentation. To change it, use options(timeout = 100000) in the console. However, if the links from test1 and test2 don't work in your web browser, I think retrieving the data will be troublesome whichever way you try.
Can you explain why is that happening?
The servers and internet connections aren't perfect 😄
Any alternative way to get DEM?
When running tile_download(req_df, outdir = "./data") from the vignette https://cran.r-project.org/web/packages/rgugik/vignettes/DEM.html, I get a connection error after ~15 MB. I'll try again in a few hours, in case the problem is on my end.
Submitted as part of JOSS review: https://github.com/openjournals/joss-reviews/issues/2948