kadyb / rgugik

Download datasets from Polish Head Office of Geodesy and Cartography
https://kadyb.github.io/rgugik/

Connection error in tile download #64

Open mikerspencer opened 3 years ago

mikerspencer commented 3 years ago

When running tile_download(req_df, outdir = "./data") from vignette https://cran.r-project.org/web/packages/rgugik/vignettes/DEM.html I get a connection error after ~15 MB.

I'll try again in a few hours, in case the problem is my end.

Submitted as part of JOSS review: https://github.com/openjournals/joss-reviews/issues/2948

mikerspencer commented 3 years ago

Same error is occurring. Of the 33.7 MB file I get 15, 16.7 & 17.8 MB. I think it's too consistent for it to be a problem with my internet connection.

mikerspencer commented 3 years ago

Same connection error when downloading other files, e.g. running geodb_download("wielkopolskie", outdir = "./data") from https://cran.r-project.org/web/packages/rgugik/vignettes/orthophotomap.html

mikerspencer commented 3 years ago

Error also on topodb_download("bieszczadzki", outdir = "./data"). Again at ~15 MB downloaded.

mikerspencer commented 3 years ago

The first 2.4 MB file from test-topodb_download.R downloads, but larger files fail at ~15 MB.

mikerspencer commented 3 years ago

I suggest adding a larger file to the download tests to catch the connection error.

kadyb commented 3 years ago

Sorry for that. I will investigate it. Do you use LAN or WiFi connection?

Hello @adamhsparks, as you are a foreign user, did you also encounter this problem during the review?

adamhsparks commented 3 years ago

Nothing I detected, but I’ll try. My Australian Internet connection should be a true test. 😂

adamhsparks commented 3 years ago

geodb_download("wielkopolskie", outdir = "~/tmp") and topodb_download("bieszczadzki", outdir = "~/tmp") both work fine for me over WiFi with my crappy connection in Perth.

kadyb commented 3 years ago

Thanks for the test! Could you additionally check one large file (375MB) to be sure?

borders_download("administrative units")
adamhsparks commented 3 years ago

ah, here we go.

> # combine data tables
> req_df = rbind(req_df_DTM, req_df_DSM)
> req_df[, 1:5]
            sheetID year              format resolution avgElevErr
30 N-33-130-D-b-1-1 2019 ARC/INFO ASCII GRID      1.0 m        0.1
29 N-33-130-D-b-1-1 2019 ARC/INFO ASCII GRID      0.5 m        0.1
> tile_download(req_df, outdir = "./data")
1/2
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMT/73044/73044_917579_N-33-130-D-b-1-1.asc'
Content type 'application/octet-stream' length 35303843 bytes (33.7 MB)
==================================================
downloaded 33.7 MB

2/2
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
Content type 'application/octet-stream' length 141586478 bytes (135.0 MB)
===================
downloaded 53.5 MB

[1] "connection error"
adamhsparks commented 3 years ago
> borders_download("administrative units")
trying URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip'
Content type 'unknown' length 393405215 bytes (375.2 MB)
==
[1] "connection error"
adamhsparks commented 3 years ago

I do frequently have to adjust the timeout in my R session when using download.file(). I'm checking right now using wget with the administrative units. It says it will take ~10 minutes or so. Guessing we're seeing a timeout here in download.file().

mikerspencer commented 3 years ago

> Sorry for that. I will investigate it. Do you use LAN or WiFi connection?

I'm on a wired 1 Gb/s connection to my router. The internet connection itself is not particularly fast (~18 MB/s), but it is stable.

adamhsparks commented 3 years ago

No. Not a timeout issue.

> options(timeout=10000)
> utils::download.file(URL, filename, mode = "wb")
trying URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip'
Content type 'unknown' length 393405215 bytes (375.2 MB)
==================================================
Error in utils::download.file(URL, filename, mode = "wb") : 
  cannot open URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip'
In addition: Warning message:
In utils::download.file(URL, filename, mode = "wb") :
  URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip': status was 'Failure when receiving data from the peer'
adamhsparks commented 3 years ago

BTW, wget worked fine using zsh

kadyb commented 3 years ago

Thanks for the responses. Here is how the logs look for me:

> system.time(borders_download("administrative units"))
trying URL 'ftp://91.223.135.109/prg/jednostki_administracyjne.zip'
downloaded 375.2 MB

   user  system elapsed 
  23.34   90.65  949.96 

> system.time(geodb_download("wielkopolskie"))
trying URL 'http://opendata.geoportal.gov.pl/bdoo/PL.PZGiK.201.30.zip'
Content type 'application/octet-stream' length 18436314 bytes (17.6 MB)
downloaded 17.6 MB

   user  system elapsed 
   1.76    3.04   22.17 

> system.time(topodb_download("bieszczadzki"))
trying URL 'https://opendata.geoportal.gov.pl/bdot10k/18/1801_GML.zip'
Content type 'application/octet-stream' length 23474972 bytes (22.4 MB)
downloaded 22.4 MB

   user  system elapsed 
   2.34    4.69   24.31 

> system.time(tile_download(req_df))
1/2
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMT/73044/73044_917579_N-33-130-D-b-1-1.asc'
Content type 'application/octet-stream' length 35303843 bytes (33.7 MB)
downloaded 33.7 MB

2/2
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
Content type 'application/octet-stream' length 141586478 bytes (135.0 MB)
downloaded 135.0 MB

   user  system elapsed 
   4.03   25.47  134.23 
kadyb commented 3 years ago

There is clearly a problem. Perhaps users from outside Poland have lower connection priority and therefore disconnects occur? All functions in the package are wrappers to R utils::download.file(). I'll check a few more things.

adamhsparks commented 3 years ago

Not sure, but it seems to be an issue in R. Like I said, wget worked fine for me in the terminal (iTerm2, not RStudio); it downloaded a file that R wouldn’t.

curl::curl_download() may be useful here providing different handle options that utils::download.file() can’t/doesn’t provide?
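As a rough illustration of that idea, here is a minimal sketch, assuming the curl package is installed. The helper name download_with_handle and the option values are hypothetical choices for illustration, not anything from rgugik; connecttimeout, low_speed_limit and low_speed_time are standard libcurl options that can be set on a handle.

```r
# Hypothetical helper (not part of rgugik): download with a custom curl handle.
download_with_handle <- function(url, destfile) {
  stopifnot(requireNamespace("curl", quietly = TRUE))
  h <- curl::new_handle()
  curl::handle_setopt(
    h,
    connecttimeout  = 30,    # seconds allowed to establish the connection
    low_speed_limit = 1024,  # transfer counts as stalled below this many bytes/s ...
    low_speed_time  = 120    # ... and is aborted after this many stalled seconds
  )
  curl::curl_download(url, destfile, quiet = FALSE, handle = h)
}

# Example call (needs network access):
# download_with_handle(URL, filename)
```

Aborting on a stalled transfer rather than on total elapsed time may suit large files on slow links, but whether it helps with the error seen here is untested.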

kadyb commented 3 years ago

I have some results. The problem appears to be exceptional and difficult to reproduce. The following code works for me without any problems. I use Windows 8.1.

test1 = 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
test2 = 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip'

tmp = tempfile()
for (i in seq_len(30)) {
  utils::download.file(test1, tmp, method = "wininet", mode = "wb")
  utils::download.file(test1, tmp, method = "libcurl", mode = "wb")

  utils::download.file(test2, tmp, method = "wininet", mode = "wb")
  utils::download.file(test2, tmp, method = "libcurl", mode = "wb")
}

By moving away from WiFi I can trigger a timeout error:

>   utils::download.file(test1, tmp, method = "libcurl", mode = "wb")
trying URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
 length 141586478 bytes (135.0 MB)
downloaded 118.1 MB

Error in utils::download.file(test1, tmp, method = "libcurl", mode = "wb") : 
  download from 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc' failed
In addition: Warning messages:
1: In utils::download.file(test1, tmp, method = "libcurl", mode = "wb") :
  downloaded length 123876510 != reported length 141586478
2: In utils::download.file(test1, tmp, method = "libcurl", mode = "wb") :
  URL 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc': Timeout of 60 seconds was reached

To fix this, just set options(timeout = 10000). However, @adamhsparks already tried this without success, so his actual error is a different one. From his logs I can see:

'Failure when receiving data from the peer'
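For the plain timeout case, a small sketch of raising the limit only for the duration of one call may be useful. The helper name with_timeout is hypothetical; it relies only on base R's options() and on.exit(), and it does not address the 'Failure when receiving data from the peer' error.

```r
# Hypothetical helper: evaluate `expr` with a temporarily raised timeout.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = seconds)   # options() returns the previous values
  on.exit(options(old), add = TRUE)   # restore them even if the download fails
  force(expr)                         # `expr` is evaluated lazily, i.e. only here
}

getOption("timeout")  # current value; 60 seconds unless it was changed

# Example call (needs network access, test1 and tmp as defined above):
# with_timeout(10000, utils::download.file(test1, tmp, mode = "wb"))
```

Restoring the old value keeps the change from leaking into the rest of the session.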

kadyb commented 3 years ago

Then I tried the solution suggested by @adamhsparks and the strange thing is that curl::curl_download() fails(?) for me, but curl::curl_fetch_disk() works fine.

curl::curl_download(test1, "test.zip", quiet = FALSE)
#>  [-1073741824%] Downloaded 0 bytes...
curl::curl_download(test2, "test.zip", quiet = FALSE)
#>  [-805306368%] Downloaded 0 bytes...

curl::curl_fetch_disk(test1, "test.zip")
file.size("test.zip")/1024^2
#> 135.0274
curl::curl_fetch_disk(test2, "test.zip")
file.size("test.zip")/1024^2
#> 20.76025

Could you confirm that curl::curl_download() or curl::curl_fetch_disk() works for downloading test1 and test2 files, please?

CC @Nowosad

mikerspencer commented 3 years ago

OK, attempts here:

> test1 = 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
> test2 = 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip'

> tic = proc.time()
> curl::curl_download(test1, "test.zip", quiet = FALSE)
 [100%] Downloaded 141586478 bytes...
> proc.time() - tic
   user  system elapsed
  4.061   2.896 421.931
> file.size("test.zip")/1024^2
[1] 135.0274

> tic = proc.time()
> curl::curl_download(test2, "test.zip", quiet = FALSE)
 [100%] Downloaded 21768701 bytes...
> proc.time() - tic
   user  system elapsed
  1.516   0.963  28.770
> file.size("test.zip")/1024^2
[1] 20.76025

> tic = proc.time()
> curl::curl_fetch_disk(test1, "test.zip")
$url
[1] "https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc"

$times
     redirect    namelookup       connect   pretransfer starttransfer         total
     0.000000      0.061644      0.107014      0.239521      0.621672    551.009517

$content
[1] "/home/mike/test.zip"

> proc.time() - tic
   user  system elapsed
  1.182   1.470 551.036
> file.size("test.zip")/1024^2
[1] 135.0274

> tic = proc.time()
> curl::curl_fetch_disk(test2, "test.zip")
$url
[1] "https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip"

$times
     redirect    namelookup       connect   pretransfer starttransfer         total
     0.000000      0.075205      0.126705      0.254386      0.330570     28.053150

$content
[1] "/home/mike/test.zip"

> proc.time() - tic
   user  system elapsed
  0.595   0.597  28.094
> file.size("test.zip")/1024^2
[1] 20.76025

adamhsparks commented 3 years ago

My attempts match @mikerspencer's. Everything downloads fine in R using either curl function.
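Since a failed download can leave a truncated file behind, a size check against the length reported by the server is a cheap way to confirm the result. This is a minimal sketch; the helper name is_complete is hypothetical, and the byte counts are the ones shown in the logs in this thread.

```r
# Hypothetical helper: TRUE if the file exists and has exactly the expected size.
is_complete <- function(path, expected_bytes) {
  file.exists(path) && file.size(path) == expected_bytes
}

# Lengths reported by the server in the logs in this thread (in bytes)
expected <- c(test1 = 141586478, test2 = 21768701)

# The same sizes in MB, comparable to file.size("test.zip")/1024^2
round(expected / 1024^2, 4)

# Example call (needs the downloaded file):
# is_complete("test.zip", expected[["test1"]])
```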

Nowosad commented 3 years ago

Both curl functions work well on my (Fedora) machine:

test1 = 'https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc'
test2 = 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip'

curl::curl_download(test1, "test1.zip", quiet = FALSE)
curl::curl_download(test2, "test2.zip", quiet = FALSE)

curl::curl_fetch_disk(test1, "test1.zip")
#> $url
#> [1] "https://opendata.geoportal.gov.pl/NumDaneWys/NMPT/73043/73043_917495_N-33-130-D-b-1-1.asc"
#> 
#> $status_code
#> [1] 200
#> 
#> $type
#> [1] "application/octet-stream"
#> 
#> $headers
#>   [1] 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d 0a 44 61 74 65 3a 20 54 68
#>  [26] 75 2c 20 32 35 20 46 65 62 20 32 30 32 31 20 30 38 3a 31 36 3a 34 31 20 47
#>  [51] 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70 61 63 68 65 2f 32 2e 34 2e 36 20
#>  [76] 28 43 65 6e 74 4f 53 29 20 4f 70 65 6e 53 53 4c 2f 31 2e 30 2e 32 6b 2d 66
#> [101] 69 70 73 20 50 48 50 2f 37 2e 30 2e 33 33 0d 0a 58 2d 50 6f 77 65 72 65 64
#> [126] 2d 42 79 3a 20 50 48 50 2f 37 2e 30 2e 33 33 0d 0a 43 6f 6e 74 65 6e 74 2d
#> [151] 44 65 73 63 72 69 70 74 69 6f 6e 3a 20 46 69 6c 65 20 54 72 61 6e 73 66 65
#> [176] 72 0d 0a 43 6f 6e 74 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e 3a 20 61
#> [201] 74 74 61 63 68 6d 65 6e 74 3b 20 66 69 6c 65 6e 61 6d 65 3d 22 37 33 30 34
#> [226] 33 5f 39 31 37 34 39 35 5f 4e 2d 33 33 2d 31 33 30 2d 44 2d 62 2d 31 2d 31
#> [251] 2e 61 73 63 22 0d 0a 45 78 70 69 72 65 73 3a 20 30 0d 0a 43 61 63 68 65 2d
#> [276] 43 6f 6e 74 72 6f 6c 3a 20 6d 75 73 74 2d 72 65 76 61 6c 69 64 61 74 65 0d
#> [301] 0a 50 72 61 67 6d 61 3a 20 70 75 62 6c 69 63 0d 0a 43 6f 6e 74 65 6e 74 2d
#> [326] 4c 65 6e 67 74 68 3a 20 31 34 31 35 38 36 34 37 38 0d 0a 43 6f 6e 74 65 6e
#> [351] 74 2d 54 79 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f 6f 63 74 65 74
#> [376] 2d 73 74 72 65 61 6d 0d 0a 0d 0a
#> 
#> $modified
#> [1] NA
#> 
#> $times
#>      redirect    namelookup       connect   pretransfer starttransfer 
#>      0.000000      0.000028      0.000029      0.000090      0.287118 
#>         total 
#>     10.198358 
#> 
#> $content
#> [1] "/tmp/RtmpdsONDi/reprex499ff6388a34a/test1.zip"
file.size("test1.zip")/1024^2
#> [1] 135.0274
curl::curl_fetch_disk(test2, "test2.zip")
#> $url
#> [1] "https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?plik=powiaty/lod1/3064_gml.zip"
#> 
#> $status_code
#> [1] 200
#> 
#> $type
#> [1] "application/octet-stream"
#> 
#> $headers
#>   [1] 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d 0a 44 61 74 65 3a 20 54 68
#>  [26] 75 2c 20 32 35 20 46 65 62 20 32 30 32 31 20 30 38 3a 31 36 3a 35 32 20 47
#>  [51] 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70 61 63 68 65 2f 32 2e 34 2e 32 35
#>  [76] 20 28 44 65 62 69 61 6e 29 0d 0a 50 72 61 67 6d 61 3a 20 70 75 62 6c 69 63
#> [101] 0d 0a 45 78 70 69 72 65 73 3a 20 30 0d 0a 43 61 63 68 65 2d 43 6f 6e 74 72
#> [126] 6f 6c 3a 20 70 75 62 6c 69 63 0d 0a 43 6f 6e 74 65 6e 74 2d 44 65 73 63 72
#> [151] 69 70 74 69 6f 6e 3a 20 46 69 6c 65 20 54 72 61 6e 73 66 65 72 0d 0a 43 6f
#> [176] 6e 74 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e 3a 20 61 74 74 61 63 68
#> [201] 6d 65 6e 74 3b 20 66 69 6c 65 6e 61 6d 65 3d 22 6c 6f 64 31 2f 33 30 36 34
#> [226] 5f 67 6d 6c 2e 7a 69 70 22 0d 0a 43 6f 6e 74 65 6e 74 2d 54 72 61 6e 73 66
#> [251] 65 72 2d 45 6e 63 6f 64 69 6e 67 3a 20 62 69 6e 61 72 79 0d 0a 43 6f 6e 74
#> [276] 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 32 31 37 36 38 37 30 31 0d 0a 43 6f 6e
#> [301] 74 65 6e 74 2d 54 79 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f 6f 63
#> [326] 74 65 74 2d 73 74 72 65 61 6d 0d 0a 0d 0a
#> 
#> $modified
#> [1] NA
#> 
#> $times
#>      redirect    namelookup       connect   pretransfer starttransfer 
#>      0.000000      0.001322      0.009389      0.042308      0.072064 
#>         total 
#>      2.308206 
#> 
#> $content
#> [1] "/tmp/RtmpdsONDi/reprex499ff6388a34a/test2.zip"
file.size("test2.zip")/1024^2
#> [1] 20.76025

Created on 2021-02-25 by the reprex package (v1.0.0)
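The $headers element in this output is a raw vector, which is why it prints as hexadecimal bytes. A small sketch of how to read it, using only base R on the first 15 bytes copied from the dump above:

```r
# First 15 bytes of the $headers dump above, copied as hex strings
hex <- c("48", "54", "54", "50", "2f", "31", "2e", "31",
         "20", "32", "30", "30", "20", "4f", "4b")

# hex strings -> integers -> raw bytes -> character string
status_line <- rawToChar(as.raw(strtoi(hex, base = 16L)))
status_line  # the HTTP status line of the response

# With a real response object `res` returned by curl::curl_fetch_disk(),
# the whole header block can be decoded the same way (assumes curl is installed):
# cat(rawToChar(res$headers))
# curl::parse_headers(res$headers)
```

These bytes decode to "HTTP/1.1 200 OK", matching the $status_code of 200 shown in the output.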

kadyb commented 3 years ago

Thank you all! I will do more tests and eventually prepare the PR.

kadyb commented 3 years ago

Sorry for the many messages, but this problem is hard to reproduce, and I have one more idea.

All functions in rgugik that download files using utils::download.file() support passing additional parameters through the ... argument. @adamhsparks mentioned that downloading with wget from the shell worked fine, so why not try method = "wget"? I tested it below, but the most important thing is whether it works for you. Other download methods are also available.

If nothing works, I will rewrite it to curl::curl_download() because you already confirmed that it works well.

library("rgugik")

# 17 MB
geodb_download("wielkopolskie", method = "wget", quiet = TRUE)

# 22 MB
topodb_download("bieszczadzki", method = "wget", quiet = TRUE)

# 48 MB
models3D_download("Warszawa", method = "wget", quiet = TRUE)

# 375 MB
borders_download("administrative units", method = "wget", quiet = TRUE)
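Because different methods seem to work on different setups, one option is to try several methods in turn. The following is only a sketch under stated assumptions: download_with_fallback is a hypothetical helper, not part of rgugik, and the order of methods is arbitrary. The "wget" and "curl" methods also require those tools to be installed on the system.

```r
# Hypothetical helper (not part of rgugik): try several download methods in turn.
# `downloader` defaults to utils::download.file; it is a parameter only so that
# the logic can be exercised without a network connection.
download_with_fallback <- function(url, destfile,
                                   methods = c("libcurl", "wget", "curl"),
                                   downloader = utils::download.file) {
  for (m in methods) {
    status <- tryCatch(
      downloader(url, destfile, method = m, mode = "wb", quiet = TRUE),
      error   = function(e) 1L,  # treat errors as a failed attempt
      warning = function(w) 1L   # failed or partial downloads may only warn
    )
    if (identical(as.integer(status), 0L)) {
      return(invisible(m))       # name of the method that succeeded
    }
  }
  stop("all download methods failed for ", url)
}

# Example call (needs network access):
# download_with_fallback(test1, tempfile(fileext = ".asc"))
```

The helper returns the name of the first method that succeeded, which makes it easy to report which method works on a given machine.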
mikerspencer commented 3 years ago

@kadyb that works fine for me.

adamhsparks commented 3 years ago

Seems to work here for me as well.

kadyb commented 3 years ago

After a long discussion and many tests, we can propose a solution consisting of two steps:

  1. Adding information about the possibility of using a different download method when a connection error occurs (it's already done). We use utils::download.file() function as a backend and it allows downloading using various methods / tools that can be selected by the user. During testing, we concluded that for most users the default method works fine, but there are cases where it fails (probably especially for foreign users). As you mentioned, choosing a different method (e.g. wget) may solve this problem. Thus, we recommend this in https://github.com/kadyb/rgugik/pull/68.

  2. We plan to add curl::curl_download() as an additional independent download tool in the long term (after review). We did some tests and we think it will be a good solution, however we encountered a bug on Windows which is holding us back at the moment. Nevertheless, this solution can also be tested by users in the development version of the package in this way: remotes::install_github("kadyb/rgugik@usecurl").

mikerspencer commented 3 years ago

Can you add the fix (e.g. method = "wget") to the vignettes? It's going to be something that trips up a beginner.

kadyb commented 3 years ago

I've thought about it, but I'm not convinced that setting this as the default for download is a universal solution. During testing, we noticed that different methods are optimal for different configurations. I think the best solution in this situation would be to leave the default method in utils::download.file() (wget isn't installed on Windows by default).

However, the new user needs to be aware that there are different download methods that can be used, so I think adding this information to the vignettes would be a great idea.

kadyb commented 3 years ago

I added such an explanation and example in: https://github.com/kadyb/rgugik/pull/72

ghost commented 2 years ago

Hi there, I am having a similar problem on both wired and Wi-Fi connections. I have used the package walk-through example to show the error I am getting. Could you please provide some help on how to solve it? Thanks. [Image: R_error]

kadyb commented 2 years ago

@jzphlp, sorry for the problem. You can try increasing the timeout by setting options(timeout = 100000), but I'm not sure whether it works with {jsonlite}. In the *_download() functions, you can change the download method, e.g. method = "wininet" or method = "wget". Moreover, here are recommendations for changing the DNS: https://github.com/jeroen/curl/issues/72

ghost commented 2 years ago

Thanks for the tip. How about the first case in the pic that I sent? There's no parameter for timeout in DEM_request. Can you explain why is that happening? Any alternative way to get DEM?

kadyb commented 2 years ago

You can change the default timeout in the global R options - type ?options in the console to see the documentation. To change this use options(timeout = 100000) in the console. However, if the links from test1 and test2 don't work in your web browser, I think retrieving the data can be troublesome in all ways.

> Can you explain why is that happening?

The servers and internet connection aren't perfect :smile:

> Any alternative way to get DEM?

  1. Geoportal: https://mapy.geoportal.gov.pl/imap/Imgp_2.html?gpmap=gp0&locale=en
  2. Web Coverage Services: https://www.geoportal.gov.pl/uslugi/usluga-sieciowa-wcs
  3. QGIS Plugin: https://plugins.qgis.org/plugins/pobieracz_danych_gugik/