jeroen / curl

A Modern and Flexible Web Client for R
https://jeroen.r-universe.dev/curl

'curl_download' doesn't exit after connection is broken #238

Closed (kadyb closed this 2 weeks ago)

kadyb commented 3 years ago

When the connection is broken (or broken and restored after ~1-2 minutes), the curl::curl_download() function doesn't terminate.

I observed this on Windows 8.1. I tried to download a ~10 MB file; the connection broke and nothing happened for another 35 minutes, even though the Internet connection had been restored in the meantime.

# 9.8 MB
url = "https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?d=2&plik=powiaty/lod1/3019_gml.zip"
curl::curl_download(url, "test.zip", quiet = FALSE)
#>  [0%] Downloaded 0 bytes...
# <I clicked 'Interrupt R' after 35 min>
#> Error in curl::curl_download(url, "test.zip", quiet = FALSE) :
#> Operation was aborted by an application callback


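By contrast, utils::download.file() does terminate some time after the connection is broken: with method = "wininet" it stops with a warning about the truncated download, and with method = "libcurl" it fails with an error after a 60-second timeout.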

utils::download.file(url, "test1.zip", method = "wininet")
#> trying URL 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?d=2&plik=powiaty/lod1/3019_gml.zip'
#> Content type 'application/octet-stream' length 10275903 bytes (9.8 MB)
#> downloaded 2.3 MB

#> Warning message:
#>   In utils::download.file(url, "test1.zip", method = "wininet") :
#>   downloaded length 2441216 != reported length 10275903

utils::download.file(url, "test1.zip", method = "libcurl")
#> trying URL 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?d=2&plik=powiaty/lod1/3019_gml.zip'
#> length 10275903 bytes (9.8 MB)
#> downloaded 1.0 MB

#> Error in utils::download.file(url, "test1.zip", method = "libcurl") :
#>   download from 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?d=2&plik=powiaty/lod1/3019_gml.zip' failed
#> In addition: Warning messages:
#> 1: In utils::download.file(url, "test1.zip", method = "libcurl") :
#>   downloaded length 1064960 != reported length 10275903
#> 2: In utils::download.file(url, "test1.zip", method = "libcurl") :
#>   URL 'https://integracja.gugik.gov.pl/Budynki3D/pobierz.php?d=2&plik=powiaty/lod1/3019_gml.zip': Timeout of 60 seconds was reached
kadyb commented 3 years ago

Can I somehow change the default curl::curl_download() settings to get the same behavior as utils::download.file() on Windows?

Here are the package versions:

> packageVersion("curl")
[1] ‘4.3’

> curl::curl_version()
$version
[1] "7.64.1"

$ssl_version
[1] "(OpenSSL/1.1.1a) Schannel"

$libz_version
[1] "1.2.11"

$libssh_version
[1] "libssh2/1.8.2"

$libidn_version
[1] NA

$host
[1] "x86_64-w64-mingw32"

$protocols
 [1] "dict"   "file"   "ftp"    "ftps"   "gopher" "http"   "https"  "imap"  
 [9] "imaps"  "ldap"   "ldaps"  "pop3"   "pop3s"  "rtsp"   "scp"    "sftp"  
[17] "smtp"   "smtps"  "telnet" "tftp"  

$ipv6
[1] TRUE

$http2
[1] FALSE

$idn
[1] TRUE
jeroen commented 2 weeks ago

It is impossible to know if a connection is "broken" or if we just need to wait for the server. You can use the CURLOPT_LOW_SPEED_TIME and CURLOPT_LOW_SPEED_LIMIT options to kill slow or stalled downloads.
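
As a rough illustration of that suggestion, here is a minimal sketch of how those two options might be set from R. It assumes the curl package exposes them under the lowercase names `low_speed_time` and `low_speed_limit` (you can check with `curl::curl_options("low_speed")`). The helper name `download_with_stall_guard()` and its default thresholds are hypothetical and not part of the package.

```r
library(curl)

# Hypothetical helper (not part of the curl package): wraps curl_download()
# with a handle that aborts the transfer when it stalls.
# The thresholds below are illustrative assumptions, not recommended values.
download_with_stall_guard <- function(url, destfile,
                                      stall_seconds = 60,
                                      min_bytes_per_sec = 1,
                                      quiet = FALSE) {
  h <- new_handle()
  handle_setopt(
    h,
    # Abort if the average speed stays below `low_speed_limit` bytes/sec
    # for `low_speed_time` consecutive seconds.
    low_speed_time  = stall_seconds,
    low_speed_limit = min_bytes_per_sec
  )
  curl_download(url, destfile, quiet = quiet, handle = h)
}
```

With this in place, a call such as `download_with_stall_guard(url, "test.zip")` should raise an R error once the transfer has been stalled for about `stall_seconds`, instead of waiting indefinitely. The error can be caught with `tryCatch()` if the caller wants to retry. Note that this is a stall guard rather than a cap on total download time, so a slow but steady download is left alone.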