r-lib / httr

httr: a friendly http package for R
https://httr.r-lib.org

R/httr and R/curl much slower than download.file #704

Closed matmu closed 10 months ago

matmu commented 2 years ago

When querying an endpoint of a REST API that I set up with R/plumber, I see substantial timing differences between R/httr, R/curl, and download.file. download.file takes around 8 seconds, whereas httr and curl take 25 seconds or more for exactly the same query, using the same client and the same host server (both CentOS 7). With wget and curl on the command line, the query also takes about 8 seconds.

Querying the API

library(httr)
library(curl)

system.time({
  r = GET("http://myserver.com/path/to/endpoint")
  warn_for_status(r)
  stop_for_status(r)
  x = content(r)
})
# user  system elapsed 
# 9.329   0.345  32.946 

system.time({
  req = curl_fetch_memory("http://myserver.com/path/to/endpoint")
})
# user  system elapsed 
# 1.707   0.268  25.160 

system.time({
  curl_download("http://myserver.com/path/to/endpoint",  tempfile())
})
# user  system elapsed 
# 1.608   0.417  25.237

system.time({
  curl_fetch_disk("http://myserver.com/path/to/endpoint", tempfile())
})
# user  system elapsed 
# 1.656   0.467  25.468

system.time({
  download.file("http://myserver.com/path/to/endpoint", tempfile())
})
# user  system elapsed 
# 0.980   0.469   8.260 
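
Single runs like the ones above can be noisy, so it may help to repeat each call a few times and compare medians. The following is a minimal sketch, not part of the original report: `compare_elapsed` and the `fetchers` list are hypothetical names, and the URL is the placeholder used above.

```r
# Hypothetical helper: call each fetcher `times` times and return the
# median elapsed wall-clock time (in seconds) per fetcher.
compare_elapsed <- function(fetchers, times = 3) {
  stopifnot(is.list(fetchers), length(fetchers) > 0, !is.null(names(fetchers)))
  vapply(fetchers, function(f) {
    runs <- vapply(
      seq_len(times),
      function(i) system.time(f())[["elapsed"]],
      numeric(1)
    )
    median(runs)
  }, numeric(1))
}

url <- "http://myserver.com/path/to/endpoint"  # placeholder from the report

# Each entry wraps one of the calls timed above; nothing runs until called.
fetchers <- list(
  httr_GET          = function() httr::GET(url),
  curl_fetch_memory = function() curl::curl_fetch_memory(url),
  curl_download     = function() curl::curl_download(url, tempfile()),
  download.file     = function() utils::download.file(url, tempfile(), quiet = TRUE)
)

# Needs the real server, so it is left commented out here:
# compare_elapsed(fetchers, times = 3)
```

The result is a named numeric vector, one median per fetcher, which makes the gap between download.file and the curl-based calls easier to confirm or rule out as noise.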

Response header

> headers(r)
$server
[1] "nginx/1.21.1"
$date
[1] "Wed, 29 Sep 2021 14:51:36 GMT"
$`content-type`
[1] "text/tab-separated-values; charset=UTF-8"
$`transfer-encoding`
[1] "chunked"
$connection
[1] "keep-alive"
$`content-encoding`
[1] "gzip"
attr(,"class")
[1] "insensitive" "list"
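
The headers show a gzip-compressed, chunked response. One possible diagnostic (an assumption, not a confirmed cause) is to request an uncompressed body and look at the timing breakdown that `curl_fetch_memory()` returns. In this sketch `fetch_identity` is a hypothetical helper name, and it assumes the server honours `Accept-Encoding: identity`.

```r
# Sketch only: fetch a URL while asking the server not to compress the body,
# then report the status, body size, and libcurl's timing breakdown.
fetch_identity <- function(url) {
  h <- curl::new_handle()
  # Assumption: the server respects this header and skips gzip.
  curl::handle_setheaders(h, "Accept-Encoding" = "identity")
  res <- curl::curl_fetch_memory(url, handle = h)
  list(
    status = res$status_code,
    bytes  = length(res$content),
    times  = res$times  # per-phase timings reported by libcurl
  )
}

# Needs the real server, so it is left commented out here:
# fetch_identity("http://myserver.com/path/to/endpoint")
```

If the elapsed time drops to roughly the download.file figure, compression handling would be worth mentioning in a follow-up report; if not, the `times` element at least indicates which phase of the transfer is slow.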

sessionInfo()

> sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] curl_4.3.2 httr_1.4.2

loaded via a namespace (and not attached):
 [1] webutils_1.1      tidyselect_1.1.1  remotes_2.4.0     purrr_0.3.4       lattice_0.20-44   swagger_3.33.1    vctrs_0.3.8       generics_0.1.0   
 [9] testthat_3.0.2    usethis_2.0.1     SnowballC_0.7.0   tidytext_0.3.1    utf8_1.2.2        blob_1.2.2        rlang_0.4.11      pkgbuild_1.2.0   
[17] pillar_1.6.2      later_1.2.0       glue_1.4.2        withr_2.4.2       DBI_1.1.1         bit64_4.0.5       sessioninfo_1.1.1 lifecycle_1.0.0  
[25] stringr_1.4.0     devtools_2.4.2    memoise_2.0.0     callr_3.7.0       fastmap_1.1.0     ps_1.6.0          fansi_0.5.0       tokenizers_0.2.1 
[33] Rcpp_1.0.7        promises_1.2.0.1  cachem_1.0.5      desc_1.3.0        pkgload_1.2.1     jsonlite_1.7.2    fs_1.5.0          bit_4.0.4        
[41] stringi_1.6.2     processx_3.5.2    dplyr_1.0.7       plumber_1.1.0     grid_4.0.1        rprojroot_2.0.2   cli_3.0.1         tools_4.0.1      
[49] magrittr_2.0.1    RSQLite_2.2.7     tibble_3.1.3      janeaustenr_0.1.5 crayon_1.4.1      pkgconfig_2.0.3   ellipsis_0.3.2    Matrix_1.3-4     
[57] data.table_1.14.0 prettyunits_1.1.1 assertthat_0.2.1  R6_2.5.0          compiler_4.0.1   

libcurl version: libcurl 7.29.0 with NSS/3.53.1

Update 04-10-2021

There seems to be a discrepancy between the result of headers() and the actual response headers:

> library(httr)
> system.time({
+   r = GET("http://127.0.0.1:4000/test")
+   warn_for_status(r)
+   stop_for_status(r)
+   x = content(r)
+ })
Rows: 350 Columns: 40
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
dbl (40): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
   user  system elapsed
  0.478   0.059   0.751
> r$headers
$date
[1] "Mon, 04 Oct 2021 07:47:47 GMT"

$`content-type`
[1] "text/tab-separated-values; charset=UTF-8"

$`content-encoding`
[1] "gzip"

$`transfer-encoding`
[1] "chunked"

attr(,"class")
[1] "insensitive" "list"
> r
Response [http://127.0.0.1:4000/test]
  Date: 2021-10-04 07:47
  Status: 200
  Content-Type: text/tab-separated-values; charset=UTF-8
  Size: 255 kB
X1      X2      X3      X4      X5      X6      X7      X8      X9      X10     X11     X12     X13     X14     X15     X16     X17     X18     X19     X20     X21     X2...
14.005301101133227      23.625352163799107      92.48203919269145       53.32668418996036       43....
49.71931471955031       93.39884805958718       11.632965900935233      85.32237017061561       91.9...
78.89712990727276       39.409890957176685      90.90405236929655       0.5622981814667583      15....
41.816325089894235      47.52457377035171       32.00235064141452       40.431985072791576      4.1...
0.009128847159445286    32.41147438529879       77.71695936098695       59.59550221450627       74...
36.60198471043259       15.195061289705336      25.27566181961447       42.5753008807078        85.85...
77.14250846765935       76.39204727020115       82.18081402592361       79.50914725661278       5.536...
80.04448262508959       65.86111697833985       12.915009562857449      22.876024013385177      10....
27.796625229530036      86.96586838923395       98.06580552831292       81.84361984021962       26.9...
...

hadley commented 10 months ago

Sounds like the problem arises from curl, not from httr. I'd suggest filing an issue for the curl package, preferably including a reproducible example so there's some chance of being able to fix it.